We normally try to keep the more technical content off our blog and on our YouTube channel. However, after some back-and-forth emails on the Forensic Video List Serv, I’ve decided to take a deeper dive than normal in this post.
Historically in the video analysis world, a proprietary file had to be played inside a proprietary player. This was because in the early days of proprietary players, tools like iNPUT-ACE didn’t exist. One of the common assumptions in those days was that, if a proprietary player is playing the file, then it must be playing it accurately. Unfortunately, this assumption is NOT correct. There are many scenarios where a proprietary player misreads the video stream. Let’s dive into some of the common issues we can encounter when using a proprietary player:
Examples of Issues with Proprietary Players
Clipped Color Values During Default Playback
In the example below, you’ll see one proprietary player’s representation of the video on the left and the raw data on the right. Notice how much brighter (and therefore washed out) the data is on the left. The player stripped a lot of detail in this case. Imagine trying to do a license plate enhancement where the limited detail on the license plate is stripped out by the player right from the start.
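One plausible way a player washes out an image like this (a hypothetical sketch – the blog doesn’t say which conversion this particular player applies) is a naive levels stretch that treats every luma sample as limited-range (16–235) and clamps everything else. Any genuine shadow or highlight detail outside that range is destroyed:

```python
def naive_levels_stretch(y: int) -> int:
    """Stretch a luma sample as if it were limited-range (16..235),
    clamping the result to 0..255. Real detail outside 16..235 is
    destroyed by the clamp."""
    return max(0, min(255, round((y - 16) * 255 / 219)))

# Distinct dark license-plate details all collapse to a single value:
shadows = [naive_levels_stretch(y) for y in (4, 9, 14)]          # -> [0, 0, 0]
# Distinct bright details collapse the same way:
highlights = [naive_levels_stretch(y) for y in (238, 245, 252)]  # -> [255, 255, 255]
```

Once several different raw values map to the same output value, no amount of enhancement can bring the original detail back.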
Arbitrary Rows of Pixels “Duplicated”
This is a little harder to see at full scale, so a cropped/resized version is included below as well. Notice that the Clip Player (on the left) has arbitrarily duplicated two rows of pixels horizontally across the image, in two places. Those areas should each be one line of pixels, but become two – which distorts every object that moves through the area. The Clip Player would not be appropriate for saying anything about an object’s shape – and furthermore, suspect height calculations will be inaccurate, distances traveled will be inaccurate, etc.
Deinterlacing by Deleting a Field

This is harder to show in a still image, so I recommend reviewing the YouTube video for clarification. When proprietary players perform this process, the results can be catastrophic. Our team has encountered at least four different players that all automatically deinterlace the video footage by DELETING the lower field. The player quite literally dumps half the evidence in this process. If clips are produced via the proprietary player in these situations, it could very easily lead to some challenging questions from opposing counsel. Imagine defense counsel asking the investigator who produced the clip: “Why did you delete this frame of video in your work product? How about this one – why did you delete this evidence? … Is it accurate to say that you deleted half of the evidence in this case?”
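What those players appear to do can be sketched in a few lines (a toy illustration, assuming the frame is stored as a list of interlaced rows – the function name is ours, not any player’s):

```python
def fake_deinterlace(rows):
    """'Deinterlace' the way these players do: keep the upper field
    (even rows) and silently delete the lower field (odd rows)."""
    return rows[0::2]

frame = ["row0", "row1", "row2", "row3", "row4", "row5"]
kept = fake_deinterlace(frame)
# kept -> ['row0', 'row2', 'row4']: half the recorded lines are gone.
```

A proper deinterlacer would interpolate or weave the two fields together; simply discarding one field throws away half the captured data.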
Frames Missed During Playback
There is a relatively high-profile case right now where an officer is on trial for manslaughter after a use-of-force investigation. The investigator who initially reviewed the video of the incident used the proprietary player to create a clip of the events in question, but the player did not display all of the frames during his screen capture. Without those frames, the resulting clip had an accelerated appearance of force – it artificially made it look like the officer “slammed” the individual’s head into the concrete. The incorrectly produced clip was ultimately used against the officer at the manslaughter trial. A summary of those events, including the video used, can be found here: Accurate Video Tells Another Story in Use of Force Case.
There are so many more examples of proprietary players doing nasty things like this to the video. The obvious question is “WHY!?!” – but it’s important to realize that these programs (and DVRs, for that matter) are not being created for the video analyst. They’re being created for the consumer who wants to buy a $200 DVR system that includes 16 cameras and a 1TB hard drive. To get there, manufacturers cut corners. That means using really cheap hardware, borrowing (or, in some cases I suspect, “stealing” is a better word) code from other players/manufacturers that was never intended to be used the way it’s being used now, failing to update code after modifying hardware/codecs/etc., and otherwise making a ton of assumptions about the data. As one example, the “Clip Player” program (like the one in the screenshot above) is used by a number of different manufacturers. The program looks and functions the same in each instance, but I’ve seen a TON of different video formats living under the player. All of those formats are being funneled into the same player, with the same mistakes, by different manufacturers – and I suspect no one will ever fix that code.
Years ago, video analysts who wanted to avoid these limitations started learning how to use FFmpeg to decode proprietary video files. FFmpeg is a very useful tool, but it can lead to other issues that we will talk about next.
FFmpeg Issues and Examples
For the uninitiated – FFmpeg is a powerful open-source multimedia framework. It powers some fundamental decoding/encoding work that is done within iNPUT-ACE (and practically every other video tool on the market today). Although FFmpeg is powerful, it has its own limitations. In fact, in some cases, it will even DROP frames of video from proprietary video formats… And here is where the blog will take a slightly more technical turn.
FFmpeg’s power (as it relates to the proprietary video world) comes from its ability to parse a “bitstream” of video data without needing a standard container. Most other playback tools (VLC, Windows Media Player, etc.) require a standard container in order to decode video images, but FFmpeg can get there by parsing the raw stream of bits. FFmpeg can perform this magic for a number of different codecs, but for the purposes of this blog, we’re going to focus on the most common type we see: Annex B H.264.
Annex B is a very specific way of defining H.264 video data. To understand the next section of this blog, consider the analogy of Annex B being one “language” for video data.
You are reading this blog right now in English because you have the English language decoder in your mind. When you read this blog, you are reading individual letters in linear order (from left to right across the page), and combinations of letters are making up words, and combinations of words are making up sentences. All of which you are decoding in order to understand the meaning of the text.
Annex B encoding/decoding is very similar in this regard. A DVR recording to Annex B will “write” the file in linear order using binary information (1s and 0s). When FFmpeg reads the binary data, it uses the Annex B “language” to step through the patterns of 1s and 0s, which it understands to be picture values.
Without going much deeper down the technical rabbit hole here, one of the VERY common patterns Annex B (and thereby FFmpeg) uses to identify frames of video is the three-byte hex sequence 00 00 01 (the “start code”).
Every time this pattern is “read” by FFmpeg, it assumes that a frame is about to start.
Here’s an example of the beginning of an I-frame in an Annex B file (the screenshots are from the HEX reader included with iNPUT-ACE). Note that this is a slight over-simplification:
When FFmpeg reads the above sequence of data, it knows to expect that the data coming after it includes information about how to display the frame (including some additional metadata).
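That scanning behavior can be sketched in a few lines of Python (a simplification of what a real parser like FFmpeg’s does – real parsers also handle four-byte start codes and emulation-prevention bytes):

```python
def find_start_codes(bitstream: bytes) -> list[int]:
    """Return the byte offset of every three-byte Annex B
    start code (00 00 01) in the bitstream."""
    offsets, i = [], 0
    while (i := bitstream.find(b"\x00\x00\x01", i)) != -1:
        offsets.append(i)
        i += 3
    return offsets

# Two NAL units back to back:
stream = b"\x00\x00\x01\x65\xAA\xBB" + b"\x00\x00\x01\x41\xCC"
# find_start_codes(stream) -> [0, 6]
```

Everything between one start code and the next is treated as the data belonging to that unit – which is exactly why a stray 00 00 01 in the middle of the file is so dangerous, as we’ll see below.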
This process would work perfectly if all video files were made equal, but unfortunately, that is not the case. Manufacturers of surveillance camera systems generally weave proprietary metadata into the bitstream in a way that is unique to them (including things like date/time, GPS, etc.). FFmpeg does a great job of simply ignoring this content in most cases – but what happens if the content happens to line up with one of the patterns that FFmpeg understands to be video?
In the following case example, the DVR system writes some padding of 00 bytes between frames. In other words, every frame of video is separated from the next by a run of 00 00 00 00 data. If you were to read through the binary data in the file manually, you would see Frame 1 -> 00 padding -> proprietary metadata for Frame 2 -> Frame 2. Here’s an example:
If this clip were decoded with FFmpeg, everything would work very well. FFmpeg will try to decode the entire thing as H.264, but when it gets to the 00 padding and the proprietary metadata, it will ignore them. From a user’s perspective, Frame 1 will continue into Frame 2 without any issues (other than potentially some logged errors – but nothing out of the ordinary).
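To see why the padding is harmless here, consider a toy bitstream (made-up bytes, not taken from any case file): the run of zeros simply reads as extra leading zeros ahead of the next frame’s start code, so a start-code scan still lands on the right boundary.

```python
# Frame 1, four bytes of 00 padding, then Frame 2's genuine start code.
stream = (b"\x00\x00\x01\x65\xAA\xBB"   # Frame 1 (payload truncated for illustration)
          + b"\x00\x00\x00\x00"         # zero padding between frames
          + b"\x00\x00\x01\x41\xCC")    # Frame 2

offsets, i = [], 0
while (i := stream.find(b"\x00\x00\x01", i)) != -1:
    offsets.append(i)
    i += 3
# offsets -> [0, 10]: the padding never produces a spurious start code.
```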
However, in the following case, things will not be so simple. This example is taken from a real civil case where the file was being used to calculate the speed of a vehicle involved in a fatal collision. A link to this file is included at the end of the blog so you can validate your tools and see the problems directly in FFmpeg. Here’s the example of the binary data:
Do you notice anything concerning with these values? Take a look at the highlighted area here:
After the 00 padding between the two frames, the proprietary metadata happens to start with 01. This wouldn’t be an issue for the proprietary player, which understands the purpose of that metadata – but all FFmpeg sees is 00 00 01, which, as you’ll recall from earlier, is the standard Annex B pattern for the start of a video frame.
What happens here is FFmpeg reads that data, interprets it as the start of a frame, and incorrectly assumes that everything following it is frame data. Since that data isn’t actually frame data (it’s still just proprietary metadata), FFmpeg chokes and freezes up. A significant number of frames are then dropped until the next I-frame appears in the bitstream. Unless a user knows where to look, they would likely miss the dropped frames – as the expert who opposed us in this particular civil case did.
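The failure is easy to reproduce with toy bytes (again, made-up values standing in for the real metadata):

```python
stream = (b"\x00\x00\x01\x65\xAA"       # Frame 1 (payload truncated for illustration)
          + b"\x00\x00\x00\x00"         # zero padding between frames
          + b"\x01\x13\x37"             # proprietary metadata that starts with 01
          + b"\x00\x00\x01\x41\xBB")    # genuine start of Frame 2

offsets, i = [], 0
while (i := stream.find(b"\x00\x00\x01", i)) != -1:
    offsets.append(i)
    i += 3
# offsets -> [0, 7, 12]: offset 7 is a FALSE start code formed by the last two
# padding zeros plus the metadata's leading 01. A parser that trusts it will
# try to decode the metadata bytes as frame data and fail.
```

The genuine boundaries are at offsets 0 and 12; the match at offset 7 is pure coincidence, which is exactly the trap this DVR’s metadata sets for FFmpeg.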
Let’s take a look at the actual clips in question at this section of the video.
Here is the clip being read by FFmpeg with dropped frames:
Here is the clip shown by iNPUT-ACE playing all of the frames:
Note: the file in iNPUT-ACE reports all of the frames. There should be 107,857 playable frames in the file.
So, where does that leave us?
The iNPUT-ACE team has built – and continues to maintain – a purpose-built software solution for playback, conversion, and analysis of proprietary video files. Now users can drag/drop/play over 92% of the surveillance videos they encounter in a day-to-day workflow. Plus, the program goes far beyond simple playback. It is always important to validate your tools, and the iNPUT-ACE team puts a tremendous amount of work into ensuring that raw video data is instantly accessible, that proprietary metadata (like timestamps) is decoded, and that tools are included to streamline your video investigation.
This blog post got a lot longer than intended and diverged into some technical areas, so if you have any comments or questions, please reach out to our team at firstname.lastname@example.org.
See the dropped frames in FFmpeg for yourself and validate your tools using the link below.
The file should have 107,857 playable frames when played correctly.