Sunday, 6 October 2013

Finding Duplicate Videos


Well, it's been a while since my last post, mainly due to work, and when I wasn't working I was busy with my hobbies, which involved a lot of traveling, so for the past while I've been going places to rock climb. But I'm not going to go into too much detail about that.

So anyway, in this post I want to talk about how to find duplicate video files. A while ago I wrote a program that can do this with high accuracy, and I want to go over the different approaches I tried and how to go about implementing them.

The hard part is deciding what information about a video you are going to use to compare it against other videos. The simplest option is metadata such as the name, length and resolution; however, this is far from optimal, and many duplicates can go unmatched.
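To make the metadata approach concrete, here is a minimal sketch in Python. It is not the code from my program: the VideoMeta class, its field names and the choice to round the length to the nearest second are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoMeta:
    # Hypothetical container; a real tool would read these from the file.
    name: str
    duration_s: float
    width: int
    height: int


def metadata_key(meta):
    # Assumption: lengths within the same rounded second count as equal.
    return (round(meta.duration_s), meta.width, meta.height)


def group_by_metadata(videos):
    """Return lists of file names that share length and resolution."""
    groups = defaultdict(list)
    for video in videos:
        groups[metadata_key(video)].append(video.name)
    # Only groups with more than one member are candidate duplicates.
    return [names for names in groups.values() if len(names) > 1]
```

A copy that was re-encoded at a lower resolution lands in a different group even though it shows the same footage, which is exactly why this approach leaves so many videos unmatched.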

A better option is to match the visual content (perceptual information) of a video against other videos, or to use some other technique such as comparing key frame intervals or matching audio tracks.

The method I chose was to extract the RGB values for each frame and calculate their kurtosis, variance and average, and then build a timeline (an array of values) of how each of these values changes per frame per colour channel. I have written an earlier post that details how to do this here (it's pretty much doing that image processing and storing the results). After the timeline was created I would calculate the kurtosis and variance of the timeline itself and then loosely match videos that had similar resulting values.
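The steps above can be sketched roughly as follows. This is an illustration under assumptions, not my original program: frames are assumed to be already decoded into plain lists of (r, g, b) pixels, kurtosis is taken as the non-excess fourth standardised moment, and the 10% tolerance in loosely_match is an arbitrary stand-in for "loosely". All function names are made up for this example.

```python
import math
from statistics import fmean


def moments(values):
    """Return (mean, variance, kurtosis) of a sequence of numbers."""
    values = list(values)
    mean = fmean(values)
    var = fmean([(v - mean) ** 2 for v in values])
    if var == 0:
        # Flat signal: kurtosis is undefined, so report 0 (assumption).
        return mean, 0.0, 0.0
    kurt = fmean([(v - mean) ** 4 for v in values]) / var ** 2
    return mean, var, kurt


def frame_stats(frame):
    """frame is a list of (r, g, b) pixels; returns stats for R, G and B."""
    return [moments([pixel[c] for pixel in frame]) for c in range(3)]


def build_timeline(frames):
    """timeline[channel][stat] is the list of that stat over all frames."""
    timeline = [[[] for _ in range(3)] for _ in range(3)]
    for frame in frames:
        for c, stats in enumerate(frame_stats(frame)):
            for s, value in enumerate(stats):
                timeline[c][s].append(value)
    return timeline


def signature(frames):
    """Variance and kurtosis of each of the 9 timelines (18 numbers)."""
    sig = []
    for channel in build_timeline(frames):
        for series in channel:
            _, var, kurt = moments(series)
            sig.extend([var, kurt])
    return sig


def loosely_match(sig_a, sig_b, rel_tol=0.1):
    """True if every pair of values agrees within the tolerance."""
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=1e-6)
        for a, b in zip(sig_a, sig_b)
    )
```

Each video is reduced to a short list of numbers, so comparing two videos is just a handful of tolerance checks rather than a frame by frame comparison.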

Note that it wasn't necessary to process the whole video; only about 2 minutes of footage was needed to build a fairly unique timeline.

Using this method I was able to correctly match videos that had an offset, encoding artifacts, different frame rates, different resolutions, and combinations of all four.