CodeBlocks

Showing posts with label Coding. Show all posts
Showing posts with label Coding. Show all posts

Sunday, 6 October 2013

Finding Duplicate Videos

Absence

Well its been a while since my last post mainly due to work and when I wasn't working I was doing my hobbies which involved traveling a lot, so past while Ive been going places rock climbing. But I'm not going to go into too much detail about that.





So anyway in this post I wanted to talk about how to find duplicate videos files, now I have written a while ago a program that can do this with a high accuracy. I wanted to talk about the different approaches I've tried and how to go about doing those approaches.

The hard part is deciding what information about the video you are going to use to compare against other videos, the simplest is using metadata such as the name, length and resolution however this is far from optimal and many videos can go unmatched.

The better option would be to match the visual content or perceptual information from a video to other videos, or some other technique such as key frame intervals, match audio tracks.

The method I chose was to extract the RGB values for a frame and calculate its kurtosis, variance and average and then build a timeline(and array of values) of how each of these values changes per frame per colour channel. I have created a earlier post that details how to do this here (its pretty much doing that image processing and storing the results). After the timeline was created I would perform a kurtosis and variance on the timeline itself and then loosely match videos that had a similar resulting values. 

Note that is wasn't necessary to process the whole video but only about 2min of footage to build a fairly unique timeline

Using this method I was capable of correctly matching videos that had a offset, encoding artifacts, different frame rates, different resolutions combinations of all four.

Sunday, 7 July 2013

JQuery Mobile Tips

So over the past month I was developing a mobile webapp using JQuery Mobile and Grails and during that time the amount of silly glitches and bugs I encountered was relatively small but SO GOD DAMN annoying. So im going to post some of the common once I encounteted and how to solve them.

First off if you're experiencing JQueryMobile performance issues add the following CSS settings.

/*JQuery Mobile Optimisation */
.ui-shadow,.ui-btn-up-a,.ui-btn-hover-a,.ui-btn-down-a,
.ui-body-b,.ui-btn-up-b,.ui-btn-hover-b,.ui-btn-down-b,
.ui-bar-c,.ui-body-c,.ui-btn-up-c,.ui-btn-hover-c,
.ui-btn-down-c,.ui-bar-c,.ui-body-d,.ui-btn-up-d,
.ui-btn-hover-d,.ui-btn-down-d,.ui-bar-d,
.ui-body-e,.ui-btn-up-e,.ui-btn-hover-e,
.ui-btn-down-e,.ui-bar-e,.ui-overlay-shadow,
.ui-shadow,.ui-btn-active,.ui-body-a,.ui-bar-a 
{
    text-shadow: none;
    box-shadow: none;
    -webkit-box-shadow: none;
}

/*Enable CSS hardware acceleration*/ 
.hwaccel {  -webkit-transform: translateZ(0); } 


The visual impact is quite small but the performance gained is about 100% If you're having strange visual problems when linking from one page to another such as flashes, blinking where the screens flickers first then displays the page, the problem is with JQueryMobile transitions, there are 3ways to fix this. Set the transition in the to none this doesnt always want to work and its to much work to maintain. 2nd you can download a plugin that fixes the transitions but doesn't work with the new JQueryMobile 1.3.1. Or do this:

This will disable transitions for all links in a page. If you want to link to a page without using AJAX is sometimes it causes weird issues with redirects and linking add this to a tag data-rel="external" data-ajax="false" If you're mobile app support rotation(Which IT SHOULD!) and you have JQueryMobile Headers and Footers that don't always resize on rotation add the following Javascript:

If you're developing for Apple iOS as well then you might encounter a small zooming glitch where the device will automatically zoom in when selecting form items, to fix this disable zooming.

And there you go that will fix most of the annoying issues you might encounter with JQueryMobile. I'll update this when I encounter more bugs and fixes.

Sunday, 31 March 2013

Identifying Duplicate Images

Identifying similar images and determining which are identical. 


So in this post I want to talk about how you can write a program that will be able to compare images against each other and determine if they're identical or very similar. Basicly the same thing tineye and google's reverse image search does. So if you've ever wondered how they work then this post might clear things up a bit. In a follow up post I also want to describe how we can apply a similar technique to identify duplicate videos.

The Trick?
There is no, one solution to the problem and every solution has its advantages and disadvantages,  the approach im going to be using is based off of the histogram of the images. A very simple yet very effective method, the draw back of it being false positives and it can only match images that are visually very similar. But it can identify images with varing qualities of:
  • Resolution
  • Rotation
  • Filesize
  • format
  • compression
  • quality

What we are going to do is, extract the distribution of colours that make up the image and plot the frequecy of those colours. Also known as a histogram, in short a histogram will show us how much of each colour value(0-255) in each colour range(RGB) is used in the image.

The histogram represents the frequency of each colour value.

Two images that are visually identical will have a very similar histogram(98%) depending resolution and other factors. However images can have they same histogram yet not be visually similar, but at least we are certain to find duplicates, and these false positive detection is a problem we can reduce.





As the figures above show that two identical looking images with different properties in both size and resolution share a near identical histogram.

The Algorithm

The algorithm in broken up into 3 main sections. 
  1. Finding the images
  2. Analysis
    1. Load Images
    2. Extract Colour Map
    3. Generate Histogram
    4. Normalise Histogram
  3. Comparison

Step 1 - Finding the images

We can hardcode the locations of the images into the program, this is useful when just testing the analysis of the algorithm, but for practical purposes we want the program to be able find those images for us. To do this we will use another algorithm to recursively load each directory and subdirectory and find all the images, and add those to a list.

Select a Directory
public void LoadDirectory()
{  
 JFileChooser fileChooser = new JFileChooser();
 int result = fileChooser.showOpenDialog(null);
 
 if(result==JFileChooser.APPROVE_OPTION)
 {
  URL fileURL = null;
  try
  {
   fileURL = fileChooser.getSelectedFile().toURL();
   
   String _folderPath = fileURL.toString().substring(6);
   String[] folderPath = _folderPath.split("/");
   
   for(int i=0; i<folderPath.length-1; i++)
   {
    directory = directory+folderPath[i]+"/";
   } 
   
  }
  catch(Exception e)
  {}
 } 
 File folder = new File(directory);
 LoadSubDirectories(folder);
}
So the above snippet will allow us to pick a folder to start our search, and the last method called LoadSubDirectories(folder) will then load all the subdirectories in that folder 
Select Directory view



Load Subdirectories
public static int spc_count=-1;
private void LoadSubDirectories(File folder)
{    
    spc_count++;
  
    String spcs = "";
     
     for (int i = 0; i &lt spc_count; i++)
       spcs += " ";
     
     if(folder.isFile())
     {  
   String[] temp3 = folder.getName().split("\\.");
   
   if(
     temp3[1].equalsIgnoreCase("jpg")||
     temp3[1].equalsIgnoreCase("jpeg")
    )
   {                 
//Adds file location to ImageObject(my own class) you can just add
//absoluteFile parameter to an ArrayList
            ImageObject iObject = new ImageObject(folder.getAbsoluteFile());
   }   
     }
     else if (folder.isDirectory()) 
     {
       File[] listOfFiles = folder.listFiles();
       
       if(listOfFiles!=null) 
       {
  for (int i = 0; i &lt listOfFiles.length; i++)
   LoadSubDirectories(listOfFiles[i]);
       } 
     }
     
     spc_count--;

     if(spc_count==-1)
     {
  //Add logic of what to do when done searching 
     }   
}
This code will then load all the subdirectories and find all images ending with a jpg, jpeg and add those to a list, or in my case an ImageObject(own class), which I also use to store their features(histograms).

Step 2 - Analysis

In this step we are going to use the absolute addresses we found in step one to load those images and perform analysis on them. To perform analysis on the images we are going to have to do a few things
1. Load the image into the program
2. Extract the colour map
3. Generate Histogram
4. Normalise Histogram

2.1 - Loading the image

for(ImageObject listIfImages : iObject)

BufferedImage image = null;
try
{
//now this does use my class(ImageObject) to store and 
//load the images But all you have to do is replace
//iObject.toString() with the absolute path.toString()
//of the image 

image = ImageIO.read(new File(iObject.toString()));  

        Imager.process(image);//my class that processes the image

//Just storing the histograms into the image object, 
//so that I can save and load it to disk
//easier so that I don't have to process an 
//image more than once.
iObject.redHisto = Imager.getRedHisto();
iObject.greenHisto = Imager.getGreenHisto();
iObject.blueHisto = Imager.getBlueHisto();
}
catch(Exception e)
{}
So in my ImageProcessor Class I will have an a method that takes a buffered image. 2 - Extract the Colour map

2.2 - Extract Colour Map

public final void process(BufferedImage _image)
{   
 width = _image.getWidth(this);
 height = _image.getHeight(this);
 
 final int[] rawPixels = new int[width * height]; 
 PixelGrabber grabber = new PixelGrabber
 (
  _image,
  0,
  0,
  width,
  height,
  rawPixels,
  0,
  width
 );
  
 try
 {  
  grabber.grabPixels();
 }
 catch(Exception e)
 {}
 
 final int[][] pixels = new int[width][height];
 
 alpha = new int[width][height];  
 red = new int[width][height];
 green = new int[width][height];
 blue = new int[width][height];
   
int offset = 0;

//This double for loop is where the colour map gets made
//This colour map will be used to generate the histogram
//Note that is section can take the longest and might cause
//Java virtual machine to crash due too large images if this 
//happens do the following select the project:
//properties->Run/Debug Settings-> {class of main method}  
//(x)=Arguments->VM arguments->-Xmx1024m
for (int i = 0, n=width; i < n; i++) 
{
 for (int j = 0, m=height; j < m; j++) 
 { 
     pixels[i][j] = rawPixels[offset];
     // extract the 4 fields
     alpha[i][j] = (pixels[i][j] >> 24) & 0xff;
     red[i][j]   = (pixels[i][j] >> 16) & 0xff;
     green[i][j] = (pixels[i][j]  >>  8) & 0xff;
     blue[i][j]  = pixels[i][j]  & 0xff;
     
     offset++;
 }
}

redHisto = this.createHistogram(red);
greenHisto = this.createHistogram(green);
blueHisto = this.createHistogram(blue);
alphaHisto = this.createHistogram(alpha);

return feature;
}
In case you missed the comment in the middle if your JVM bombs out due to a memory error do the following: Right click on the project and select properties. Click Run/Debug Settings and select the class where your main method is located and click edit, select the (x)=Arguments tab and enter in the VM arguments text area: -Xmx1024


Solving JVM memory problem

2.3 - Generate Histogram

To actually generate a histogram from the colourmap is ridiculously simple
/**@_colorMap created an array of type int that represent 
 *the histogram of that colour
 * @histogram the array returned
**/
private final int[] createHistogram(int[][] _colorMap)
{
 int[] histogram = new int[256];
 
 for(int i=0, n=width; i<n; i++)
 {
  for(int j=0, m=height; j<m; j++)
  {
   histogram[_colorMap[i][j]]++;
  }
 }  
 return histogram;
}

So what this code snippet above does is create a histogram from the colormap that is parsed through. A new array of type int and size of 256(0-255) is created, each array index will then be the corresponding colour value. Such that all instances of red-145 will be stored in histogram[145]. So the trick is that we access the the Array with the value of the _colormap[i][j] and just increment that index.

We then simple return that array, which now represents that colour's histogram.

2.4 - Normalise Histogram

Since we want to compare images of all sizes you have to realise that larger versions of images will have larger value in the histogram array, but relatively speaking those histograms will be near identical. This we need normalise the histogram so that ALL histograms will have the same upper limit. Now you can scale the histogram right after you've generated it and just before you returned it, or you can do it during comparison.

I did mine during comparison so that I could save a raw copy of the histogram since during normalisation certain data is lost. Of course this is slightly slower since I would have to regenerate the histogram each time.

I'm no mathematician so there probably exists a more elegant solution than mine. But in short we going to find the largest value in the histogram and see how close it is to our limit, then divide or multiple all the values by the difference depending on if its larger or smaller;

private int[] normaliseHistogram(int[] _histo)
{
 double scaler=0.0;
 int largestValue=0;
 
 //Normalise all values relative to 255
 int limit = 255;
 
 //Finding Largest Value
 for(int i=0; i<_histo.length; i++)
 {
  if(_histo[i] > largestValue)
   largestValue = _histo.length;
 }
 
 //Calculate the scaler value
 scaler = largestValue/limit;
 
 for(int i=0; i< _histo.length; i++)
 {
  //If scaler is larger than 1 we devide all value by it
  if(scaler > 1)
  {
   _histo[i] =  (int) (_histo[i]/scaler);
  }
  else //if histo smaller than 1 we multiple all the values
  {
   _histo[i] = (int)(_histo[i]*scaler);
  }
 }
 
 return _histo;
}

Step 3 - Comparison

And only one thing left to do and that compare the histograms with each other. So to compare to histograms with each other.
public double compareHistogram(ImageObject _image1, ImageObject _image2, int _color)
{
 int tempHistImage1 = _image1.getHisto(_color);
 int tempHistImage2 = _image2.getHisto(_color);
 
 for(int i=0; i<256; i++)
 {
  double diff = 0;

  TotalIntervals++;
  
  if(temp1>=temp2)
  {
   diff += temp2/temp1*100;    
  }
  else 
  {
   diff += temp1/temp2*100; 
  }
 }
 return diff;
}

Note that this method takes in two ImageObjects again this is my own class I use to represent the data of each image such as there histograms, all I do with these objects is extract there histograms so I can use them. In this code snippet I compare the red histograms with each other, this code will have to be done for both green and blue.
Now all that’s left is to establish the threshold and see if it passes it, if it does then the images are a match.
histogramThreshold = 80.0;

if(compareHistogram(image1, image2, 1)>histogramThreshold)
 return true;


My final program

The image above is of my program that uses the same algorithm as above with some modification, including faster matching and reduced false positives. As well has the ability to find duplicate videos.

Sunday, 17 March 2013

Java simple threading library

Simple threading in Java using JavaCP library

In this post I'm going to talk about a simple way to achieve multi threading and parallel computation using a simple library I've assembled based off of the work of Weimin Xiao and the rest at this stack overflow post.

I take no credit in any of the code written in the library it self(the code below is mine) I merely just assembled a simple library and have used it to successfully achieve embarrassingly thread execution. If you're not familiar with embarrassing threading, it is threading tasks that in no way affect the outcome of each other. E.g having to process a large library of files, this task can be easily threaded by having each task processed in a separate thread. Which is primarily what I've used it for in processing Photos and videos to find perceptually similar content. I'll discuss video and photo matching algorithm in a later post.

1. Where to get the library? here,
2. How to use the library?
  1. Place the JavaCP.jar anywhere you want we you can find it.
  2. In your IDE(eclipse etc..) add the JavaCP.jar as a library
    • In ecplise this is done by right clicking on the project root folder in the package explorer window and selecting properties
    • In the properties window select Java Build Path
    • Select the Libraries tab
    • Click on Add External JARs, navigate to where you placed the JavaCP.jar and select ok.

The library is now ready to be used.
The library is able to thread 2 different things. ForEach Loops and concurrent method execution.

In your program import the library

//Import the library
import JavaCP.*;

Example code to run Tasks
//Using it to execute Multiple methods concurrently
//The number of threads created depends on the number of cores available
Parallel.Tasks(new Task []
{
 
 //task-1
 new Task() {public void run()
 {
  loops.runLoop(1000000000);
 }},
   
 //task-2
 new Task() {public void run()
 {
  loops.runLoop(1000000000);
 }},
 
 new Task() {public void run()
 {
  loops.runLoop(1000000000);
 }},
 
 new Task() {public void run()
 {
  loops.runLoop(1000000000);
 }}
});
In this example the method loops.runLoop(int _value) will be executed multiple times along side other instance of the loops.runLoop(int _value).

A more practical use would be something like this, where each method as to do something time consuming since creating threads themselves are taxing.
Parallel.Tasks(new Task []
{
 //task-1
 new Task() {public void run()
 {
  feature[0] = Variance(red,0);   
 }},
   
 //task-2
 new Task() {public void run()
 {
  feature[1] = Kurtosis(red,0);
 }},   
 //task-3
 new Task() {public void run()
 {
  feature[2] = Variance(green,1);
 }}, 

 //task-4
 new Task() {public void run()
 {
  feature[3] = Kurtosis(green,1);
 }}, 
 //task-5 
 new Task() {public void run()
 {
  feature[4] = Variance(blue,2);
 }},
 //task-6
 new Task() {public void run()
 {
  feature[5] = Kurtosis(blue,2);
 }} 
});
The local array feature[] stores the result of the method execution. So what about ForEach loops? Well they will look a bit different following the next type of structure
Parallel.ForEach(arrayListName, new LoopBody()
{
 public void run(String _objectNameofItemInArray) 
 {
          //code of what must happen to the object
  objectNameofItemInArray.invokeMethod();
 }
}
such that it could look like this example
ArrayList listOfFiles = new ArrayList()

Parallel.ForEach(listOfFiles, new LoopBody()
{

 public void run(ImageObject p) 
 {
  p.process();
 }
}

//In ImageObject class there is a method called process()

public class ImageObject
{
  public ImageObject(){} //Default constructor

  public void process()
  { 
    //ToDo add method logic
  }
}
In this example listOfFiles is an ArrayList of type ImageObject which has a method called process. Threads will be created and multiple ImageObject will then be processed concurrently. There you go a simple way to achieve a basic level of multi threading. Both code snippets are from a project I created to identify duplicate videos and photos.