While taking a course on Discrete Time Signals and Systems at college, I completed the following project as part of the curriculum. In this project, an image is taken and a template is cropped out of it. Correlation is then performed using the Fourier transform, and the locations where the template matches have higher pixel values. We then define a threshold to separate these high-valued locations from the rest of the image, which gives us the locations of the template in the image.
This technique of character recognition has some drawbacks, though. The most prominent is that the template image must be a near match to the objects in the image. Thus, when parsing documents for handwritten text, this technique does not give the required results (I have tried this, and it failed to give a reliable result). The second drawback is the threshold: although it can be taken as a value just lower than the maximum pixel value, it usually has to be set by the programmer, as the automated procedure doesn't always give the desired result. The problem also multiplies when you move from black and white images to color images.
This project was designed in MATLAB R2011b. I have attached the code and results at the end of this post.
The Fourier transform can also be used to perform correlation. The following are the steps for matching a template to an image using this procedure.
- Read in the sample image.
- Create a template for matching by extracting it from the image. You can also create the template image by using the interactive version of imcrop.
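- Compute the correlation of the template image with the original image by rotating the template image by 180° and then using the FFT-based convolution technique (correlation is equivalent to convolution with the kernel rotated by 180°). To match the template to the image, use the fft2 and ifft2 functions. In the line below, bw is the image and a is the template, zero-padded to 256×256 to match the image size. This technique was described on a forum on MathWorks and greatly helped me understand the functions.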
C = real(ifft2(fft2(bw) .* fft2(rot90(a,2),256,256)));
- To view the locations of the template in the image, find the maximum value of the correlation result C and then define a threshold value that is slightly less than this maximum. The peaks above the threshold show up as white spots in the thresholded correlation image.
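The steps above can be sketched end to end as follows. This is a minimal sketch and not the code from the original project: it uses a small synthetic image instead of a scanned document, so it runs without any image files or the Image Processing Toolbox. The names img, tmpl, and thresh, the 3×3 ring template, and the 0.9 threshold factor are all assumptions made for illustration.

```matlab
% Synthetic 64x64 binary image (assumption: stands in for the real text image).
img = zeros(64, 64);
tmpl = [1 1 1; 1 0 1; 1 1 1];     % hypothetical 3x3 template (a small ring)
img(10:12, 20:22) = tmpl;         % first copy of the template
img(40:42, 50:52) = tmpl;         % second copy of the template
img(30, 5:7) = 1;                 % distractor that only partially matches

% Correlation is convolution with the template rotated by 180 degrees.
% The template is zero-padded to the image size so the spectra can be multiplied.
[rows, cols] = size(img);
C = real(ifft2(fft2(img) .* fft2(rot90(tmpl, 2), rows, cols)));

% Threshold just below the maximum; the 0.9 factor is an assumed value
% and would normally be tuned by hand.
peak = max(C(:));
thresh = 0.9 * peak;
[r, c] = find(C > thresh);

% With this formulation each peak sits at the bottom-right corner of a match,
% so shift back by the template size to get the top-left corner.
topLeft = [r - size(tmpl, 1) + 1, c - size(tmpl, 2) + 1];
disp(topLeft);
```

With this synthetic input, the two planted copies are recovered at top-left corners (10, 20) and (40, 50), while the partial match on row 30 stays below the threshold. On a real document the 0.9 factor would likely need manual tuning, which is the threshold drawback mentioned earlier.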