In 1907, Russian chemist Sergey Prokudin-Gorsky traveled the Russian Empire taking the world's first color images. He used a black-and-white camera and took sets of three images, one each through a blue, a green, and a red colored filter. The results are the Prokudin-Gorskii Collection. He expected these images to be reconstructed with a special projector that overlaid them on top of each other, but this plan didn't happen because he fled Russia during the Russian Revolution.
The goal for this project is to take digitized versions of his collection from the Library of Congress and reconstruct the color images by automatically cropping and aligning each image in the collection. The output will be three-channel color images showing the original subject in all its color glory.
To align the images, two problems must be solved. First, the borders around the images should be trimmed because they can affect the quality of the output. This is done with an automatic algorithm. Then the image is cut into thirds and the red and green channels are aligned to the blue.
The input dataset contains some low res (jpg) and some high res (tif) images. For a few of the algorithm features, different parameters were used for each set, and those will be explicitly called out.
Each input image has a white border and then a black border, both of which should be cropped to produce better output. One observation about the input images is that if most of the pixels in a given row or column are white, or most are black, that row or column is most likely part of a border, because the content of the images is rarely all black or all white. The borders are detected by starting at each edge of the image (left, right, top, bottom) and iterating inward one row or column at a time. Every row or column that is mostly white or mostly black is cropped.
The issue is that these borders are not always pure white (r: 255, g: 255, b: 255) or pure black (r: 0, g: 0, b: 0). This can be due to quantization error (noise) or to aliasing. Also, if the image isn't perfectly square to the camera, some columns may contain part of the white border and part of the black border (see Figure 2 for examples). Two functions were implemented to help with this problem.
Some thresholding is applied to help segment the borders. Any pixel with a value below 75 is turned completely black, and any pixel value above 175 is turned completely white. This makes it easier to pick out the borders (See Figure 3).
Threshold values: black = [0, 75), white = (175, 255]
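The thresholding step can be sketched in a few lines of NumPy. This is a minimal illustration, assuming an 8-bit grayscale array; the function name `threshold_borders` and the constant names are made up for this example and are not from the project code.

```python
import numpy as np

BLACK_BELOW = 75   # values in [0, 75) are snapped to 0
WHITE_ABOVE = 175  # values in (175, 255] are snapped to 255

def threshold_borders(gray):
    """Snap dark pixels to pure black and bright pixels to pure white.

    Mid-tone pixels (75 through 175) are left unchanged.
    Assumes an 8-bit grayscale image; the name is hypothetical.
    """
    src = np.asarray(gray)
    out = src.copy()
    out[src < BLACK_BELOW] = 0
    out[src > WHITE_ABOVE] = 255
    return out

row = np.array([[0, 74, 75, 120, 175, 176, 255]], dtype=np.uint8)
print(threshold_borders(row).tolist())
```

Only the extremes move, so near-white and near-black border pixels become exact 255s and 0s while the image content in between is untouched.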
There are some cases where a column that belongs to a border would not be detected as one while searching. A look-ahead method was implemented to solve this. When a column is detected as not a border, the algorithm continues searching for a border for a few extra columns. If a border is detected during this look-ahead period, the search continues from that point. If a border is not detected, the original stopping point is used. Different look-ahead values were used for the low and high res images.
Look-ahead values: jpg = 2, tif = 6
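Here is a rough sketch of the inward scan with look-ahead. It assumes the image has already been thresholded so border pixels are exactly 0 or 255. The function names and the `min_fraction=0.5` cutoff for "mostly" are assumptions for this example; the write-up does not state the exact fraction used.

```python
import numpy as np

def is_border(line, min_fraction=0.5):
    """True if more than min_fraction of the pixels are pure black or pure white."""
    line = np.asarray(line)
    black = np.mean(line == 0)
    white = np.mean(line == 255)
    return max(black, white) > min_fraction

def border_width(lines, look_ahead=2, min_fraction=0.5):
    """Count how many leading rows to crop, scanning inward from one edge."""
    n = len(lines)
    i = 0
    while i < n:
        if is_border(lines[i], min_fraction):
            i += 1
            continue
        # Not a border: peek at the next few lines before giving up.
        resume = None
        for j in range(i + 1, min(i + 1 + look_ahead, n)):
            if is_border(lines[j], min_fraction):
                resume = j
                break
        if resume is None:
            break        # nothing found, keep the original stopping point
        i = resume + 1   # continue the scan just past the border that was found
    return i

def crop_borders(img, look_ahead=2):
    """Crop border rows and columns from all four edges."""
    img = np.asarray(img)
    top = border_width(img, look_ahead)
    bottom = border_width(img[::-1], look_ahead)
    left = border_width(img.T, look_ahead)       # rows of the transpose are columns
    right = border_width(img.T[::-1], look_ahead)
    return img[top:img.shape[0] - bottom, left:img.shape[1] - right]
```

With `look_ahead=0` the scan stops at the first non-border line, which is the behavior the look-ahead was added to improve on.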
Now that we have the borders cropped, we cut the image into thirds to get the final blue, green and red channel images. See Figure 5 for an example of these cropped images. Notice that the inner borders are visible in some channels, but that doesn't seem to have as much of an impact on the algorithm performance as having outer borders present.
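Cutting the plate into thirds is just slicing. The sketch below assumes the three exposures are stacked vertically in blue, green, red order from top to bottom, and drops any leftover rows when the height is not divisible by three. The name `split_plate` is made up for this example.

```python
import numpy as np

def split_plate(plate):
    """Split a vertically stacked plate into (blue, green, red) channel images.

    Assumes top-to-bottom order blue, green, red; the name is hypothetical.
    """
    plate = np.asarray(plate)
    h = plate.shape[0] // 3          # height of one channel
    blue = plate[:h]
    green = plate[h:2 * h]
    red = plate[2 * h:3 * h]         # leftover rows past 3*h are dropped
    return blue, green, red
```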
The content in each of these images is correlated but hard to match during alignment because the intensities differ in each channel. One observation is that the borders between objects in the image stay the same despite the objects having different intensities in each image. To utilize this observation, each channel is processed with a Canny Edge Detector, which draws lines at the edges between shapes in the image.
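To illustrate why edges are easier to compare than raw intensities, here is a small sketch. Note that it does not implement Canny: it swaps in a much cruder gradient-magnitude threshold so the example stays NumPy-only. The function name and the `threshold` value are assumptions for this example.

```python
import numpy as np

def edge_map(gray, threshold=30.0):
    """Binary edge map from gradient magnitude (a crude stand-in for Canny)."""
    g = np.asarray(gray, dtype=float)
    gy, gx = np.gradient(g)            # derivatives along rows, then columns
    return np.hypot(gx, gy) > threshold

# The same square seen through two "filters" with different intensities.
a = np.full((10, 10), 50.0)
a[3:7, 3:7] = 200.0
b = np.full((10, 10), 20.0)
b[3:7, 3:7] = 120.0

print(np.array_equal(a, b))                      # raw intensities differ
print(np.array_equal(edge_map(a), edge_map(b)))  # edge maps agree
```

The two inputs disagree at every pixel, but both produce the same outline of the square, which is the property the alignment relies on.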
The red and green images plus some offset are each compared to the blue image. A simple L2-Norm proved to be really effective on the Canny processed images, and was quite fast, so that was used to determine when two images were aligned. A window of possible offsets was used to compare the images in the range of [-15, 15] along each axis of the image.
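A minimal sketch of the exhaustive offset search follows. It assumes both inputs are same-sized 2D arrays (for example, edge maps) and uses `np.roll`, which wraps pixels around the image edge; whether the original code wraps or crops is not stated, so treat that as an assumption. The function name and the `center` parameter are made up for this example.

```python
import numpy as np

def best_offset(moving, reference, radius=15, center=(0, 0)):
    """Try every offset in a square window and return the best [x, y].

    The score is the sum of squared differences, which has the same
    minimizer as the L2 norm. Names and wraparound are assumptions.
    """
    ref = np.asarray(reference, dtype=float)
    mov = np.asarray(moving, dtype=float)
    best, best_score = center, np.inf
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            shifted = np.roll(mov, (dy, dx), axis=(0, 1))
            score = np.sum((shifted - ref) ** 2)
            if score < best_score:
                best, best_score = (dx, dy), score
    return best
```

For example, `best_offset(red_edges, blue_edges)` would return the [x, y] shift to apply to the red channel, where `red_edges` and `blue_edges` stand for the edge-processed channels.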
For the larger images, it is computationally quite expensive to search over the [-15, 15] window. To optimize this process, a pyramid of image resolutions is computed. Each layer of the pyramid is 1/2 the resolution of the prior one, and the pyramid is computed to a depth of 5 layers. This allows us to search over a much smaller window of [-5, 5] at each layer: 11 × 11 = 121 offsets instead of 31 × 31 = 961, which is 840 fewer checks (or about 1/8th the amount of computation, per layer).
Search window: jpg = [-15, 15], tif = [-5, 5]
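The coarse-to-fine search can be sketched as below. This is an illustration under assumptions rather than the project's code: the downsampling here averages 2x2 blocks, the search wraps around the image edge via `np.roll`, and all function names are made up. The idea is to find an offset at the coarsest layer, double it, and refine it within a small window at each finer layer.

```python
import numpy as np

def l2_search(mov, ref, radius, center):
    """Best [x, y] offset within +/- radius of center, by squared L2 distance."""
    best, best_score = center, np.inf
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            score = np.sum((np.roll(mov, (dy, dx), axis=(0, 1)) - ref) ** 2)
            if score < best_score:
                best, best_score = (dx, dy), score
    return best

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (odd trailing row/column dropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid_offset(moving, reference, depth=5, radius=5):
    """Coarse-to-fine alignment over a pyramid with `depth` layers."""
    mov = [np.asarray(moving, dtype=float)]
    ref = [np.asarray(reference, dtype=float)]
    for _ in range(depth - 1):
        mov.append(downsample(mov[-1]))
        ref.append(downsample(ref[-1]))
    offset = (0, 0)
    for level in range(depth - 1, -1, -1):     # coarsest layer first
        offset = l2_search(mov[level], ref[level], radius, offset)
        if level > 0:
            # One pixel at this layer is two pixels at the next finer layer.
            offset = (offset[0] * 2, offset[1] * 2)
    return offset
```

Because each layer only refines the doubled estimate from the layer above, the small window still covers large total displacements at full resolution.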
Each channel is then shifted by its computed offset and stacked with the others into a complete color image. Here are the result images for the input dataset, as well as the offsets computed for each channel in the format [x, y].
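The final stacking step might look like the sketch below. It assumes offsets are given as [x, y] for the green and red channels relative to blue, that shifting wraps around the edge, and that the output channel order is RGB. The function name `compose_rgb` is made up for this example.

```python
import numpy as np

def compose_rgb(blue, green, red, green_offset, red_offset):
    """Shift green and red by their [x, y] offsets and stack into an RGB image."""
    def shift(channel, offset):
        dx, dy = offset
        # axis 0 is rows (y), axis 1 is columns (x); np.roll wraps at the edges
        return np.roll(np.asarray(channel), (dy, dx), axis=(0, 1))

    return np.dstack([shift(red, red_offset), shift(green, green_offset), np.asarray(blue)])
```

The result has shape (height, width, 3), with blue left in place since it is the reference channel.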
As a bonus here, each image can be clicked on to view the unaligned version. It's fun to be able to toggle back and forth to see the improvement!
In addition to the Automatic Cropping and Canny Edge Based Alignment described earlier, here are a few extra items that were implemented.
There are many images in the collection, so here are a few other examples from the collection aligned with the same algorithm.
One observation is that alignment never looks great on images of people who are standing. One hypothesis is that this is because some of the people moved between exposures. It should be possible to build a small interface to toggle between each channel image after alignment to see who in the image is a fidgeter.
This section is also interactive!