CS280A Project 1

Introduction

In 1907, Russian chemist Sergey Prokudin-Gorsky traveled the Russian Empire taking the worlds first color images. He used a black and white camera and took sets of 3 images, each through a blue, green, and red colored film. The results are the Prokudin-Gorskii Collection. He thought that these images would be reconstructed using a special projector to overlay the images on top of each other, but this plan didn't happen because he fled Russia during the Russian Revolution.

The goal for this project is to take digitized versions of his collection from the Library of Congress and reconstruct the color images by automatically cropping and aligning each image in the collection. The output will be 3 channel color images showing the original subject in all its color glory.

Figure 1: Tobolsk captured by Prokudin-Gorskii. The top images was capture through a blue filter, the middle image through a green filter, and the bottom image through a red filter.

Approach

To align the images, two problems must be solved. First, the borders around the images should be trimmed because they can affect the quality of the output. This is done with an automatic algorithm. Then the image is cut into thirds and the red and green channels are aligned to the blue.

The input dataset contains some low res (jpg) and some high res (tif) images. For a few of the algorithm features, different parameters were used for each set, and those will be explicitly called out.

Automatic Cropping

Each input image has a white border and then a black border that should be cropped in order to provide better output. One observation about the input images is that if you take a given column of pixels, if the majority are all white or all black, they are most likely a border because the content of the images is rarely all black or white. These borders are detected by starting at each edge of the image (left, right, top, bottom), and iterating inward one row or column at a time. If a row or column is mostly all white or mostly all black, they will be cropped.

The issue is that these borders are not always pure white (r: 255, r: 255, r: 255) or pure black (r: 0, g: 0, b: 0). This can be due to quantization error (noise) or aliasing if the image isn't perfectly square to the camera, some columns may have a partial white border and black border (See Figure 2 for examples). Two functions were implemented to help with this problem.

Thresholding

Some thresholding is applied to help segment the borders. Any pixel with a value below 75 is turned completely black, and any pixel value above 175 is turned completely white. This makes it easier to pick out the borders (See Figure 3).

    Threshold Values
    black = [0, 75)
    white = (175, 255]

A closeup of the Tobolsk image showing the two borders — Figure 2: A closeup of the Tobolsk image border. From left to right, the white border is visible, then the black, then the image data. Notice that a significant part of the black border is lighter than the rest.

The same closeup as Figure 2, but now with the thresholding applied — Figure 3: The same closeup of Tobolsk, but with the thresholding applied.

Look Ahead

There are some cases where a column would not be detected as a border while searching. A look ahead method was implemented to solve this. The way this works is that when a column is detect as not a border, the algorithm continues searching for a border for a few extra columns. If a border is detected during this look ahead period, the search continues at that point. If a border is not detected, then the original stopping point is used. Different look ahead values were used for the low and high res images.

    Look Ahead Values
    jpg = 2
    tif = 6

Tobolsk image with crop lines added — Figure 4: Tobolsk image with red lines drawn to show where the borders were cropped.

Image Alignment

Now that we have the borders cropped, we cut the image into thirds to get the final blue, green and red channel images. See Figure 5 for an example of these cropped images. Notice that the inner borders are visible in some channels, but that doesn't seem to have as much of an impact on the algorithm performance as having outer borders present.

Canny Edge Detector

The content in each of these images are correlated by are hard to match with an alignment because there are different intensities in each channel. One observation is that the borders between objects in the image stay the same despite the objects having different intensities in each image. To utilize this observation, each channel is processed with a Canny Edge Detector, which draws lines at the edges between shapes in the image.

The red and green images plus some offset are each compared to the blue image. A simple L2-Norm proved to be really effective on the Canny processed images, and was quite fast, so that was used to determine when two images were aligned. A window of possible offsets was used to compare the images in the range of [-15, 15] along each axis of the image.

Pyramid

For the larger images, it is computational quite expensive to search over the [-15, 15] window. To optimize this process a pyramid of different image resolution is computed. Each layer of the pyramid is 1/2 the resolution of the prior, and a pyramid is computed to a depth of 5 layers. This allows is to search over a much smaller window of [-5, 5], which is 200 fewer checks that need to be made (or about 1/10th the amount of computation, per layer).

    Search window
    jpg = [-15, 15]
    tif = [-5, 5]

Results

Each channel is then stacked with the various offsets and presented as a complete color image. Here are the result images for the input dataset, as well as the offsets computed for each channel in the format [x, y].

As a bonus here, each image can be clicked on to view the unaligned version. It's fun to be able to toggle back and forth to see the improvement!

**Cathedral**
R Offset: [3, 14]

G Offset: [2, -8]

**Church**
R Offset: [-4, -134]

G Offset: [4, -71]

**Emir**
R Offset: [40, -97]

G Offset: [23, -53]

**Harvesters**
R Offset: [11, -129]

G Offset: [10, -68]

**Lady**
R Offset: [13, -66]

G Offset: [10, -37]

**Melons**
R Offset: [11, -88]

G Offset: [9, -46]

**Monastery**
R Offset: [0, -11]

G Offset: [2, -13]

**Onion Church**
R Offset: [34, -113]

G Offset: [24, -58]

**Sculpture**
R Offset: [-27, -44]

G Offset: [-11, -59]

**Self Portrait**
R Offset: [37, -127]

G Offset: [29, -74]

**Three Generations**
R Offset: [7, -104]

G Offset: [11, -49]

**Tobolsk**
R Offset: [3, -15]

G Offset: [3, -8]

**Train**
R Offset: [28, -116]

G Offset: [2, -52]

Extra Items

In addition to the Automatic Cropping and Canny Edge Based Alignment as described earlier, here are a few other extra items that were implemented.

Additional Images

There are many images in the collection, so here are a few other examples from the collection aligned with the same algorithm.

**Finland**
R Offset: [-6, -49]

G Offset: [-1, -61]

**Lilacs**
R Offset: [-23, -252]

G Offset: [-8, -126]

**Ordezh River**
R Offset: [8, -49]

G Offset: [5, -59]

**Yurt**
R Offset: [49, -81]

G Offset: [32, -45]

What's with these people?

One observation is that alignment never looks great on images with people that are standing. One hypothesis is that this is because some of the people are moving in between pictures. It should be possible to build a small interface to toggle between each image image after alignment to see who in the image is a fidgeter.

This section is also interactive!