Norman Karr Prokudin Gorskii Filters and Frequencies

 Cs194-26 Project 1

Colorizin​g the Prokudin-Gorskii Collection

Norman Karr | nkarr11@berkeley.edu

Overview

In 1907, a man by the name of Sergey Mikhailovich Prokudin-Gorskii took three exposures of every single scene he photographed. One exposure had a red filter, one had a green filter, and the last one had a blue filter. Together, three three images represent the activations of each color channel. The goal of this project was to create a automated pipeline that would combine the three exposures of each image to generate a vibrant, colorized image of each photograph. 

Table of Contents

Automatic Alignment

1. Alignment Metrics
2. Pyramid Search
4. Edge Alignment
5. Further Endeavors

Automatic Effects

1. Smart Cropping
2. Sharpening
3. Contrast
4. Further Endeavors

Results

1. Final Images
2. Offset Values
3. Fun Results

Automatic Alignment

Let's turn three grayscale images into one color image. That doesn't sound crazy at all!

1. Alignment Metrics

The natural first step in figuring out how to align images is to come up with a numerical metric that can measure how well aligned two images are. Fortunately, there are two relatively simple solutions to this. The first metric is a pixelwise L2 norm, or sum of squared differences (SSD) which is calculated by taking the pixelwise difference between two images and returning the squared norm of the result. The second metric is a normalized cross-correlation (NCC) which is just a dot product between two normalized images.* In my experiments, I found that NCC produced better results than SSD the majority of the time but took twice as long to calculate.

*To increase consistency, I decided to use a negative NCC so that both of my metrics become minimization problems.

Reference Channel

Unfortunately, both of these metrics are made to compare two objects, not three. As a result, we have to treat one color channel as a reference channel and then use the metrics to align the other two channels to our reference. For example, if the red channel is our reference, we will align the blue to the red and then align the green to the red. Interestingly, selecting different references actually results in different alignments. After some experiments, I found that using the green channel as the reference channel produced the most consistent results.

Red as reference

Green as reference

Middle Crop Evalutation

One shortcoming of blindly applying the metrics is the pixel instability in the borders of each image. Since the borders are so irregular, these areas of the image would contaminate the accuracy of the metrics. To avoid the borders, I employed a crop of each image before calculating the alignment accuracy. For each image, I evaluate the metrics only on the middle 60% of each image.

2. Pyramid Search

To test different alignments, I exhaustively iterate through a [-10,10] pixel offset range in both the x and y direction. This strategy is sufficient for small images but fails for larger images because the true offset can be greater than ±10 pixels. Unfortunately, this exhaustive search becomes way too slow when we increase the range to say [-100,100]. I solved this issue by employing an algorithm known as pyramid search.

Pyramid Search Algorithm

  1. Create multiple compressions of the original image until our most compressed image is smaller than 200x200 pixels (I employ a compression factor of 2).
  2. Find the best x and y offset within the range [-10,10] for the most compressed image.
  3. Iterate to the next most compressed image and multipy the previously found offsets by the compression factor to get newly scaled offsets (This is because scaling up our image means our offsets need to be scaled up too) 
  4. Then search within a range of [-10,10] from the scaled offsets to find the best offsets for the new image
  5. Repeat steps 3 and 4 until we have calculated the optimal offset for the original image

3. Edge Alignment

One obvious issue with directly aligning pixel activations is that not all parts of an image will have the same activation in each channel. If this was the case, then our colored image would be actually black and white. To increase alignment robustness, I opted to perform alignments based on edges in the image instead. To do this, I convolved each channel with an edge detection kernel and then aligned each image based off of the edge detection activations.

Raw Image Red Channel

Edge Detector Activations Red Channel

4. Future Endeavors in Alignment

One shortcoming is that the alignment is currently limited to a x and y offset. Possible explorations could include adding a search through different scales, rotations, and sheers to further improve alignment. To employ a search through more dimensions requires significant improvement to my algorithms' speed.

Another specific thing I would like to explore is a more intelligent search other than exhaustive searching. One idea I had was calculating some sort of gradient of SDD or NCC with respect to my parameters. This way I can more intelligentally approach optimal values with gradient descent as opposed to exhaustively searching through a space.

Automatic Effects

Now that our images are aligned and in color, how can we make the images better?

1. Smart Cropping

In each of our aligned photos, there is often a border that is pure black, pure white, or some other extreme color. I wanted to find an intelligent way to crop the images dynamically such that we only see the clean, center image. For this, I employed a custom algorithm to crop out these unwanted borders.

Autocrop

  1. Pick out a left-aligned slice of the middle row and a right-aligned slice of the middle row across all three channels. Perform a similar operation for middle column
  2. Convolve these slices with the 1D edge detection kernel [-1,0,1]
  3. Find the index of the greatest activations across each color channel and pick the indices that are most close to the center.
  4. Crop the image using these indices and repeat the process two more times

*This process is run 3 times because three color channels means we can have three possible edges on each side. There are some cases where we can get away with only 2 iterations but 3 iterations gaurantees a valid crop.

Original

Iteration 1

Iteration 2

Iteration 3

2. Automatic Sharpening

The next effect in my pipeline is a sharpening function which I perform by simply convolving every image with a sharpening kernel. This function only works for larger images and although minor, the difference is noticable when viewing the .tif files. After compressing to a jpeg, the difference is even less noticable however if you look very closely, you can still tell that some edges became more sharp. Regardless, automatic sharpness operates on each image that passes through my pipeline. 

Before Sharpening

After Sharpening

3. Contrast

The most important tool for automating contrast is the color channel histograms. Using these histograms, I tried three different constrasting methods: histogram equalization, adaptive equalization, and contrast stretching. Histogram equalization and adaptive equalization produced very cool images but neither produced realistic images (these images were very cool so I included some in the appendix). On the other hand, contrast stretching produced great images when tuned correctly.

Contrast Stretching

Contrast Stretching works by normalizing the a specified range of values rather than the entire range of values. Unlike, histogram equalization, this method is nonlinear and is able to generate a less harsh enhancement of the image. Pictured below are the images with varying limits to the normalizing range:

4. Future Endeavors

The next feature that I would build would be a saturation controller. I think it would be interesting to experiment with different color mappings or specific hue intensities. Building a saturation and hue adjustment function, then I could manually change these photos to have more realistic or artistic styles. For example, in the photo of Emir, I would like to grayscale everything except for his blue robe.

Results

All our pipelining is done, let's see what we got! I highly suggest scrolling to the bottom as well.

Finalized Image Gallery

Offset Values

Image Red Channel Offset (x,y) Blue Channel Offset (x,y)
cathedral.jpg (1,7) (-2,-5)
monastery.jpg (1,6) (-1,3)
tobolsk.jpg (1,4) (-1,-2)
church.tif (-8,33) (-4,25)
emir.tif (17,57) (-23,49)
harvesters.tif (-2,64) (-17,-59)
icon.tif (6,48) (-16,-39)
lady.tif (-25,-116) (-115,-133)
melons.tif (4,96) (-9,-80)
onion_church.tif (10,58) (-25,-51)
self_portrait.tif (8,97) (-29,-78)
three_generation.tif (-2,58) (-13,-53)
train.tif (27,44) (2,-53)
workshop.tif (-11,52) (2,-53)

Results of Histogram Equalization

Results of Adaptive Equalization