Background
Sergei Mikhailovich Prokudin-Gorskii (1863-1944) [Сергей Михайлович Прокудин-Горский, to his Russian friends] was a man well ahead of his time.
Convinced, as early as 1907, that color photography was the wave of the future, he won the Tsar's special permission to travel across the vast Russian Empire
and take color photographs of everything he saw, including the only color portrait of Leo Tolstoy. And he really photographed everything: people, buildings,
landscapes, railroads, bridges... thousands of color pictures! His idea was simple: record three exposures of every scene onto a glass plate using a red,
a green, and a blue filter. Never mind that there was no way to print color photographs until much later -- he envisioned special projectors to be installed
in "multimedia" classrooms all across Russia where the children would be able to learn about their vast country.
Alas, his plans never materialized: he left Russia in 1918, right after the revolution, never to return again. Luckily, his RGB glass plate negatives,
capturing the last years of the Russian Empire, survived and were purchased in 1948 by the Library of Congress.
The LoC has recently digitized the negatives and made them available online.
The goal of this project is to take the digitized Prokudin-Gorskii glass plate images and, using image processing techniques, automatically produce
a color image with as few visual artifacts as possible. We will extract the three color channel images, place them on top of each other, and align them
so that they form a single RGB color image.
Examples of Prokudin-Gorskii's glass plates
Approach
To reconstruct the images, we begin by separating each digitized glass plate into its three exposures, one per channel: red, green, and blue. Each channel records the light that passed through a different filter during the original exposure. We align the green and red images to the blue image, and then stack all three on top of one another to re-create the color photograph.
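As a rough illustration of the separation and stacking steps, here is a minimal NumPy sketch. It assumes the scan stacks the three exposures vertically in blue, green, red order from top to bottom, which the text above does not state; the function names `split_plate` and `stack_rgb` are hypothetical.

```python
import numpy as np

def split_plate(plate):
    """Split a vertically stacked plate scan into (blue, green, red) thirds.

    Assumption: the exposures run top to bottom as blue, green, red.
    """
    height = plate.shape[0] // 3            # ignore any leftover rows
    blue = plate[:height]
    green = plate[height:2 * height]
    red = plate[2 * height:3 * height]
    return blue, green, red

def stack_rgb(red, green, blue):
    """Stack three equally sized channels into an H x W x 3 RGB array."""
    return np.dstack([red, green, blue])
```

The stacked result is only a faithful color image once the green and red channels have been shifted into register with the blue one.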
To align the channels accurately, I used the Normalized Cross-Correlation (NCC) metric, which scores how closely two images match. NCC is the dot product of two unit vectors: the first image flattened and divided by its norm (image1./||image1||) and the second image treated the same way (image2./||image2||). The higher the score, the more closely the two vectors point in the same direction, and the better the alignment.
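The metric as described above can be sketched in a few lines of NumPy. This follows the formula in the text literally (no mean subtraction); the zero-norm guard is an addition of mine so that an all-black image does not cause a division by zero.

```python
import numpy as np

def ncc(image1, image2):
    """Dot product of the two images after flattening and unit-normalizing."""
    a = np.asarray(image1, dtype=np.float64).ravel()
    b = np.asarray(image2, dtype=np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0                          # assumed convention for blank input
    return float(np.dot(a, b) / denom)
```

For non-negative pixel values the score lies between 0 and 1, and it reaches 1 only when one image is a scaled copy of the other.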
To avoid introducing noise into the alignment process, I cropped the images by 5% on each side. This step removed the black and white borders typically present in the glass plates, ensuring that these borders didn’t skew the NCC metric.
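The cropping step could look like the sketch below. The helper name `crop_borders` and the choice to round the margin down to whole pixels are assumptions for illustration.

```python
import numpy as np

def crop_borders(image, fraction=0.05):
    """Trim `fraction` of the height and width from each side of the image."""
    rows, cols = image.shape[:2]
    dr = int(rows * fraction)               # rows removed at top and bottom
    dc = int(cols * fraction)               # columns removed at left and right
    return image[dr:rows - dr, dc:cols - dc]
```

With the default 5%, a 40 x 60 image is reduced to 36 x 54 pixels.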
For smaller images, I performed an exhaustive search over a 30x30 window of possible displacements, scoring each candidate with the NCC metric and keeping the displacement with the highest score. For larger images, however, this method became computationally expensive: they not only have more pixels to score per candidate but also need a larger search window to cover their larger displacements, which slows alignment significantly.
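A minimal sketch of the exhaustive search for small images follows. Several details are assumptions rather than the method as actually implemented: the 30x30 window is interpreted as shifts of -15 to +15 pixels in each direction, shifts are applied with a circular `np.roll`, and only the interior (5% margin removed) is scored.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized arrays."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def align_exhaustive(channel, reference, max_shift=15, border=0.05):
    """Return the (dy, dx) shift of `channel` that best matches `reference`."""
    rows, cols = reference.shape
    dr, dc = int(rows * border), int(cols * border)
    ref_core = reference[dr:rows - dr, dc:cols - dc]
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            # Score only the interior so the plate borders are ignored.
            score = ncc(shifted[dr:rows - dr, dc:cols - dc], ref_core)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

The returned shift can then be applied with `np.roll(channel, shift, axis=(0, 1))` before stacking. The cost grows with both the number of pixels and the square of `max_shift`, which is why this approach becomes slow on large scans.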
To solve this problem, I employed an image pyramid. The procedure first determines how many levels are needed to go from an image about 30 pixels wide to the full-size one, doubling the image size at each level. It then aligns the images iteratively from the smallest scale up to the original size. At each level, it rescales both images and computes the optimal displacement (shift in the x and y directions) between them. Because each level doubles the resolution, that displacement is doubled before it is used as the starting point at the next level, so at each stage we only need to check displacements within a small range of ±3 pixels around the current estimate. By starting at a coarse scale and refining at progressively higher resolutions, the pyramid approach handles large displacements efficiently and speeds up alignment.
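The coarse-to-fine procedure described above might be sketched as follows. This is an illustration under stated assumptions, not the original implementation: downscaling uses simple block averaging (the text does not say which resampler was used), shifts are circular via `np.roll`, the whole image is scored at every level, and all function names are hypothetical.

```python
import math
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized arrays."""
    a, b = a.ravel(), b.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def downscale(image, factor):
    """Shrink by an integer factor using block averaging (assumed resampler)."""
    rows = image.shape[0] // factor * factor
    cols = image.shape[1] // factor * factor
    blocks = image[:rows, :cols].astype(np.float64)
    blocks = blocks.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

def search(channel, reference, center, radius):
    """Best (dy, dx) within +/- radius of `center`, scored by NCC."""
    best_score, best_shift = -np.inf, center
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            score = ncc(np.roll(channel, (dy, dx), axis=(0, 1)), reference)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def align_pyramid(channel, reference, min_width=30, radius=3):
    """Estimate the (dy, dx) shift of `channel` coarse-to-fine."""
    # Number of doublings between a roughly min_width-wide image and full size.
    levels = max(0, math.floor(math.log2(reference.shape[1] / min_width)))
    shift = (0, 0)
    for level in range(levels, -1, -1):     # coarsest level first
        factor = 2 ** level
        shift = search(downscale(channel, factor),
                       downscale(reference, factor), shift, radius)
        if level > 0:
            # The next level has twice the resolution, so double the estimate.
            shift = (2 * shift[0], 2 * shift[1])
    return shift
```

With a ±3 search at every level, the reachable displacement at full resolution grows roughly as 3 times 2 to the number of levels, while each level only scores 49 candidates.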
Self portrait of Prokudin-Gorskii. Left is the original glass plate. Right is the reconstructed color image.