CS 180: Computer Vision, Fall 2024
Project 1: Images of the Russian Empire
Colorizing the Prokudin-Gorskii photo collection
Rohan Gulati
SID: 3037864000
Overview
In the early 1900s, Sergei Mikhailovich Prokudin-Gorskii anticipated color photography and captured scenes as three separate exposures through red, green, and blue color filters. In this project, I developed multiple image processing techniques from scratch and gained practice with image analysis libraries in order to recreate, in color, photographs captured over 100 years ago.
For preprocessing, I split each stacked scan into red, green, and blue matrices, cropping 10% of the image from all four sides of each channel. The three color channels then needed to be aligned, so I anchored the blue channel and found the displacements of the green and red channels relative to it. A scan over all possible displacements would yield the optimal solution, but it would take a significant amount of time, so I also implemented a pyramid-scaling approach that quickly aligns downscaled copies to narrow down the region of interest before locating the optimal displacement in the full-sized image.
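The splitting and cropping step can be sketched as follows (a minimal version, assuming the glass-plate scan is stacked blue, green, red from top to bottom, as in the Prokudin-Gorskii collection; the function name is illustrative):

```python
import numpy as np

def split_and_crop(stacked, crop_frac=0.10):
    """Split a vertically stacked B/G/R scan into three channels,
    then trim crop_frac off every side of each channel."""
    h = stacked.shape[0] // 3          # height of one channel
    w = stacked.shape[1]
    b, g, r = stacked[:h], stacked[h:2 * h], stacked[2 * h:3 * h]
    dy, dx = int(h * crop_frac), int(w * crop_frac)
    # Crop the same margin from all four sides of each channel.
    return tuple(c[dy:h - dy, dx:w - dx] for c in (b, g, r))
```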
For each image, the displacements of the red and green color channels relative to the blue channel are displayed as (x, y), along with the processing time.
Single-Scale Images
cathedral.jpg, red = (13, 12), green = (2, 5), time = 0.84s
monastery.jpg, red = (2, 3), green = (2, -3), time = 0.58s
tobolsk.jpg, red = (3, 6), green = (3, 3), time = 0.59s
The first, naive approach to aligning the channels was to try every displacement within a [-15, 15] window on the x and y axes, shifting the channel with the numpy.roll() function and scoring it against the blue base layer.
To score a displacement, I computed the sum of squared differences (SSD) between corresponding pixels of the two channels, which is simply the squared Euclidean distance between their light intensities.
Since these images are relatively small and the true displacement is close to (0, 0), this method produces quality results in a small amount of time. However, when the correct displacement is far from (0, 0) or the images are large, the runtime grows quadratically with the search window, so a more efficient method is necessary.
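A minimal sketch of this exhaustive search (the function name is illustrative; np.roll wraps pixels around the border, which is acceptable for small shifts on pre-cropped channels):

```python
import numpy as np

def align_exhaustive(channel, base, window=15):
    """Return the (dy, dx) shift of `channel` that best matches `base`,
    trying every displacement in [-window, window] on both axes."""
    best, best_score = (0, 0), np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            # Sum of squared differences between corresponding pixels.
            score = np.sum((np.roll(channel, (dy, dx), axis=(0, 1)) - base) ** 2)
            if score < best_score:
                best, best_score = (dy, dx), score
    return best
```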
Pyramid Scaling
To handle larger images whose channels are displaced far apart, I implemented a pyramid-scaling algorithm that repeatedly downscales the image to coarser resolutions and locates the optimal displacement coarse-to-fine. Aligning a heavily downscaled copy first narrows the region where the optimal displacement can lie, so each finer level only needs to search a small window around the upscaled estimate, until the displacement is found for the level-0 (full-resolution) image.
For emir.tif, the Euclidean-distance score between the two channels' light intensities was not accurate, so I scored displacements with the structural similarity index (SSIM) instead. The pyramid greatly reduced the time taken per image, but large TIFs still took around 12-14 minutes each. To further speed up the algorithm, I applied Canny edge detection at each layer of the image pyramid. The pyramid algorithm worked the same, except that alignment at each layer used a filtered binary version of the color channel that isolated its edges, while the downscaling was still performed on the unfiltered image. Additionally, instead of computing the Euclidean distance or phase correlation, I used an XOR over the binary Canny-edge matrices, which reduced the runtime to roughly a minute for most images.
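A compact sketch of the coarse-to-fine search, assuming SSD scoring and simple 2x2 block averaging for downscaling (function names are illustrative; the SSIM or edge-XOR metric slots into the same scoring step):

```python
import numpy as np

def _best_shift(channel, base, cy, cx, radius):
    # Brute-force SSD search over a (2*radius+1)^2 window centered on (cy, cx).
    best, best_score = (cy, cx), np.inf
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            score = np.sum((np.roll(channel, (dy, dx), axis=(0, 1)) - base) ** 2)
            if score < best_score:
                best, best_score = (dy, dx), score
    return best

def _downscale(img):
    # Halve the resolution by 2x2 block averaging (a stand-in for a
    # proper anti-aliased resize).
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def align_pyramid(channel, base, min_size=32, radius=2):
    # Base case: the image is small enough for a wide exhaustive search.
    if min(base.shape) <= min_size:
        return _best_shift(channel, base, 0, 0, 15)
    # Recurse on half-resolution copies, double the coarse estimate,
    # then refine within a small window at the current level.
    dy, dx = align_pyramid(_downscale(channel), _downscale(base), min_size, radius)
    return _best_shift(channel, base, 2 * dy, 2 * dx, radius)
```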
church.tif, red = (-4, 58), green = (3, 25), time = 47.02s
emir.tif, red = (40, 107), green = (23, 49), time = 47.4s
harvesters.tif, red = (15, 123), green = (18, 60), time = 59.38s
icon.tif, red = (22, 88), green = (16, 38), time = 1:00.81
lady.tif, red = (13, 120), green = (9, 57), time = 1:02.26
melons.tif, red = (14, 177), green = (9, 79), time = 1:01.7
onion_church.tif, red = (35, 108), green = (27, 51), time = 1:01.36
self_portrait.tif, red = (37, 175), green = (29, 77), time = 1:08.5
three_generations.tif, red = (8, 111), green = (12, 55), time = 59.43s
train.tif, red = (29, 85), green = (0, 40), time = 1:00.96
Bells & Whistles
For bells and whistles, the two primary changes I implemented on top of the pyramid scaling were Canny edge detection and automatic color contrasting.
Canny edge detection uses Gaussian smoothing and Sobel gradient filters (followed by non-maximum suppression and hysteresis thresholding) to isolate the edges of an image, producing a visually similar binary representation of it.
With this new representation, scoring a displacement becomes extremely efficient: only an XOR operation across the binary matrices is needed. This reduced the average per-image runtime from roughly 12 minutes to roughly 50 seconds. Regarding implementation, I applied the edge detector before aligning each channel at every level of the image pyramid, while still using the original image channel for downscaling.
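The XOR metric itself is trivial on binary edge maps. A sketch (the function name is illustrative; the edge maps would come from an edge detector such as skimage.feature.canny with sigma = 2, applied to each channel beforehand):

```python
import numpy as np

def xor_score(edges_a, edges_b):
    """Alignment cost between two boolean edge maps: the number of pixels
    where exactly one map has an edge. Identical maps score 0, so lower
    is better when comparing candidate displacements."""
    return int(np.count_nonzero(np.logical_xor(edges_a, edges_b)))
```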
sculpture.tif, blue channel
sculpture.tif, blue channel, Canny-edge filtered (sigma = 2)
Next, I implemented automatic color contrasting. For each channel, I find the pixel with the lowest light intensity and treat it as 0 by subtracting its value from all of the pixels. To finish the normalization, I find the brightest pixel and divide every pixel in the image by its value, upper-bounding the light intensity at 1. Each channel then ranges from 0 to 1, although the effect on alignment after Canny edge filtering is negligible.
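A sketch of this min-max normalization (the function name is illustrative; the guard avoids dividing by zero on a constant channel):

```python
import numpy as np

def auto_contrast(channel):
    """Stretch a channel so its darkest pixel maps to 0 and its
    brightest pixel maps to 1."""
    shifted = channel - channel.min()   # darkest pixel becomes 0
    peak = shifted.max()
    return shifted / peak if peak > 0 else shifted
```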
No Canny Edge Filtering + Structural Similarity Scoring, sculpture.tif, time = 12:46.48
Canny Edge Filtered + XOR Scoring, sculpture.tif, time = 50.79s
No Contrasting, sculpture.tif
Contrasted, sculpture.tif
Extra Pictures
tree.tif, red = (43, 56), green = (29, 30), time = 1:03.89
cabin.tif, red = (-4, 123), green = (-1, 37), time = 1:08.17
lakemountain.tif, red = (-29, 93), green = (-17, 41), time = 1:02.05