Project 4

This project aims to stitch together two pictures taken from the same camera position, similar to how panoramas are made by stitching a series of images swept leftwards, rightwards, upwards, or downwards. Part A covers warping and stitching images based on manually selected correspondence points, while Part B aims to automate the selection of correspondence points.

Part A

1. Gathering Pictures

For my image selection, I chose to photograph the interior of Berkeley’s Anchor House, the exterior of Li Ka Shing Center, and the interior of our lecture hall. All of the photos were taken with the camera positioning in mind, following Prof. Efros’s suggestion to rotate the phone about the camera’s actual position.

For the rectifications, I chose some pictures I took a while back of BART and of a poster in the MTA.

Left Set

Interior of Anchor House, right image
Exterior of Li Ka Shing Center, right image
Interior of Li Ka Shing 245, left image

Right Set

Interior of Anchor House, left image
Exterior of Li Ka Shing Center, left image
Interior of Li Ka Shing 245, right image

Images for Rectification

Bart
MTA

2. Recovering Homographies

In order to stitch two images of a panorama together, we have to transform and warp one of them so that the overlapping features in both images align. To do so, we must recover a homography from the two images and their correspondence points:

$$\begin{bmatrix} \lambda x' \\ \lambda y' \\ \lambda \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

which, writing $\lambda = gx + hy + 1$ and expanding, simplifies to a pair of equations per correspondence:

$$\begin{bmatrix} x & y & 1 & 0 & 0 & 0 & -xx' & -yx' \\ 0 & 0 & 0 & x & y & 1 & -xy' & -yy' \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} = \begin{bmatrix} x' \\ y' \end{bmatrix}$$

We can solve this by plugging our correspondence points into the coefficient matrix; with more than four point pairs the system is overdetermined, so we solve it in a least-squares sense.
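As a rough sketch of what this looks like in code (the function name and the use of np.linalg.lstsq are my own choices, not necessarily the project's actual implementation), one might set up and solve the system like this:

```python
import numpy as np

def compute_homography(pts1, pts2):
    """Estimate the homography H that maps pts1 onto pts2.
    pts1, pts2: (N, 2) arrays of (x, y) correspondence points, N >= 4.
    Stacks the two rows above for every correspondence and solves the
    (possibly overdetermined) system with least squares."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1).reshape(3, 3)  # [a..h] plus the fixed 1
```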

Example of correspondence points: below we can see that the points in both images correspond to the same window in the interior of the Anchor House. We can use these points to calculate a homography between the two sets of points.

Left image
Right image

3. Warping Images (Rectification)

Given a homography $H$, we can now warp an image so that its correspondence points land in the desired positions. This is simply a matter of finding the new bounding box of the warped image, shifting everything into that bounding box, and then sampling from the original image at inverse-warped points. To get the bounding box, we transform the corners of the image with $H$ and take the min and max of the resulting x and y coordinates. We then shift everything by the negative of the minimum x and y. Finally, for every point in the new bounding box, we map its coordinates back through $H^{-1}$ and sample the original image there using scipy.interpolate.griddata. This results in warped images like the ones below.
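A minimal sketch of that warp, assuming the compute_homography helper above and a 3-channel image (variable names are illustrative, and griddata over every pixel is simple but slow):

```python
import numpy as np
from scipy.interpolate import griddata

def warp_image(im, H):
    """Inverse-warp `im` (height x width x channels) with homography H,
    following the bounding-box recipe above. Returns the warped image and
    the (min_x, min_y) shift so it can be placed on a larger canvas later."""
    h, w = im.shape[:2]
    # Forward-warp the four corners to find the output bounding box.
    corners = np.array([[0, 0, 1], [w, 0, 1], [0, h, 1], [w, h, 1]], dtype=float).T
    warped = H @ corners
    warped = warped[:2] / warped[2]
    min_x, min_y = np.floor(warped.min(axis=1)).astype(int)
    max_x, max_y = np.ceil(warped.max(axis=1)).astype(int)
    out_w, out_h = max_x - min_x, max_y - min_y

    # Coordinates of every output pixel, shifted back into warped-image space.
    xs, ys = np.meshgrid(np.arange(out_w) + min_x, np.arange(out_h) + min_y)
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    # Map output coordinates back into the source image with H^{-1}.
    src = np.linalg.inv(H) @ pts
    src = src[:2] / src[2]

    # Sample the source image at the inverse-warped points.
    grid_y, grid_x = np.mgrid[0:h, 0:w]
    out = np.zeros((out_h, out_w, im.shape[2]))
    for c in range(im.shape[2]):
        vals = griddata((grid_y.ravel(), grid_x.ravel()), im[..., c].ravel(),
                        (src[1], src[0]), method='linear', fill_value=0)
        out[..., c] = vals.reshape(out_h, out_w)
    return out, (min_x, min_y)
```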

MTA Pictures

The rectified image appears flat and the poster is square, though it is a little blurry: we essentially stretched the left side of the original image, and the original photo was already slightly blurry since I took it in a rush.

Bart Pictures

One small detail about these photos is that the warp looks more like a rotation, since the original image is already mostly flat. Even so, the slight rotation causes the tilted screen to be cut off, so the text “OK” is still not aligned well.

To fix this, I tried setting the correspondence points to the OK sign’s border.

4. Blending Images

In order to create a mosaic, we have to stitch the warped image together with the image we warped it toward. This involves creating a new canvas with both images translated so that their corresponding points line up, and then blending the images together so that there aren’t any harsh edges.

Alignment

For alignment, I simply reused the offsets computed when shifting the bounding box, applying them to shift both images as needed.

Alignment works

Blending

For blending, I used the method from Project 2: multi-resolution blending, which blends the Laplacian stacks of the two images according to the Gaussian stack of a given mask. For the mask, I simply subtracted the center image’s mask from the side image’s mask.
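A sketch of that blend, assuming the images and the mask are float arrays of the same shape (the level count and blur sigma here are illustrative, not the exact values I used):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multires_blend(im1, im2, mask, levels=4, sigma=2.0):
    """Blend im1 and im2 using their Laplacian stacks and the Gaussian
    stack of `mask` (same shape as the images, 1 where im1 should win)."""
    sig = (sigma, sigma, 0) if im1.ndim == 3 else sigma  # don't blur across channels
    g1, g2, gm = im1.astype(float), im2.astype(float), mask.astype(float)
    out = np.zeros_like(g1)
    for _ in range(levels):
        b1, b2 = gaussian_filter(g1, sig), gaussian_filter(g2, sig)
        gm_next = gaussian_filter(gm, sig)
        # Laplacian band = current stack level minus its blurred version,
        # combined using the mask's Gaussian stack at this level.
        out += gm * (g1 - b1) + (1 - gm) * (g2 - b2)
        g1, g2, gm = b1, b2, gm_next
    # Add the blended lowest-frequency band.
    out += gm * g1 + (1 - gm) * g2
    return out
```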

Gaussian blurring seems to work

Adding back lost information

I noticed that some of the edges of the image were faded, so I added back the original images in the non-overlapping sections of the mask. This was my final result:

Final product, when only applying the multi-resolution blending to the intersection of the masks

More results

Exterior of Li Ka Shing Center; we can see a small area where the colors of the building clearly fade, but it’s not too harsh

Interior of Li Ka Shing 245, stitched

Part B: Automatic Correspondence Detection

Rather than manually selecting correspondence points between images, this part aims to automate the process through feature detection and matching.

Detecting corner features

To find points of interest for our correspondence points, we use Harris corners. This simply involves computing the Harris corner strength over the image and keeping the corners with the strongest responses. Below are a test image (a long exposure of the Salesforce Tower in SF, with a slight streak from a plane), the top detected corners, and the corner-strength map. We can see that the detected points of interest lie where the corner strength is high.

Original image
Top 1000 corners

Harris corner-strength map showing the strength at each part of the image.
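For reference, a sketch of this corner-detection step using scikit-image (the course-provided Harris code differs in its details, but the idea is the same; the constants here are illustrative):

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import corner_harris, corner_peaks

def get_harris_corners(im, num_corners=1000):
    """Compute a Harris response map and keep the strongest corners.
    Returns (row, col) coordinates sorted by strength, plus the map itself."""
    gray = rgb2gray(im)
    response = corner_harris(gray)                    # corner-strength map
    coords = corner_peaks(response, min_distance=3)   # local maxima, (row, col)
    strengths = response[coords[:, 0], coords[:, 1]]
    order = np.argsort(strengths)[::-1][:num_corners] # strongest first
    return coords[order], response
```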

ANMS

When choosing interest points, we also want points that aren’t too close to each other, so we use Adaptive Non-Maximal Suppression (ANMS). At a high level, ANMS suppresses interest points that are crowded together: we iterate through each point, compute its suppression radius (the distance to the nearest point of equal or greater strength), then keep the points with the largest radii.
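A sketch of ANMS under those definitions (the robustness factor c_robust and point count are illustrative choices, in the spirit of the original ANMS formulation):

```python
import numpy as np

def anms(coords, strengths, num_points=250, c_robust=0.9):
    """Keep the num_points corners with the largest suppression radii.
    coords: (N, 2) corner coordinates; strengths: (N,) corner responses."""
    radii = np.full(len(coords), np.inf)
    for i in range(len(coords)):
        # Corners sufficiently stronger than corner i suppress it.
        stronger = strengths > strengths[i] / c_robust
        if np.any(stronger):
            d = np.linalg.norm(coords[stronger] - coords[i], axis=1)
            radii[i] = d.min()   # distance to the nearest dominating corner
    keep = np.argsort(radii)[::-1][:num_points]
    return coords[keep]
```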

Below, we can see that the corner selection is much more spread out when ANMS is applied.

Raw sorted, top 250
ANMS applied, top 250

Extracting Feature Descriptors

When choosing correspondence points, we need some way to match points across the two images, which requires a criterion for comparing points. In this implementation, I used a simple 8x8 patch sampled from a specified window around each point (I ended up going with an 80x80 window, since that seemed to work better than a 40x40 window). I also normalized each patch to zero mean and unit standard deviation, to make the features invariant to brightness and contrast changes between the two images.
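A sketch of that descriptor extraction, assuming a grayscale image and (row, col) corner coordinates (function and parameter names are mine):

```python
import numpy as np

def extract_descriptors(im_gray, coords, window=80, patch=8):
    """For each corner, subsample a window x window region down to a
    patch x patch descriptor and normalize it to zero mean, unit std."""
    half = window // 2
    step = window // patch
    descriptors, kept = [], []
    for r, c in coords:
        # Skip corners whose window would fall off the image.
        if r - half < 0 or c - half < 0 or r + half > im_gray.shape[0] or c + half > im_gray.shape[1]:
            continue
        win = im_gray[r - half:r + half, c - half:c + half].astype(float)
        d = win[::step, ::step]                   # subsample to patch x patch
        d = (d - d.mean()) / (d.std() + 1e-8)     # bias/gain normalization
        descriptors.append(d.ravel())
        kept.append((r, c))
    return np.array(descriptors), np.array(kept)
```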

Below are 5 points from the Salesforce Tower image and some corresponding feature descriptors.

Points
Feature Descriptors

Matching feature descriptors

With these descriptors in hand, we can match points based on comparisons between descriptors. The basic algorithm, at a high level, iterates through one list of descriptors, computes the distances from that descriptor to every descriptor in the other list, and picks the closest one. However, matches can be inaccurate when a feature has multiple similar-looking candidates, so we use Lowe’s ratio test to make sure the second-closest descriptor is not too close: nearest_dist / second_nearest_dist < lowe_thresh.
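A sketch of that matching step with the ratio test (the threshold value here is illustrative, not the one I tuned):

```python
import numpy as np

def match_descriptors(desc1, desc2, lowe_thresh=0.7):
    """Match each descriptor in desc1 to its nearest neighbor in desc2,
    keeping the pair only if it passes Lowe's ratio test."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        nearest, second = np.argsort(dists)[:2]
        # Accept only if the best match is clearly better than the runner-up.
        if dists[nearest] / dists[second] < lowe_thresh:
            matches.append((i, nearest))
    return matches
```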

Below are some results before implementing ANMS. We can see that matching pairs have similar-looking descriptors and lie in roughly corresponding spots.

Here are some more matches after I implemented ANMS:

These points are still not sufficient, since there are still outliers, resulting in oddly stitched images as seen below.

The resulting stitch using the points from above.

RANSAC to exclude outliers

There are still inaccurate matches in the above results, typically outliers near the edges of the images that have no true corresponding points, so we use RANSAC to exclude such outliers from our final correspondence selection.

This involves repeatedly sampling four feature pairs at random, computing the homography from them, warping the full set of features with it, and measuring the error of those warped features. For each sample, we count the inliers: points that land within some margin of error epsilon of their correct positions. We then keep the largest inlier set, compute a final homography from it, and use that homography to warp our image.
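A sketch of this RANSAC loop, reusing the compute_homography helper from Part A and assuming matched (x, y) point arrays (the iteration count and epsilon are illustrative):

```python
import numpy as np

def ransac_homography(pts1, pts2, num_iters=500, eps=2.0):
    """4-point RANSAC: fit a homography to random samples of 4 matches,
    keep the largest inlier set, and refit the homography on it."""
    best_inliers = np.array([], dtype=int)
    for _ in range(num_iters):
        sample = np.random.choice(len(pts1), 4, replace=False)
        H = compute_homography(pts1[sample], pts2[sample])
        # Warp all of pts1 and measure reprojection error against pts2.
        p = np.hstack([pts1, np.ones((len(pts1), 1))]) @ H.T
        p = p[:, :2] / p[:, 2:3]
        err = np.linalg.norm(p - pts2, axis=1)
        inliers = np.where(err < eps)[0]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Final homography computed from the largest inlier set.
    return compute_homography(pts1[best_inliers], pts2[best_inliers]), best_inliers
```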

Below is the resulting set of inliers using 500 random samples; we can clearly see that many erroneous points from the previous set were not included.

Here is the final result:

Original (manually chosen correspondence points)

A Small Error

I noticed that the algorithm sometimes chose points in the sky that aligned with other points in the sky, which caused really buggy correspondence points, as seen above. To fix this, instead of feeding the raw Harris corners into ANMS (which resulted in corners spread throughout the whole image), I first filtered them down to the top 1000 corners, so that ANMS could only select corners that are actually distinguishable (not sky).

All the sky points from the first image converged to the same point in the second image.

The Other 2 Final Results

Below are two more mosaics with their automatically selected correspondence points, alongside the original manual stitches for comparison.

Original (manually chosen correspondence points)
Original (manually chosen correspondence points)