Algebraic Vision

Taken from a talk given on 11/15/2024 by Jessie Loucks-Tavitas from Sac State.

1: What is Computer Vision?

Specifically, from a mathematician's perspective. Two motivating questions:

  1. Given cameras (positions, angles) + images (color data), recover the object: Triangulation
  2. Given objects + images, recover the camera: Resectioning

Given two of cameras, objects, and images, can you find the third?

2: What is a pinhole camera?

This is the idea of a hole in a box projecting the world's image onto the backboard of the box. Mathematically, the camera is a map $A:\mathbb{R}^3\to\mathbb{R}^2$, where $(x,y,z)\mapsto\left(\frac{x}{z},\frac{y}{z}\right)$.

But notice that this map is not invertible, and if $z=0$ it is undefined, so we're toast! It's also non-linear, so we can't use a matrix for it.
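To make these complaints concrete, here is a minimal Python sketch of the map $(x,y,z)\mapsto(x/z,y/z)$. The names `pinhole` and `add` and the sample points are illustrative choices, not something from the talk.

```python
def pinhole(point):
    """Map (x, y, z) in R^3 to (x/z, y/z) in R^2; undefined when z == 0."""
    x, y, z = point
    if z == 0:
        raise ValueError("z = 0: this point has no image")
    return (x / z, y / z)


def add(p, q):
    """Component-wise sum of two tuples."""
    return tuple(a + b for a, b in zip(p, q))


# Not invertible: scaling a point along its ray through the origin
# does not change its image, so the depth cannot be recovered.
near, far = (1, 2, 4), (2, 4, 8)
print(pinhole(near), pinhole(far))

# Not linear: the image of a sum differs from the sum of the images.
p, q = (1, 0, 1), (0, 1, 2)
print(pinhole(add(p, q)), add(pinhole(p), pinhole(q)))
```

Both printed images in the first line agree, while the two tuples in the second line do not.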

The other thing is that in 3D space two parallel lines don't intersect, but their images under $A$ may appear to (think of train tracks meeting at the horizon).
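The train-track picture can be checked numerically. The sketch below assumes two hypothetical rails, the lines $x=-1$ and $x=1$ at height $y=-1$, both running in the $z$ direction, and applies $(x,y,z)\mapsto(x/z,y/z)$ to points further and further away.

```python
def rail_images(depth):
    """Images of the points at a given depth on two parallel rails.

    The rails are the lines x = -1 and x = 1 at height y = -1, both running
    in the z direction, so they never meet in R^3.
    """
    left = (-1.0 / depth, -1.0 / depth)
    right = (1.0 / depth, -1.0 / depth)
    return left, right


for depth in (1.0, 10.0, 1000.0):
    left, right = rail_images(depth)
    gap = right[0] - left[0]
    print(f"depth {depth:7.1f}: left {left}, right {right}, gap {gap}")
```

The gap between the two images is $2/\text{depth}$, so both rails head toward the same image point $(0,0)$ without ever reaching it.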

The fix...

3: Perspective & Projective Geometry

The idea is to look at this via perspective geometry, where parallel lines really do converge. This is in contrast to orthographic geometry, where lines that are parallel in $\mathbb{R}^3$ stay parallel in the 2D image.

Projective space $\mathbb{P}^n$, where $n=2,3$:

$\mathbb{P}^3=\{(x:y:z:1)\}\cup\{(x:y:z:0)\}$

You can think of the left set as $\mathbb{R}^3$ and the right set as the limit points of $\mathbb{R}^3$. The rules are:

  1. $(\alpha x:\alpha y:\alpha z:\alpha w)=(x:y:z:w)$ where $\alpha\neq 0$.
  2. At least one coordinate is non-zero: $(0:0:0:0)\notin\mathbb{P}^3$.
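The two rules can be turned into a small equality test. This is only a sketch: `same_point` and `dehomogenize` are hypothetical helper names, and the coordinates are assumed to be integers so that the arithmetic stays exact.

```python
from fractions import Fraction


def same_point(p, q):
    """True if p and q are non-zero scalar multiples of each other (rule 1)."""
    if not any(p) or not any(q):
        # Rule 2: the all-zero tuple is not a point of projective space.
        raise ValueError("the zero tuple is not a projective point")
    # Two tuples are proportional exactly when all their 2x2 minors vanish.
    n = len(p)
    return all(p[i] * q[j] == p[j] * q[i] for i in range(n) for j in range(i + 1, n))


def dehomogenize(p):
    """Rescale so the last coordinate is 1 and drop it; None for a limit point."""
    if p[-1] == 0:
        return None
    return tuple(Fraction(c, p[-1]) for c in p[:-1])


print(same_point((1, 2, 3, 1), (2, 4, 6, 2)))  # same point, alpha = 2
print(same_point((1, 2, 3, 1), (1, 2, 3, 0)))  # a point of R^3 vs. a limit point
print(dehomogenize((2, 4, 6, 2)))
```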

For example, take a line through the origin in $\mathbb{R}^3$ with direction $(x,y,z)$:

$l(t)=(xt:yt:zt:1)=\left(x:y:z:\tfrac{1}{t}\right)\implies\lim_{t\to\infty}l(t)=(x:y:z:0)$

So the end point of this line is just this limit point (x:y:z:0).
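The rescaling step behind this limit can be replayed with exact fractions. In this sketch the direction $(3,1,2)$ and the helper names `line_point` and `rescale` are arbitrary illustrative choices.

```python
from fractions import Fraction


def line_point(direction, t):
    """The point l(t) = (xt : yt : zt : 1) on the line through the origin."""
    x, y, z = direction
    return (x * t, y * t, z * t, 1)


def rescale(point, alpha):
    """Multiply every coordinate by a non-zero alpha (same projective point)."""
    if alpha == 0:
        raise ValueError("alpha must be non-zero")
    return tuple(alpha * c for c in point)


direction = (3, 1, 2)
for t in (10, 1000, 100000):
    # Scaling by 1/t turns (xt : yt : zt : 1) into (x : y : z : 1/t).
    print(rescale(line_point(direction, t), Fraction(1, t)))
```

The first three coordinates stay fixed at the direction while the last one shrinks toward 0.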

For P2 it's the same idea, except with three coordinates instead of 4.

With projective space we get linearity. The map $A:\mathbb{P}^3\to\mathbb{P}^2$ is a valid linear map, given by the matrix:

$A=\begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\end{bmatrix}$
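As a sanity check, the sketch below applies this matrix to homogeneous coordinates and then divides by the last coordinate, which should reproduce $(x/z,y/z)$ from section 2. The helpers `apply` and `to_affine` are hypothetical names, and integer coordinates are assumed.

```python
from fractions import Fraction

# The 3x4 matrix from the notes, stored as a list of rows.
A = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]


def apply(matrix, point):
    """Multiply a matrix (list of rows) by a homogeneous coordinate tuple."""
    return tuple(sum(m * c for m, c in zip(row, point)) for row in matrix)


def to_affine(point):
    """Divide by the last coordinate; None marks a limit point (last coordinate 0)."""
    if point[-1] == 0:
        return None
    return tuple(Fraction(c, point[-1]) for c in point[:-1])


world = (2, 4, 8, 1)            # the point (2, 4, 8) of R^3
image = apply(A, world)         # (2 : 4 : 8) in P^2
print(image, to_affine(image))  # same as (x/z, y/z) from section 2
```

A world point with $z=0$, such as $(1:1:0:1)$, now has the image $(1:1:0)$, a limit point of $\mathbb{P}^2$, instead of a division by zero.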

4: Triangulation and Resectioning

Triangulation

Say we have $m$ cameras. Then we have a map $A=(A_1,\dots,A_m)$. A multiview configuration is a tuple of cameras capturing multiple scene points. The multiview variety of $A$ is:

$\Gamma_{A,P}^{m,n}:=\operatorname{im}(A)$

Here $n$ is the number of world points that we get images of. The idea is that $\Gamma$ is useful for reconstructing scene points.
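For a feel of what triangulation involves, here is a small sketch of linear triangulation for a single world point. It is not the method from the talk: it assumes each camera is a known 3x4 matrix, fixes the last world coordinate to 1, and solves the resulting linear equations by least squares in exact arithmetic. The two cameras and the world point are made up for the example.

```python
from fractions import Fraction


def project(camera, world):
    """Image (u, v) of a homogeneous world point under a 3x4 camera matrix."""
    u, v, w = (sum(c * x for c, x in zip(row, world)) for row in camera)
    return (Fraction(u) / w, Fraction(v) / w)


def solve(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [value / lead for value in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def triangulate(cameras, images):
    """Recover (x, y, z) from the images of one world point in several cameras."""
    rows = []
    for cam, (u, v) in zip(cameras, images):
        # (u, v) = (cam[0].X / cam[2].X, cam[1].X / cam[2].X) gives two
        # equations that are linear in the world point X = (x, y, z, 1).
        rows.append([u * c2 - c0 for c0, c2 in zip(cam[0], cam[2])])
        rows.append([v * c2 - c1 for c1, c2 in zip(cam[1], cam[2])])
    # Least squares via the normal equations, moving the constant column
    # (the coefficient of the fixed last coordinate 1) to the right-hand side.
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [-sum(r[i] * r[3] for r in rows) for i in range(3)]
    return solve(normal, rhs)


# Two hypothetical cameras: the matrix A from section 3 and a copy
# whose center is shifted one unit along the x-axis.
cam1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
cam2 = [[1, 0, 0, -1], [0, 1, 0, 0], [0, 0, 1, 0]]
world = (2, 3, 4, 1)
images = [project(cam1, world), project(cam2, world)]
print(triangulate([cam1, cam2], images))
```

With noise-free images the original point $(2,3,4)$ comes back exactly. Real measurements are noisy, which is where least squares earns its keep.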

Resectioning

This is the meat and potatoes here. A hypercamera configuration is a tuple of world points $q_1,\dots,q_n$ captured by $m$ cameras. This setup is used to recover the camera structure.
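The single-camera version of this problem can be sketched as a linear solve: each world point and its image give two linear equations in the 12 entries of a 3x4 camera matrix, and the camera is a null vector of the stacked system. This is a standard linear approach rather than anything specific to the talk. The ground-truth camera, the cube-corner world points, and all function names below are assumptions made for the example, and exact fractions stand in for the numerical linear algebra that noisy data would need.

```python
from fractions import Fraction
from itertools import product


def project(camera, world):
    """Image (u, v) of a homogeneous world point under a 3x4 camera matrix."""
    u, v, w = (sum(c * x for c, x in zip(row, world)) for row in camera)
    return (Fraction(u) / w, Fraction(v) / w)


def null_vector(matrix):
    """One non-zero solution v of matrix @ v = 0, found by exact row reduction."""
    rows = [[Fraction(value) for value in row] for row in matrix]
    n_cols = len(rows[0])
    pivots = []  # (row, column) of each leading 1
    r = 0
    for col in range(n_cols):
        if r == len(rows):
            break
        hit = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if hit is None:
            continue
        rows[r], rows[hit] = rows[hit], rows[r]
        lead = rows[r][col]
        rows[r] = [value / lead for value in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append((r, col))
        r += 1
    pivot_cols = {col for _, col in pivots}
    # Set the first free variable to 1 and read off the pivot variables.
    free = next(col for col in range(n_cols) if col not in pivot_cols)
    v = [Fraction(0)] * n_cols
    v[free] = Fraction(1)
    for row, col in pivots:
        v[col] = -rows[row][free]
    return v


def resection(world_points, image_points):
    """Recover a 3x4 camera matrix, up to scale, from point correspondences."""
    system = []
    for X, (u, v) in zip(world_points, image_points):
        zeros = [0] * 4
        # row1 . X = u * (row3 . X)  and  row2 . X = v * (row3 . X)
        system.append(list(X) + zeros + [-u * x for x in X])
        system.append(zeros + list(X) + [-v * x for x in X])
    p = null_vector(system)
    return [p[0:4], p[4:8], p[8:12]]


# Hypothetical ground truth of the form [I | t] with t = (1, 2, 3),
# observed on the eight corners of the unit cube.
true_camera = [[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 3]]
world_points = [(x, y, z, 1) for x, y, z in product((0, 1), repeat=3)]
image_points = [project(true_camera, X) for X in world_points]
recovered = resection(world_points, image_points)
# The camera is only defined up to scale, so normalize before comparing.
scale = true_camera[2][3] / recovered[2][3]
print([[scale * entry for entry in row] for row in recovered])
```

The camera is only determined up to a non-zero scalar, matching rule 1 of projective space, which is why the result is rescaled before printing.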

5: Duality

As with duality of vector spaces (see Chapter 3 (cont.) - Products and Quotients of Vector Spaces, 3.F Duality), a lot of things have duality, like graphs!

This gets into Carlsson-Weinshall Duality.