Rip that camera apart

So you want to understand the inner workings of a camera huh? There are many blogs [¹, ², ³] out there which present the required contents and which I myself referred to initially. Well, they are not rigorous in a mathematical sense. Before you go ahead any further I would strongly recommend you go through [¹] and come back here for a rigorous mathematical proof.

What I would like to write here is to give rigorous mathematical proof for dissecting the camera matrix. So, without any further ado let’s dive in.

Camera matrix \(P\) can be represented as product of intrinsic matrix \(K\) and extrinsic matrix \(E\) i.e, \(P = KE\) which are both \(3 \times 4\) matrices. (Some sources do append \([0, 0, 0, 1]^T\) to make it a square matrix). Precisely,

\[K = \begin{bmatrix} f_x & 0 & c_x\\ 0 & f_y & c_y\\ 0 & 0 & 1 \\ \end{bmatrix} \quad and \quad E = \begin{bmatrix} R \,| -RC \\ \end{bmatrix},\]

where columns of \(R\) represent rotations of world axis w.r.t camera and \(C\) is camera center in world. To sum up,

\[\begin{align} P & = KE \\ & = K[R \,| -RC] \\ & = [KR \,| -KRC] \\ & = [M \,| -MC] \end{align}\]

What we are most interested in is recovering \(K\) and \(R\) given \(P\). Note that once we obtain \(K\) and \(R\), getting \(C\) is trivial. Left multiplying last column of \(P\) by \(-M^{-1}\) gives \(C\).

Careful reader might notice that \(K\) is infact an upper triangular matrix and \(R\) is an orthogonal matrix. We apply RQ decomposition to recover the required matrices. Well, it’s that simple. Use some standard library and get the required decomposition, right? But the problem is RQ decomposition is not unique.

Rip that camera apart

References