Week 11 - Orthogonal Projection, Low Rank Approximation, and Orthogonal Bases #
Projecting a Vector onto a Subspace #
Here we have two vectors $a, b \in \mathbb{R}^m$. They lie in the plane defined by $\text{Span}(\{a, b\})$, which is a two-dimensional space (unless $a$ and $b$ point in the same direction). We want to split $b$ into a component in the direction of $a$ and a component orthogonal to $a$:
$b = z + w$, where
$z = \chi a \text{ with } \chi \in \mathbb{R}$ (the component in the direction of $a$), and
$a^T w = 0$ (so $w$ is orthogonal to $a$).
Substituting $w = b - z$ gives
$0 = a^T w = a^T(b - z) = a^T (b - \chi a)$,
so that
$a^T a \chi = a^T b$.
Provided $a \ne 0$, $\chi = (a^T a)^{-1}(a^T b)$.
Thus, the component of $b$ in the direction of $a$ is given by
- $$z = \chi a = (a^T a)^{-1} (a^T b) a = a(a^T a)^{-1}(a^T b) = [a(a^T a)^{-1}a^T ] b = [\frac{1}{a^T a} a a^T ] b$$
- Notice $(a^Ta)^{-1}$ and $a^T b$ are both scalars.
- We say that, given vector $a$, the matrix that projects any given vector $b$ onto the space spanned by $a$ is given by
- $$a(a^T a)^{-1}a^T = \frac{1}{a^T a} a a^T$$
The component of $b$ orthogonal (perpendicular) to $a$ is given by
- $$w = b - z = b - (a(a^T a)^{-1}a^T ) b = Ib - (a(a^T a)^{-1}a^T )b = (I - a(a^T a)^{-1}a^T )b$$
- We say that, given vector $a$, the matrix that projects any given vector $b$ onto the orthogonal complement of the space spanned by $a$ is given by
- $$I - a(a^T a)^{-1}a^T = I - \frac{1}{a^T a} a a^T$$
Set $v^T = (a^T a)^{-1}a^T$,
- $z = (a v^T) b$
- $w = (I - a v^T) b$
- Notice $(I - a v^T)$ is a rank-1 update to the identity matrix.
Given $a, x \in \mathbb{R}^m$, we use $P_a(x)$ and $P_a^{\perp}(x)$ to denote the projection of vector $x$ onto $\text{Span}(\{a\})$ and onto $\text{Span}(\{a\})^{\perp}$, respectively.
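As a sanity check, here is a minimal numpy sketch of $P_a(b)$ and $P_a^{\perp}(b)$; the vectors `a` and `b` below are made-up example data:

```python
import numpy as np

# Made-up example data.
a = np.array([1.0, 2.0, 2.0])
b = np.array([3.0, 1.0, 4.0])

# Component of b in the direction of a: z = [a (a^T a)^{-1} a^T] b
chi = (a @ b) / (a @ a)       # chi = (a^T a)^{-1} (a^T b), a scalar
z = chi * a                   # P_a(b)

# Component of b orthogonal to a: w = [I - a (a^T a)^{-1} a^T] b
w = b - z                     # P_a^perp(b)

print(np.allclose(z + w, b))   # True: the two components add up to b
print(np.isclose(a @ w, 0.0))  # True: w is orthogonal to a
```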
Given $A \in \mathbb{R}^{m \times n}$ with linearly independent columns and vector $b \in \mathbb{R}^m$ (a numpy sketch follows this list):
- Component of $b$ in $\mathcal{C}(A)$: $$u = A(A^TA)^{-1} A^Tb$$
- Matrix that projects onto $\mathcal{C}(A)$: $$A(A^TA)^{-1}A^T$$
- Component of $b$ in $\mathcal{C}(A)^{\perp} = \mathcal{N}(A^T)$: $$w = b - A(A^TA)^{-1}A^T b = (I - A(A^TA)^{-1}A^T) b$$
- Matrix that projects onto $\mathcal{C}(A)^{\perp} = \mathcal{N}(A^T)$: $$(I - A(A^TA)^{-1}A^T)$$
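A minimal numpy sketch of these projections; instead of forming $(A^TA)^{-1}$ explicitly, it solves the linear system $A^TAx = A^Tb$ (the matrices below are made-up example data):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))   # columns are linearly independent (with probability 1)
b = rng.standard_normal(6)

# u = A (A^T A)^{-1} A^T b, computed without forming the inverse explicitly:
x = np.linalg.solve(A.T @ A, A.T @ b)
u = A @ x                          # component of b in C(A)
w = b - u                          # component of b in C(A)^perp = N(A^T)

print(np.allclose(A.T @ w, 0))     # True: w is orthogonal to every column of A
print(np.allclose(u + w, b))       # True: the components add up to b
```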
Rank-k Approximation #
- “Best” rank-k approximation of $B \in \mathbb{R}^{m \times n}$ using the column space of $A \in \mathbb{R}^{m \times k}$, where $A$ has linearly independent columns (for example, $A$ can be formed by picking $k$ columns of $B$): $$A(A^TA)^{-1}A^TB = AV, \text{ where } V = (A^TA)^{-1}A^TB$$
- To calculate $V = (A^TA)^{-1}A^TB$:
    - The first way is to use the LU factorization:
        - $(A^TA)V = (A^TA)(A^TA)^{-1}A^TB$
        - $(A^TA)V = A^TB$
        - Compute $C = A^TA$ and $Y = A^TB$, then solve $CV = Y$ via the LU factorization of $C$.
    - The second way is to use the Cholesky factorization (sketched in code after this list):
        - $(A^TA)V = A^TB$
        - Since $A^TA$ is a symmetric positive definite (SPD) matrix, we can factor it as $A^TA = LL^T$.
        - $LL^TV = A^TB$
        - Set $U = L^TV$ and compute $Y = A^TB$.
        - Solve $LU = Y$ for $U$ (forward substitution).
        - Solve $L^TV = U$ for $V$ (back substitution).
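A sketch of the Cholesky route in numpy/scipy; taking $A$ to be the first $k$ columns of $B$ is purely an example choice:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(1)
B = rng.standard_normal((8, 5))
k = 2
A = B[:, :k]                       # pick k columns of B (one possible choice)

C = A.T @ A                        # SPD, since A has linearly independent columns
Y = A.T @ B
L = cholesky(C, lower=True)        # C = L L^T
U = solve_triangular(L, Y, lower=True)      # forward substitution: L U = Y
V = solve_triangular(L.T, U, lower=False)   # back substitution:    L^T V = U

B_k = A @ V                        # rank-k approximation of B
print(np.linalg.matrix_rank(B_k))  # k
```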
Orthonormal Bases #
Orthonormal Vectors #
- Definition: Let $q_0, q_1, \ldots, q_{k-1} \in \mathbb{R}^m$. Then these vectors are (mutually) orthonormal if for all $0 \le i,j < k$: $$q_i^T q_j = \begin{cases} 1 & \text{ if } i = j \\ 0 & \text{ otherwise. }\end{cases}$$
- In other words, $\lVert q_i \rVert_2 = 1$, and
- $q_i$ is orthogonal to $q_0, q_1, \ldots, q_{i-1}, q_{i+1}, \ldots, q_{k-1}$.
Gram-Schmidt orthogonalization (GS orthogonalization) #
- Definition: Transforming a given set of basis vectors into a set of orthonormal vectors that form a basis for the same space is called GS orthogonalization.
- Starting with linearly independent vectors $a_0, a_1, \ldots, a_{n-1} \in \mathbb{R}^m$, the following algorithm computes the mutually orthonormal vectors $q_0, q_1, \ldots, q_{n-1} \in \mathbb{R}^m$ such that $\text{Span}({a_0, a_1, \ldots, a_{n-1}}) = \text{Span}({q_0, q_1, \ldots, q_{n-1}})$:
- The algorithm is sketched in code below.
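The algorithm itself is not reproduced here; as a stand-in, the following is a minimal numpy sketch of classical Gram-Schmidt. It also records the coefficients $\rho_{i,j} = q_i^T a_j$, which become the entries of $R$ in the QR factorization discussed next:

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt applied to the columns of A.

    Returns Q with mutually orthonormal columns spanning C(A) and an
    upper triangular R with A = Q R. Assumes the columns of A are
    linearly independent.
    """
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # rho_{i,j}: component of a_j along q_i
            v -= R[i, j] * Q[:, i]        # subtract that component
        R[j, j] = np.linalg.norm(v)       # length of what remains
        Q[:, j] = v / R[j, j]             # normalize to obtain q_j
    return Q, R

A = np.random.default_rng(2).standard_normal((6, 3))
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q.T @ Q, np.eye(3)), np.allclose(Q @ R, A))  # True True
```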
The QR factorization #
- Given $A \in \mathbb{R}^{m \times n}$ with linearly independent columns, there exists a matrix $Q \in \mathbb{R}^{m \times n}$ with mutually orthonormal columns and upper triangular matrix $R \in \mathbb{R}^{n \times n}$ such that $A = QR$.
- If one partitions $$A = \left(\begin{array}{c|c|c|c} a_0 & a_1 & \cdots & a_{n-1} \end{array}\right), \quad Q = \left(\begin{array}{c|c|c|c} q_0 & q_1 & \cdots & q_{n-1} \end{array}\right), \quad R = \left(\begin{array}{cccc} \rho_{0,0} & \rho_{0,1} & \cdots & \rho_{0,n-1} \\ 0 & \rho_{1,1} & \cdots & \rho_{1,n-1} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \rho_{n-1,n-1} \end{array}\right),$$
- then $$a_j = \rho_{0,j} q_0 + \rho_{1,j} q_1 + \cdots + \rho_{j,j} q_j \quad \text{for } j = 0, \ldots, n-1,$$
- and Gram-Schmidt orthogonalization (the Gram-Schmidt process) in the above algorithm computes the columns of $Q$ and the elements of $R$.
Solving the linear least-squares problem via the QR factorization #
Given $A \in \mathbb{R}^{m \times n}$ with linearly independent columns, there exists a matrix $Q \in \mathbb{R}^{m \times n}$ with mutually orthonormal columns and upper triangular matrix $R \in \mathbb{R}^{n \times n}$ such that $A = QR$. The vector $\hat{x}$ that is the best solution (in the linear least-squares sense) to $Ax \approx b$ is given by
- $\hat{x} = (A^T A)^{-1} A^T b$ (as shown in Week 10) computed by solving the normal equations $$A^TAx = A^T b$$
- $\hat{x} = R^{-1} Q^T b$ computed by solving $$Rx = Q^Tb$$
- Notice $Q^T Q = I$ and $R$ is upper triangular, so the normal equations $A^TAx = A^Tb$ become $R^TQ^TQRx = R^TQ^Tb$, i.e., $R^TRx = R^TQ^Tb$; since $R^T$ is nonsingular, this reduces to $Rx = Q^Tb$.
- The columns of $A$ must be linearly independent (so that $R$ is nonsingular). A numpy sketch follows this list.
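A minimal numpy sketch of this approach, using numpy's built-in QR factorization rather than the Gram-Schmidt sketch above (the data is made up for illustration):

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(3)
A = rng.standard_normal((10, 4))
b = rng.standard_normal(10)

Q, R = np.linalg.qr(A)                 # reduced QR: Q is 10x4, R is 4x4 upper triangular
x_hat = solve_triangular(R, Q.T @ b)   # solve R x = Q^T b by back substitution

# Same answer as numpy's least-squares solver (which minimizes ||b - A x||_2).
print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```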
An algorithm for computing the QR factorization is given by the Gram-Schmidt process sketched earlier: the orthonormal vectors become the columns of $Q$ and the coefficients $\rho_{i,j}$ become the entries of $R$.
Change of Basis #
- Let $b \in \mathbb{R}^m$ be a vector and $Q \in \mathbb{R}^{m \times m}$ a square matrix with mutually orthonormal columns.
- Then $Q^T Q = Q Q^T = Q^{-1} Q = Q Q^{-1} = I$, i.e., $Q^{-1} = Q^T$.
- $b = Q Q^T b = \left(\begin{array}{c|c|c|c}q_0 & q_1 & \cdots & q_{m-1}\end{array}\right)\left(\begin{array}{c}q_0^T \\ q_1^T \\ \vdots \\ q_{m-1}^T\end{array}\right) b = (q_0^T b) q_0 + (q_1^T b) q_1 + \ldots + (q_{m-1}^T b) q_{m-1}$
- $(q_0^T b) q_0 = q_0 q_0^T b$ because $q_0^T b$ is a scalar.
- Notice that each of these terms is just the component of the vector $b$ in the direction of the corresponding basis vector (see the sketch below).
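A small numpy sketch of this change of basis, using a randomly generated orthogonal matrix $Q$ as the basis (example data):

```python
import numpy as np

# Build an orthonormal basis of R^3: the columns of Q (obtained here from a QR factorization).
rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
b = np.array([1.0, 2.0, 3.0])

coords = Q.T @ b   # coordinates of b with respect to the basis q_0, q_1, q_2
# Rebuild b as (q_0^T b) q_0 + (q_1^T b) q_1 + (q_2^T b) q_2.
b_rebuilt = sum(coords[i] * Q[:, i] for i in range(3))

print(np.allclose(b_rebuilt, b))  # True
```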
Singular Value Decomposition #
Any matrix $A \in \mathbb{R}^{m \times n}$ can be written as the product of three matrices, the Singular Value Decomposition (SVD): $$A = U \Sigma V^T$$ where
- $U \in \mathbb{R}^{m \times r}$ and $U^T U = I$ (U has orthonormal columns).
- $\Sigma \in \mathbb{R}^{r \times r}$ is a diagonal matrix with positive diagonal elements that are ordered so that $\sigma_{0,0} \ge \sigma_{1,1} \ge \ldots \ge \sigma_{(r-1),(r-1)} > 0$.
- $V \in \mathbb{R}^{n \times r}$ and $V^T V = I$ (V has orthonormal columns).
- $r$ equals the rank of matrix $A$.
If we partition the SVD $B = U \Sigma V^T$ as $$U = \left(\begin{array}{c|c} U_L & U_R \end{array}\right), \quad V = \left(\begin{array}{c|c} V_L & V_R \end{array}\right), \quad \Sigma = \left(\begin{array}{c|c} \Sigma_{TL} & 0 \\ \hline 0 & \Sigma_{BR} \end{array}\right),$$
where $U_L$ and $V_L$ have $k$ columns and $\Sigma_{TL}$ is $k \times k$, then $U_L \Sigma_{TL} V_L^T$ is the “best” rank-k approximation to matrix $B$. So, the “best” rank-k approximation $B \approx AW^T$ is given by the choices $A = U_L$ and $W = V_L \Sigma_{TL}$.
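A minimal numpy sketch of the truncated SVD as a rank-k approximation (example data; note that `np.linalg.svd` returns $V^T$ rather than $V$):

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((8, 5))
k = 2

U, s, Vt = np.linalg.svd(B, full_matrices=False)  # B = U diag(s) V^T
U_L = U[:, :k]                     # first k left singular vectors
Sigma_TL = np.diag(s[:k])          # top-left k x k block of Sigma
V_L = Vt[:k, :].T                  # first k right singular vectors

B_k = U_L @ Sigma_TL @ V_L.T       # "best" rank-k approximation of B
print(np.linalg.matrix_rank(B_k))  # k

# The choices from the text: A = U_L and W = V_L Sigma_TL, so that B_k = A W^T.
A = U_L
W = V_L @ Sigma_TL
print(np.allclose(A @ W.T, B_k))   # True
```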
- Given $A \in \mathbb{R}^{m \times n}$ with linearly independent columns, and $b \in \mathbb{R}^m$ , the “best” solution to $Ax \approx b$ (in the linear least-squares sense) via its SVD, $A = U \Sigma V^T$ , is given by
- $$\begin{aligned}\hat{x} &= (A^TA)^{-1}A^T b \\ &= ((U \Sigma V^T)^T (U \Sigma V^T))^{-1} (U \Sigma V^T)^T b \\ &= (V \Sigma U^T U \Sigma V^T)^{-1} V \Sigma U^T b \\ &= (V \Sigma^2 V^T)^{-1} V \Sigma U^T b \\ &= V \Sigma^{-2} V^T V \Sigma U^T b \\ &= V \Sigma^{-1} U^T b \end{aligned}$$
- Here $U^T U = I$, and since $A$ has linearly independent columns, $r = n$, so $V$ is square and $V^T V = V V^T = I$.
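A minimal numpy sketch of $\hat{x} = V \Sigma^{-1} U^T b$, compared against numpy's least-squares solver (example data):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((10, 4))
b = rng.standard_normal(10)

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U diag(s) V^T
x_hat = Vt.T @ ((U.T @ b) / s)                    # x_hat = V Sigma^{-1} U^T b

print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```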