Week 4 - Matrix-Vector to Matrix-Matrix Multiplication #

Preparation #

$$\displaystyle \left( \begin{array}{ c | c | c | c } A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\ A_{1,0} & A_{1,1} & \cdots & A_{1,N-1} \\ \vdots & \vdots & \ddots & \vdots \\ A_{M-1,0} & A_{M-1,1} & \cdots & A_{M-1,N-1} \end{array} \right) \left( \begin{array}{ c } x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{array} \right) = \left( \begin{array}{ c } A_{0,0} x_{0} + A_{0,1} x_{1} + \cdots + A_{0,N-1} x_{N-1} \\ A_{1,0} x_{0} + A_{1,1} x_{1} + \cdots + A_{1,N-1} x_{N-1} \\ \vdots \\ A_{M-1,0} x_{0} + A_{M-1,1} x_{1} + \cdots + A_{M-1,N-1} x_{N-1} \end{array} \right)$$
Two different algorithms to calculate Matrix-vector Multiplication
- First one, is calculating by rows.
  - $\psi_1 := a_{10}^T x_0 + \alpha_{11} x_1 + a_{12}^T x_2 + \psi_1$
- Second one is by columns.
  - $y_0 := \chi_1 a_{01} + y_0$
  - $\psi_1 := \chi_1 \alpha_{11} + \psi_1$
  - $y_2 := \chi_1 a_{21} + y_2$

$$\left( \begin{array}{c | c | c | c} A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\ A_{1,0} & A_{1,1} & \cdots & A_{1,N-1} \\ \vdots & \vdots & & \vdots \\ A_{M-1,0} & A_{M-1,1} & \cdots & A_{M-1,N-1} \end{array} \right)^ T = \left( \begin{array}{c | c | c | c} A_{0,0}^ T & A_{1,0}^ T & \cdots & A_{M-1,0}^ T \\ A_{0,1}^ T & A_{1,1}^ T & \cdots & A_{M-1,1}^ T \\ \vdots & \vdots & & \vdots \\ A_{0,N-1}^ T & A_{1,N-1}^ T & \cdots & A_{M-1,N-1}^ T \end{array} \right).$$
Example

When doing calculation, we always concern flops and memops. And later we will prove that, memops is much slower that flops.
Let’s take an example. If we want calculate $y := A^T x + y$. Normally, we will do, set $B = A^T$, then $y := B x +y$
- Which means, we are going to spend the most time to transpose matrix A to B, and little time computing.
- What we want is, compute with $A^T$ without actually transposing.
- The solution is, we can simply use columns of A for the dot products in the dot product based algorithm for $y := Ax + y$.

We purposely chose the matrix on the right to be symmetric. We notice that $a_{10}^T = a_{01}$ , $A_{20}^T = A_{02}$ , and $a_{12}^T = a_{21}$.
So we just need to change the step of calculation
- By rows:
  - from $\psi_1 := a_{10}^T x_0 + \alpha_{11} x_1 + a_{12}^T x_2 + \psi_1$
  - to $\psi_1 := a_{01} x_0 + \alpha_{11} x_1 + a_{12}^T x_2 + \psi_1$
- By columns:
  - from
    - $y_0 := \chi_1 a_{01} + y_0$
    - $\psi_1 := \chi_1 \alpha_{11} + \psi_1$
    - $y_2 := \chi_1 a_{21} + y_2$
  - to
    - $y_0 := \chi_1 a_{01} + y_0$
    - $\psi_1 := \chi_1 \alpha_{11} + \psi_1$
    - $y_2 := \chi_1 (a_{11}^T)^T + y_2$

Let $L_A: \mathbb {R}^ k \rightarrow \mathbb {R}^ m$ and $L_ B: \mathbb {R}^ n \rightarrow \mathbb {R}^ k$ both be linear transformations and, for all $x \in \mathbb{R}^n$, define the function $L_C: \mathbb {R}^ n \rightarrow \mathbb {R}^ m$ by $L_ C( x ) = L_A( L_ B( x ) )$. Then $L_ C( x )$ is a linear transformations.

$$A B = A \left( \begin{array}{c | c | c | c } b_0 & b_1 & \cdots & b_{n-1} \end{array} \right) = \left( \begin{array}{c | c | c | c } A b_0 & A b_1 & \cdots & A b_{n-1} \end{array} \right).$$
If
- $$C = \begin{bmatrix} \gamma_{0,0} & \gamma_{0,1} & \cdots & \gamma_{0,n-1} \\ \gamma_{1,0} & \gamma_{1,1} & \cdots & \gamma_{1,n-1} \\ \vdots & \vdots & \vdots & \vdots \\ \gamma_{m-1,0} & \gamma_{m-1,1} & \cdots & \gamma_{m-1,n-1} \\ \end{bmatrix},$$
- $$C = \begin{bmatrix} \gamma_{0,0} & \gamma_{0,1} & \cdots & \gamma_{0,n-1} \\ \gamma_{1,0} & \gamma_{1,1} & \cdots & \gamma_{1,n-1} \\ \vdots & \vdots & \vdots & \vdots \\ \gamma_{m-1,0} & \gamma_{m-1,1} & \cdots & \gamma_{m-1,n-1} \\ \end{bmatrix},$$
- $$\quad A = \begin{bmatrix} \alpha_{0,0} & \alpha_{0,1} & \cdots & \alpha_{0,k-1} \\ \alpha_{1,0} & \alpha_{1,1} & \cdots & \alpha_{1,k-1} \\ \vdots & \vdots & \vdots & \vdots \\ \alpha_{m-1,0} & \alpha_{m-1,1} & \cdots & \alpha_{m-1,k-1} \\ \end{bmatrix}$$
- $$\\ \text{and} \quad B = \begin{bmatrix} \beta_{0,0} & \beta_{0,1} & \cdots & \beta_{0,n-1} \\ \beta_{1,0} & \beta_{1,1} & \cdots & \beta_{1,n-1} \\ \vdots & \vdots & \vdots & \vdots \\ \beta_{k-1,0} & \beta_{k-1,1} & \cdots & \beta_{k-1,n-1} \\ \end{bmatrix}.$$
Then C = AB means that $\gamma_{i,j} = \sum_{p=0}^{k-1} \alpha_{i,p} \beta_{p,j}$
A table of matrix-matrix multiplications with matrices of special shape is given at the end of this unit.

Let $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^n$. Then the outer product of x and y is given by $xy^T$. Notice that this yields an $m \times n$ matrix:
- $$\begin{aligned} xy^T &= \left( \begin{array}{c} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{m-1} \end{array} \right) \left( \begin{array}{c} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{n-1} \end{array} \right)^ T = \left( \begin{array}{c} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{m-1} \end{array} \right) \left( \begin{array}{c c c c} \psi_0 & \psi_1 & \cdots & \psi_{n-1} \end{array} \right) \\ &= \left( \begin{array}{c c c c} \chi_0 \psi_0 & \chi_0 \psi_1 & \cdots & \chi_0 \psi_{n-1} \\ \chi_1 \psi_0 & \chi_1 \psi_1 & \cdots & \chi_1 \psi_{n-1} \\ \vdots & \vdots & & \vdots \\ \chi_{m-1} \psi_0 & \chi_{m-1} \psi_1 & \cdots & \chi_{m-1} \psi_{n-1} \end{array} \right). \end{aligned}$$
The cost of memops of matrix-matrix multiplication is $2kmn$.