# Kalman Filter

### Model of Kalman Filter

We assume the state transition can be modeled as a Gaussian distribution

$$
P(z\_t|z\_{t-1}) \sim \mathcal{N}(Az\_{t-1},Q)
$$

We assume neural observations can also be modeled as a Gaussian distribution

$$
P(x\_t|z\_{t}) \sim \mathcal{N}(Cz\_{t},R)
$$

We also assume a base case for the initial state

$$
P(z\_{1}) \sim \mathcal{N}(\Pi,V)
$$

Thus the model parameters are:

$$
\Theta=\{A,Q,\Pi,V,C,R\}
$$
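As a concrete illustration, here is a minimal sketch that samples one trajectory from this generative model. The dimensions and parameter values are arbitrary example choices, not part of the model above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 2-D latent state, 3-D observations, T time steps.
dz, dx, T = 2, 3, 100

# Example parameter values for Theta = {A, Q, Pi, V, C, R}.
A = np.array([[0.9, 0.1], [0.0, 0.95]])
Q = 0.01 * np.eye(dz)
Pi = np.zeros(dz)
V = np.eye(dz)
C = rng.standard_normal((dx, dz))
R = 0.1 * np.eye(dx)

# Sample: z_1 ~ N(Pi, V), z_t ~ N(A z_{t-1}, Q), x_t ~ N(C z_t, R).
z = np.zeros((T, dz))
x = np.zeros((T, dx))
z[0] = rng.multivariate_normal(Pi, V)
for t in range(T):
    if t > 0:
        z[t] = rng.multivariate_normal(A @ z[t - 1], Q)
    x[t] = rng.multivariate_normal(C @ z[t], R)
```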

### Model Training

We aim to maximize the joint likelihood of the state and observed data

$$
\mathcal{D}=\{\{x\}^n, \{z\}^n\}\_{n=1}^N=\{\{x\_1^n, \dots, x\_T^n\}, \{z\_1^n, \dots, z\_T^n\}\}\_{n=1}^N
$$

$$
\Theta^\* = \arg\max\_{\Theta} \prod\_{n=1}^N P(x^n, z^n|\Theta)\\
\= \arg\max\_{\Theta} \prod\_{n=1}^N P(z^n\_1)\left( \prod\_{t=2}^T P(z^n\_t|z^n\_{t-1}) \right)\left( \prod\_{t=1}^T P(x^n\_t|z^n\_{t}) \right)\\
\= \arg\max\_{\Theta}\sum\_{n=1}^N \left[ \log P(z^n\_1) + \sum\_{t=2}^T \log  P(z^n\_t|z^n\_{t-1}) + \sum\_{t=1}^T \log P(x^n\_t|z^n\_{t}) \right] \\
\= \arg\max\_{\Theta} \sum\_{n=1}^N \left[ -\frac{1}{2} \log|V|-\frac{1}{2}(z^n\_1-\Pi)^{\top}V^{-1}(z^n\_1-\Pi) + \sum\_{t=2}^T \left(-\frac{1}{2} \log|Q|-\frac{1}{2}(z^n\_{t}-Az^n\_{t-1})^{\top}Q^{-1}(z^n\_{t}-Az^n\_{t-1})\right) + \sum\_{t=1}^T \left(-\frac{1}{2} \log|R|-\frac{1}{2}(x^n\_{t}-Cz^n\_{t})^{\top}R^{-1}(x^n\_{t}-Cz^n\_{t})\right) \right] \\
\= \arg\min\_{\Theta} \sum\_{n=1}^N \left[ \log|V|+(z^n\_1-\Pi)^{\top}V^{-1}(z^n\_1-\Pi) + \sum\_{t=2}^T \left( \log|Q|+(z^n\_{t}-Az^n\_{t-1})^{\top}Q^{-1}(z^n\_{t}-Az^n\_{t-1})\right) + \sum\_{t=1}^T \left(\log|R|+(x^n\_{t}-Cz^n\_{t})^{\top}R^{-1}(x^n\_{t}-Cz^n\_{t})\right) \right]
$$

Suppose

$$
\mathcal{L}=\sum\_{n=1}^N \left[ \log|V|+(z^n\_1-\Pi)^{\top}V^{-1}(z^n\_1-\Pi) + \sum\_{t=2}^T \left( \log|Q|+(z^n\_{t}-Az^n\_{t-1})^{\top}Q^{-1}(z^n\_{t}-Az^n\_{t-1})\right) + \sum\_{t=1}^T \left(\log|R|+(x^n\_{t}-Cz^n\_{t})^{\top}R^{-1}(x^n\_{t}-Cz^n\_{t})\right) \right]
$$

The minimum is achieved where the gradient with respect to each parameter vanishes

$$
\nabla\_{\Pi} \mathcal{L} = \sum\_{n=1}^N -2 (z^n\_1-\Pi)^{\top}V^{-1} = 0 \to \Pi^\* = \frac{1}{N} \sum\_{n=1}^N z^n\_1\\
\nabla\_{V} \mathcal{L} = \sum\_{n=1}^N V^{-1} - V^{-1}(z^n\_1-\Pi)(z^n\_1-\Pi)^{\top}V^{-1} = 0 \to V^\* = \frac{1}{N}\sum\_{n=1}^N  (z^n\_1-\Pi^\*)(z^n\_1-\Pi^\*)^{\top}  \\
\nabla\_{A} \mathcal{L} = \sum\_{n=1}^N \sum\_{t=2}^T -2Q^{-1}(z^n\_{t}-Az^n\_{t-1})(z^n\_{t-1})^{\top} = 0 \to A^\* = \left(\sum\_{n=1}^N \sum\_{t=2}^T z^n\_{t}(z^n\_{t-1})^{\top} \right)\left(\sum\_{n=1}^N \sum\_{t=2}^T z^n\_{t-1}(z^n\_{t-1})^{\top} \right)^{-1}\\
\nabla\_{Q} \mathcal{L} = \sum\_{n=1}^N \sum\_{t=2}^T Q^{-1} - Q^{-1}(z^n\_{t}-Az^n\_{t-1})(z^n\_{t}-Az^n\_{t-1})^{\top}Q^{-1} = 0 \to Q^\* = \frac{1}{N(T-1)} \sum\_{n=1}^N \sum\_{t=2}^T (z^n\_{t}-A^\*z^n\_{t-1})(z^n\_{t}-A^\*z^n\_{t-1})^{\top}\\
\nabla\_{C} \mathcal{L} = \sum\_{n=1}^N \sum\_{t=1}^T -2R^{-1}(x^n\_{t}-Cz^n\_{t})(z^n\_{t})^{\top} = 0 \to C^\* =\left(\sum\_{n=1}^N \sum\_{t=1}^T x^n\_{t}(z^n\_{t})^{\top} \right)\left(\sum\_{n=1}^N \sum\_{t=1}^T z^n\_{t}(z^n\_{t})^{\top} \right)^{-1}\\
\nabla\_{R} \mathcal{L}  = \sum\_{n=1}^N \sum\_{t=1}^T R^{-1} - R^{-1}(x^n\_{t}-Cz^n\_{t})(x^n\_{t}-Cz^n\_{t})^{\top}R^{-1} = 0 \to R^\* = \frac{1}{NT} \sum\_{n=1}^N \sum\_{t=1}^T (x^n\_{t}-C^\*z^n\_{t})(x^n\_{t}-C^\*z^n\_{t})^{\top}
$$
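When the latent states are observed during training, these closed-form estimates are direct matrix computations. A minimal sketch, where `fit_lds` is an illustrative helper name (not from the source):

```python
import numpy as np

def fit_lds(Z, X):
    """Closed-form ML estimates from fully observed trajectories.

    Z: (N, T, dz) latent states, X: (N, T, dx) observations.
    Returns A, Q, Pi, V, C, R as derived above.
    """
    N, T, dz = Z.shape
    # Pi* and V*: empirical mean and covariance of the initial states.
    Pi = Z[:, 0].mean(axis=0)
    d0 = Z[:, 0] - Pi
    V = d0.T @ d0 / N
    # A* = (sum z_t z_{t-1}^T)(sum z_{t-1} z_{t-1}^T)^{-1}
    Zt, Zp = Z[:, 1:].reshape(-1, dz), Z[:, :-1].reshape(-1, dz)
    A = (Zt.T @ Zp) @ np.linalg.inv(Zp.T @ Zp)
    res = Zt - Zp @ A.T
    Q = res.T @ res / (N * (T - 1))
    # C* = (sum x_t z_t^T)(sum z_t z_t^T)^{-1}
    Xf, Zf = X.reshape(-1, X.shape[-1]), Z.reshape(-1, dz)
    C = (Xf.T @ Zf) @ np.linalg.inv(Zf.T @ Zf)
    resx = Xf - Zf @ C.T
    R = resx.T @ resx / (N * T)
    return A, Q, Pi, V, C, R
```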

#### Testing the Model

In the testing phase, we aim to compute $$P(z\_t|x\_1,\dots,x\_t)$$. At each time step, we apply two sub-steps: a one-step prediction, then a measurement update.

* One step Prediction

  $$
  P(z\_t|x\_1,\dots,x\_{t-1}) = \int P(z\_t|z\_{t-1}) P(z\_{t-1}|x\_1,\dots,x\_{t-1}) dz\_{t-1}
  $$
* Measurement update

  $$
  P(z\_t|x\_1,\dots,x\_{t}) = \frac{P(x\_t|z\_t)P(z\_t|x\_1,\dots,x\_{t-1})}{P(x\_t|x\_1,\dots,x\_{t-1})}
  $$

  We’ll use the following notation for the mean and covariance:

  $$
  \mu\_t^t = E\[z\_t|x\_1,\dots,x\_{t}]\\
  \Sigma\_t^t = cov\[z\_t|x\_1,\dots,x\_{t}]
  $$

  **One step Prediction**

  We assume that we have the mean $$\mu\_{t-1}^{t-1}$$ and covariance $$\Sigma\_{t-1}^{t-1}$$ from the previous iteration. We need to compute $$\mu\_{t}^{t-1}$$ and $$\Sigma\_{t}^{t-1}$$. Because $$z\_t = Az\_{t-1} + \gamma, \gamma \sim \mathcal{N}(0, Q)$$,

  $$
  \mu\_{t}^{t-1} = E\[z\_t|x\_1,\dots,x\_{t-1}] \\
  \=  E\[ Az\_{t-1} + \gamma|x\_1,\dots,x\_{t-1}]\\
  \=  AE\[ z\_{t-1}|x\_1,\dots,x\_{t-1}] + E\[\gamma|x\_1,\dots,x\_{t-1}]\\
  \=  A\mu\_{t-1}^{t-1}
  $$

$$
\Sigma\_{t}^{t-1} = cov\[z\_t|x\_1,\dots,x\_{t-1}] \\
\=  cov\[ Az\_{t-1} + \gamma|x\_1,\dots,x\_{t-1}]\\
\=  E\[(Az\_{t-1} + \gamma - \mu\_{t}^{t-1})(Az\_{t-1} + \gamma-\mu\_{t}^{t-1})^{\top}|x\_1,\dots,x\_{t-1}] \\
\=  E\[(Az\_{t-1} - \mu\_{t}^{t-1})(Az\_{t-1} -\mu\_{t}^{t-1})^{\top}|x\_1,\dots,x\_{t-1}] + E\[\gamma \gamma^{\top}|x\_1,\dots,x\_{t-1}]\\
\=  E\[(Az\_{t-1} - A\mu\_{t-1}^{t-1})(Az\_{t-1} -A\mu\_{t-1}^{t-1})^{\top}|x\_1,\dots,x\_{t-1}] + Q\\
\= A E\[(z\_{t-1} - \mu\_{t-1}^{t-1})(z\_{t-1} -\mu\_{t-1}^{t-1})^{\top}|x\_1,\dots,x\_{t-1}] A^\top + Q\\
\= A \Sigma\_{t-1}^{t-1} A^\top + Q
$$

**Measurement update**

We take advantage of a property of Gaussian distributions: if $$x=\begin{bmatrix}x\_a\\ x\_b\end{bmatrix}\sim \mathcal{N}\left(\begin{bmatrix} \mu\_a \\ \mu\_b\end{bmatrix}, \begin{bmatrix}\Sigma\_{aa}&\Sigma\_{ab}\\ \Sigma\_{ba}&\Sigma\_{bb}\end{bmatrix}\right)$$, then $$P(x\_a|x\_b)$$ is Gaussian with

$$
E(x\_a|x\_b) = \mu\_a + \Sigma\_{ab}\Sigma\_{bb}^{-1}(x\_b-\mu\_b)\\
cov(x\_a|x\_b) = \Sigma\_{aa} - \Sigma\_{ab}\Sigma\_{bb}^{-1}\Sigma\_{ba}
$$
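A quick numerical sanity check of this conditioning identity (a sketch with arbitrary example numbers): sample from a joint Gaussian, restrict to a thin slice where $$x\_b \approx b$$, and compare the empirical moments against the formulas.

```python
import numpy as np

rng = np.random.default_rng(0)

# Joint Gaussian over (x_a, x_b), both 1-D for simplicity.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

def cond_moments(xb):
    # Conditional mean and variance of x_a given x_b, from the identity above.
    mean = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (xb - mu[1])
    var = Sigma[0, 0] - Sigma[0, 1] / Sigma[1, 1] * Sigma[1, 0]
    return mean, var

# Monte Carlo: keep only samples with x_b in a thin slice around b.
samples = rng.multivariate_normal(mu, Sigma, size=500_000)
b = -1.5
slice_a = samples[np.abs(samples[:, 1] - b) < 0.02, 0]
mean_hat, var_hat = slice_a.mean(), slice_a.var()
mean_th, var_th = cond_moments(b)
```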

Because $$x\_t = Cz\_{t} + \sigma, \sigma \sim \mathcal{N}(0, R)$$, we have $$\mathbb{E}\[x\_t|x\_1,\dots,x\_{t-1}]=E\[Cz\_{t} + \sigma|x\_1,\dots,x\_{t-1}]=C \mu\_{t}^{t-1}$$. Similarly, $$cov\[x\_t|x\_1,\dots,x\_{t-1}]=C \Sigma\_{t}^{t-1}C^{\top} + R$$, and the cross-covariance is

$$
cov\[x\_t, z\_t|x\_1,\dots,x\_{t-1}] = C\Sigma\_{t}^{t-1}
$$

Now we have the joint distribution $$P(\begin{bmatrix}x\_t\\ z\_t\end{bmatrix}|x\_1,\dots,x\_{t-1})\sim \mathcal{N}\left(\begin{bmatrix}C \mu\_t^{t-1}\\ \mu\_t^{t-1}\end{bmatrix}, \begin{bmatrix}C \Sigma\_{t}^{t-1}C^{\top} + R &C\Sigma\_{t}^{t-1}\\ \Sigma\_{t}^{t-1}C^{\top}&\Sigma\_{t}^{t-1}\end{bmatrix}\right)$$

Applying the conditioning identity above with $$x\_a=z\_t$$ and $$x\_b=x\_t$$, we can write the measurement update with the Kalman gain

$$
\mu\_t^t = \mu\_{t}^{t-1} + K\_t (x\_t-C \mu\_t^{t-1})\\
\Sigma\_t^t = \Sigma\_{t}^{t-1} - K\_t C \Sigma\_{t}^{t-1}
$$

where $$K\_t = \Sigma\_t^{t-1}C^{\top}(C\Sigma\_{t}^{t-1}C^{\top}+R)^{-1}$$.
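Putting the two sub-steps together, here is a minimal sketch of the full forward recursion, assuming known parameters (`kalman_filter` is an illustrative helper name):

```python
import numpy as np

def kalman_filter(X, A, Q, Pi, V, C, R):
    """Run the forward recursion on one observation sequence.

    X: (T, dx). Returns filtered means mu[t] = E[z_t | x_1..x_t]
    and covariances Sig[t] = cov[z_t | x_1..x_t].
    """
    T, dz = X.shape[0], A.shape[0]
    mu = np.zeros((T, dz))
    Sig = np.zeros((T, dz, dz))
    # The prior on z_1 plays the role of the first "prediction".
    mu_pred, Sig_pred = Pi, V
    for t in range(T):
        if t > 0:
            # One-step prediction: mu_t^{t-1} = A mu_{t-1}^{t-1},
            # Sig_t^{t-1} = A Sig_{t-1}^{t-1} A^T + Q.
            mu_pred = A @ mu[t - 1]
            Sig_pred = A @ Sig[t - 1] @ A.T + Q
        # Kalman gain K_t = Sig_t^{t-1} C^T (C Sig_t^{t-1} C^T + R)^{-1}.
        S = C @ Sig_pred @ C.T + R
        K = Sig_pred @ C.T @ np.linalg.inv(S)
        # Measurement update.
        mu[t] = mu_pred + K @ (X[t] - C @ mu_pred)
        Sig[t] = Sig_pred - K @ C @ Sig_pred
    return mu, Sig
```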

### Reference

* Matrix cookbook <https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf>
* Maximum likelihood for the Multivariate Gaussian <https://people.eecs.berkeley.edu/~jordan/courses/260-spring10/other-readings/chapter13.pdf>
