M
M
ML theory
Search…
Kalman Filter

## Model of Kalman Filter

We assume the current state can be modeled as a Gaussian distribution
$P(z_t|z_{t-1}) \sim \mathcal{N}(Az_{t-1},Q)$
We assume neural observations can also be modeled a Gaussian distribution
$P(x_t|z_{t}) \sim \mathcal{N}(Cz_{t},R)$
We also assume abase case
$P(z_{1}) \sim \mathcal{N}(\Pi,V)$
Thus the model parameters are:
$\Theta=\{A,Q,\Pi,V,C,R\}$

## Model Training

We aim to maximize the joint likelihood of the state and observed date
$\mathcal{D}=\{\{x\}^n, \{z\}^n\}_{n=1}^N=\{\{x_1^n, \dots, x_T^n\}, \{z_1^n, \dots, z_T^n\}\}_{n=1}^N$
$\Theta^* = \arg\max_{\Theta} P(\{x\}^n, \{z\}^n|\Theta)\\ = \arg\max_{\Theta} \prod_{n=1}^N P(z^n_1)\left( \prod_{t=2}^T P(z^n_t|z^n_{t-1}) \right)\left( \prod_{t=1}^T P(x^n_t|z^n_{t}) \right)\\ = \arg\max_{\Theta}\sum_{n=1}^N \log P(z^n_1) + \sum_{t=2}^T \log P(z^n_t|z^n_{t-1}) + \prod_{t=1}^T \log P(x^n_t|z^n_{t}) \\ = \arg\max_{\Theta} \sum_{n=1}^N -\frac{1}{2} \log|V|-\frac{1}{2}(z^n_1-\Pi)^{\top}V^{-1}(z^n_1-\Pi) + \sum_{t=2}^T \left(-\frac{1}{2} \log|Q|-\frac{1}{2}(z^n_{t}-Az^n_{t-1})^{\top}Q^{-1}(z^n_{t}-Az^n_{t-1})\right) + \sum_{t=1}^T \left(-\frac{1}{2} \log|R|-\frac{1}{2}(x^n_{t}-Cz^n_{t})^{\top}R^{-1}(x^n_{t}-Cz^n_{t})\right) \\ = \arg\min_{\Theta} \sum_{n=1}^N \log|V|+(z^n_1-\Pi)^{\top}V^{-1}(z^n_1-\Pi) + \sum_{t=2}^T \left( \log|Q|+(z^n_{t}-Az^n_{t-1})^{\top}Q^{-1}(z^n_{t}-Az^n_{t-1})\right) + \sum_{t=1}^T \left(\log|R|+\frac{1}{2}(x^n_{t}-Cz^n_{t})^{\top}R^{-1}(x^n_{t}-Cz^n_{t})\right)$
Suppose
$\mathcal{L}=\sum_{n=1}^N \log|V|+(z^n_1-\Pi)^{\top}V^{-1}(z^n_1-\Pi) + \sum_{t=2}^T \left( \log|Q|+(z^n_{t}-Az^n_{t-1})^{\top}Q^{-1}(z^n_{t}-Az^n_{t-1})\right) + \sum_{t=1}^T \left(\log|R|+\frac{1}{2}(x^n_{t}-Cz^n_{t})^{\top}R^{-1}(x^n_{t}-Cz^n_{t})\right)$
the minimize is achieved when the derivative vanishes
$\nabla_{\Pi} \mathcal{L} = \sum_{n=1}^N -2 (z^n_1-\Pi)^{\top}V^{-1} = 0 \to \Pi^* = \frac{1}{N} \sum_{n=1}^N z^n_1\\ \nabla_{V} \mathcal{L} = \sum_{n=1}^N (V^{-1})^{\top} + (z^n_1-\Pi)(z^n_1-\Pi)^{\top} = 0 \to V^* = \frac{1}{N}\sum_{n=1}^N (z^n_1-\Pi^*)(z^n_1-\Pi^*)^{\top} \\ \nabla_{A} \mathcal{L} = \sum_{n=1}^N \sum_{t=2}^T -2Q^{-1}(z^n_{t}-Az^n_{t-1})(z^n_{t-1})^{\top} = 0 \to A^* = \left(\sum_{n=1}^N \sum_{t=2}^T z^n_{t}(z^n_{t-1})^{\top} \right)\left(\sum_{n=1}^N \sum_{t=2}^T z^n_{t-1}(z^n_{t-1})^{\top} \right)^{-1}\\ \nabla_{Q} \mathcal{L} = \sum_{n=1}^N \sum_{t=2}^T (Q^{-1})^{\top} + (z^n_{t}-Az^n_{t-1})(z^n_{t}-Az^n_{t-1})^{\top} = 0 \to Q^* = \frac{1}{N(T-1)} \sum_{n=1}^N \sum_{t=2}^T (z^n_{t}-A^*z^n_{t-1})(z^n_{t}-A^*z^n_{t-1})^{\top}\\ \nabla_{C} \mathcal{L} = \sum_{n=1}^N \sum_{t=1}^T -2R^{-1}(x^n_{t}-Cz^n_{t})(z^n_{t})^{\top} = 0 \to C^* =\left(\sum_{n=1}^N \sum_{t=1}^T x^n_{t}(z^n_{t})^{\top} \right)\left(\sum_{n=1}^N \sum_{t=1}^T z^n_{t}(z^n_{t})^{\top} \right)^{-1}\\ \nabla_{R} \mathcal{L} = \sum_{n=1}^N \sum_{t=1}^T (R^{-1})^{\top} + (x^n_{t}-Cz^n_{t})(x^n_{t}-Cz^n_{t})^{\top} = 0 \to R^* = \frac{1}{NT} \sum_{n=1}^N \sum_{t=1}^T (x^n_{t}-C^*z^n_{t})(x^n_{t}-C^*z^n_{t})^{\top}$

### Testing the Model

In the testing phase, we aim to compute
$P(z_t|x_1,\dots,x_t)$
. At each time step, we apply two sub-steps, a one-step prediction and then a measurement update.
• One step Prediction
$P(z_t|x_1,\dots,x_{t-1}) = \int P(z_t|z_{t-1}) P(z_{t-1}|x_1,\dots,x_{t-1}) dz_{t-1}$
• Measurement update
$P(z_t|x_1,\dots,x_{t}) = \frac{P(x_t|z_t)P(z_t|x_1,\dots,x_{t-1})}{P(x_t|x_{t-1})}$
We’ll use the following notation for mean and convenience:
$\mu_t^t = E[z_t|x_1,\dots,x_{t}]\\ \Sigma_t^t = cov[z_t|x_1,\dots,x_{t}]$
One step Prediction
We assume that we have the mean
$\mu_{t-1}^{t-1}$
and covariance
$\Sigma_{t-1}^{t-1}$
from the previous iteration. We need to compute
$\mu_{t}^{t-1}$
and
$\Sigma_{t}^{t-1}$
. Because
$z_t = Az_{t-1} + \gamma, \gamma \in \mathcal{N}(0, Q)$
$\mu_{t}^{t-1} = E[z_t|x_1,\dots,x_{t-1}] \\ = E[ Az_{t-1} + \gamma|x_1,\dots,x_{t-1}]\\ = AE[ z_{t-1}|x_1,\dots,x_{t-1}] + E[\gamma|x_1,\dots,x_{t-1}]\\ = A\mu_{t-1}^{t-1}$
$\Sigma_{t}^{t-1} = cov[z_t|x_1,\dots,x_{t-1}] \\ = cov[ Az_{t-1} + \gamma|x_1,\dots,x_{t-1}]\\ = E[(Az_{t-1} + \gamma - \mu_{t}^{t-1})(Az_{t-1} + \gamma-\mu_{t}^{t-1})^{\top}|x_1,\dots,x_{t-1}] \\ = E[(Az_{t-1} - \mu_{t}^{t-1})(Az_{t-1} -\mu_{t}^{t-1})^{\top}|x_1,\dots,x_{t-1}] + E[\gamma \gamma^{\top}|x_1,\dots,x_{t-1}]\\ = E[(Az_{t-1} - A\mu_{t-1}^{t-1})(Az_{t-1} -A\mu_{t-1}^{t-1})^{\top}|x_1,\dots,x_{t-1}] + Q\\ = A E[(z_{t-1} - \mu_{t-1}^{t-1})(z_{t-1} -\mu_{t-1}^{t-1})^{\top}|x_1,\dots,x_{t-1}] A^\top + Q\\ = A \Sigma_{t-1}^{t-1} A^\top + Q$
Measurement update
We take advantage of the property of the Gaussian distributions If
$x=\begin{bmatrix}x_a\ x_b\end{bmatrix}\sim \mathcal{N}\left(\begin{bmatrix} \mu_a \\ \mu_b\end{bmatrix}, \begin{bmatrix}\Sigma_{aa}&\Sigma_{ab}\\ \Sigma_{ba}&\Sigma_{bb}\end{bmatrix}\right)$
, then
$P(x_a|x_b)$
is Gaussian with
$E(x_a|x_b) = \mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b-\mu_b)\\ cov(x_a|x_b) = \Sigma_{aa} + \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}$
Because
$x_t = Cz_{t} + \sigma, \sigma \in \mathcal{N}(0, R)$
, Then
$\mathbb{E}[x_t|x_1,\dots,x_{t-1}]=E[Cz_{t} + \sigma|x_1,\dots,x_{t-1}]=C \mu_{t}^{t-1}$
. Similarly
$cov[x_t|x_1,\dots,x_{t-1}]=C \Sigma_{t}^{t-1}C^{\top} + R$
$cov[z_t|x_1,\dots,x_{t-1}] = C\Sigma_{t}^{t-1}$
Now we have the joint distribution
$P(\begin{bmatrix}x_t\ z_t\end{bmatrix}|x_1,\dots,x_{t-1})\sim \mathcal{N}\left(\begin{bmatrix}C \mu_t^{t-1}\\mu_t^{t-1}\end{bmatrix}, \begin{bmatrix}C \Sigma_{t}^{t-1}C^{\top} + R &C\Sigma_{t}^{t-1}\ \Sigma{t}^{t-1}C^{\top}&\Sigma_{t}^{t-1}\end{bmatrix}\right)$
We can write the measurement updates with the Kalman gain
$\mu_t^t = \mu_{t}^{t-1} + K_t (x_t-C \mu_t^{t-1})\\ \Sigma_t^t = \Sigma_{t}^{t-1} - K_t C \Sigma_{t}^{t-1}$
where
$K_t = \Sigma_t^{t-1}C^{\top}(C\Sigma_{t}^{t-1}C^{\top}+R)^{-1}$