Kalman Filter

Model of Kalman Filter

We assume the current state can be modeled as a Gaussian distribution
P(z_t|z_{t-1}) \sim \mathcal{N}(Az_{t-1},Q)
We assume neural observations can also be modeled as a Gaussian distribution
P(x_t|z_{t}) \sim \mathcal{N}(Cz_{t},R)
We also assume a base case
P(z_{1}) \sim \mathcal{N}(\Pi,V)
Thus the model parameters are:
\Theta=\{A,Q,\Pi,V,C,R\}
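To make the model concrete, here is a minimal NumPy sketch that samples trajectories from this generative model. The parameter values and the helper `sample_trajectory` are illustrative, not part of the original note.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative model parameters Theta = {A, Q, Pi, V, C, R}
A = np.array([[0.9, 0.1], [0.0, 0.8]])   # state transition
Q = 0.01 * np.eye(2)                     # state noise covariance
Pi = np.zeros(2)                         # initial state mean
V = np.eye(2)                            # initial state covariance
C = np.array([[1.0, 0.0]])               # observation matrix
R = np.array([[0.1]])                    # observation noise covariance

def sample_trajectory(T):
    """Draw (z_1..z_T, x_1..x_T) from the generative model above."""
    zs, xs = [], []
    z = rng.multivariate_normal(Pi, V)             # z_1 ~ N(Pi, V)
    for _ in range(T):
        x = rng.multivariate_normal(C @ z, R)      # x_t ~ N(C z_t, R)
        zs.append(z)
        xs.append(x)
        z = rng.multivariate_normal(A @ z, Q)      # z_{t+1} ~ N(A z_t, Q)
    return np.array(zs), np.array(xs)

zs, xs = sample_trajectory(100)
```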

Model Training

We aim to maximize the joint likelihood of the state and observed data
\mathcal{D}=\{\{x\}^n, \{z\}^n\}_{n=1}^N=\{\{x_1^n, \dots, x_T^n\}, \{z_1^n, \dots, z_T^n\}\}_{n=1}^N
\Theta^* = \arg\max_{\Theta} P(\{x\}^n, \{z\}^n|\Theta)\\ = \arg\max_{\Theta} \prod_{n=1}^N P(z^n_1)\left( \prod_{t=2}^T P(z^n_t|z^n_{t-1}) \right)\left( \prod_{t=1}^T P(x^n_t|z^n_{t}) \right)\\ = \arg\max_{\Theta}\sum_{n=1}^N \log P(z^n_1) + \sum_{t=2}^T \log P(z^n_t|z^n_{t-1}) + \sum_{t=1}^T \log P(x^n_t|z^n_{t}) \\ = \arg\max_{\Theta} \sum_{n=1}^N -\frac{1}{2} \log|V|-\frac{1}{2}(z^n_1-\Pi)^{\top}V^{-1}(z^n_1-\Pi) + \sum_{t=2}^T \left(-\frac{1}{2} \log|Q|-\frac{1}{2}(z^n_{t}-Az^n_{t-1})^{\top}Q^{-1}(z^n_{t}-Az^n_{t-1})\right) + \sum_{t=1}^T \left(-\frac{1}{2} \log|R|-\frac{1}{2}(x^n_{t}-Cz^n_{t})^{\top}R^{-1}(x^n_{t}-Cz^n_{t})\right) \\ = \arg\min_{\Theta} \sum_{n=1}^N \log|V|+(z^n_1-\Pi)^{\top}V^{-1}(z^n_1-\Pi) + \sum_{t=2}^T \left( \log|Q|+(z^n_{t}-Az^n_{t-1})^{\top}Q^{-1}(z^n_{t}-Az^n_{t-1})\right) + \sum_{t=1}^T \left(\log|R|+(x^n_{t}-Cz^n_{t})^{\top}R^{-1}(x^n_{t}-Cz^n_{t})\right)
Define
\mathcal{L}=\sum_{n=1}^N \log|V|+(z^n_1-\Pi)^{\top}V^{-1}(z^n_1-\Pi) + \sum_{t=2}^T \left( \log|Q|+(z^n_{t}-Az^n_{t-1})^{\top}Q^{-1}(z^n_{t}-Az^n_{t-1})\right) + \sum_{t=1}^T \left(\log|R|+(x^n_{t}-Cz^n_{t})^{\top}R^{-1}(x^n_{t}-Cz^n_{t})\right)
The minimum is achieved where the gradients vanish:
\nabla_{\Pi} \mathcal{L} = \sum_{n=1}^N -2 V^{-1}(z^n_1-\Pi) = 0 \to \Pi^* = \frac{1}{N} \sum_{n=1}^N z^n_1\\ \nabla_{V} \mathcal{L} = \sum_{n=1}^N \left( V^{-1} - V^{-1}(z^n_1-\Pi)(z^n_1-\Pi)^{\top}V^{-1} \right) = 0 \to V^* = \frac{1}{N}\sum_{n=1}^N (z^n_1-\Pi^*)(z^n_1-\Pi^*)^{\top} \\ \nabla_{A} \mathcal{L} = \sum_{n=1}^N \sum_{t=2}^T -2Q^{-1}(z^n_{t}-Az^n_{t-1})(z^n_{t-1})^{\top} = 0 \to A^* = \left(\sum_{n=1}^N \sum_{t=2}^T z^n_{t}(z^n_{t-1})^{\top} \right)\left(\sum_{n=1}^N \sum_{t=2}^T z^n_{t-1}(z^n_{t-1})^{\top} \right)^{-1}\\ \nabla_{Q} \mathcal{L} = \sum_{n=1}^N \sum_{t=2}^T \left( Q^{-1} - Q^{-1}(z^n_{t}-Az^n_{t-1})(z^n_{t}-Az^n_{t-1})^{\top}Q^{-1} \right) = 0 \to Q^* = \frac{1}{N(T-1)} \sum_{n=1}^N \sum_{t=2}^T (z^n_{t}-A^*z^n_{t-1})(z^n_{t}-A^*z^n_{t-1})^{\top}\\ \nabla_{C} \mathcal{L} = \sum_{n=1}^N \sum_{t=1}^T -2R^{-1}(x^n_{t}-Cz^n_{t})(z^n_{t})^{\top} = 0 \to C^* =\left(\sum_{n=1}^N \sum_{t=1}^T x^n_{t}(z^n_{t})^{\top} \right)\left(\sum_{n=1}^N \sum_{t=1}^T z^n_{t}(z^n_{t})^{\top} \right)^{-1}\\ \nabla_{R} \mathcal{L} = \sum_{n=1}^N \sum_{t=1}^T \left( R^{-1} - R^{-1}(x^n_{t}-Cz^n_{t})(x^n_{t}-Cz^n_{t})^{\top}R^{-1} \right) = 0 \to R^* = \frac{1}{NT} \sum_{n=1}^N \sum_{t=1}^T (x^n_{t}-C^*z^n_{t})(x^n_{t}-C^*z^n_{t})^{\top}
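These closed-form estimates translate directly into code. Below is a minimal NumPy sketch, assuming fully observed states stacked as arrays `Z` and `X`; the function name `fit_kalman_params` is made up for illustration.

```python
import numpy as np

def fit_kalman_params(Z, X):
    """Closed-form ML estimates from fully observed trajectories.

    Z: (N, T, d_z) latent states, X: (N, T, d_x) observations.
    Returns (A, Q, Pi, V, C, R) following the formulas above.
    """
    N, T, d_z = Z.shape
    # Pi*, V*: mean and covariance of the initial states
    Pi = Z[:, 0].mean(axis=0)
    d1 = Z[:, 0] - Pi
    V = d1.T @ d1 / N
    # A*: regression of z_t on z_{t-1}
    Zt = Z[:, 1:].reshape(-1, d_z)
    Zp = Z[:, :-1].reshape(-1, d_z)
    A = (Zt.T @ Zp) @ np.linalg.inv(Zp.T @ Zp)
    Ez = Zt - Zp @ A.T
    Q = Ez.T @ Ez / (N * (T - 1))
    # C*: regression of x_t on z_t
    Zf = Z.reshape(-1, d_z)
    Xf = X.reshape(-1, X.shape[2])
    C = (Xf.T @ Zf) @ np.linalg.inv(Zf.T @ Zf)
    Ex = Xf - Zf @ C.T
    R = Ex.T @ Ex / (N * T)
    return A, Q, Pi, V, C, R
```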

Testing the Model

In the testing phase, we aim to compute
P(z_t|x_1,\dots,x_t)
. At each time step, we apply two sub-steps: a one-step prediction and then a measurement update.
  • One step Prediction
    P(z_t|x_1,\dots,x_{t-1}) = \int P(z_t|z_{t-1}) P(z_{t-1}|x_1,\dots,x_{t-1}) dz_{t-1}
  • Measurement update
    P(z_t|x_1,\dots,x_{t}) = \frac{P(x_t|z_t)P(z_t|x_1,\dots,x_{t-1})}{P(x_t|x_1,\dots,x_{t-1})}
    We’ll use the following notation for the mean and covariance:
    \mu_t^t = E[z_t|x_1,\dots,x_{t}]\\ \Sigma_t^t = cov[z_t|x_1,\dots,x_{t}]
    One step Prediction
    We assume that we have the mean
    \mu_{t-1}^{t-1}
    and covariance
    \Sigma_{t-1}^{t-1}
    from the previous iteration. We need to compute
    \mu_{t}^{t-1}
    and
    \Sigma_{t}^{t-1}
    . Because
    z_t = Az_{t-1} + \gamma, \quad \gamma \sim \mathcal{N}(0, Q)
    \mu_{t}^{t-1} = E[z_t|x_1,\dots,x_{t-1}] \\ = E[ Az_{t-1} + \gamma|x_1,\dots,x_{t-1}]\\ = AE[ z_{t-1}|x_1,\dots,x_{t-1}] + E[\gamma|x_1,\dots,x_{t-1}]\\ = A\mu_{t-1}^{t-1}
\Sigma_{t}^{t-1} = cov[z_t|x_1,\dots,x_{t-1}] \\ = cov[ Az_{t-1} + \gamma|x_1,\dots,x_{t-1}]\\ = E[(Az_{t-1} + \gamma - \mu_{t}^{t-1})(Az_{t-1} + \gamma-\mu_{t}^{t-1})^{\top}|x_1,\dots,x_{t-1}] \\ = E[(Az_{t-1} - \mu_{t}^{t-1})(Az_{t-1} -\mu_{t}^{t-1})^{\top}|x_1,\dots,x_{t-1}] + E[\gamma \gamma^{\top}|x_1,\dots,x_{t-1}]\\ = E[(Az_{t-1} - A\mu_{t-1}^{t-1})(Az_{t-1} -A\mu_{t-1}^{t-1})^{\top}|x_1,\dots,x_{t-1}] + Q\\ = A E[(z_{t-1} - \mu_{t-1}^{t-1})(z_{t-1} -\mu_{t-1}^{t-1})^{\top}|x_1,\dots,x_{t-1}] A^\top + Q\\ = A \Sigma_{t-1}^{t-1} A^\top + Q
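The two prediction equations above can be sketched as a small NumPy function (the name `predict` and variable names are illustrative):

```python
import numpy as np

def predict(mu_prev, Sigma_prev, A, Q):
    """One-step prediction: maps (mu_{t-1}^{t-1}, Sigma_{t-1}^{t-1})
    to (mu_t^{t-1}, Sigma_t^{t-1})."""
    mu_pred = A @ mu_prev                      # mu_t^{t-1} = A mu_{t-1}^{t-1}
    Sigma_pred = A @ Sigma_prev @ A.T + Q      # Sigma_t^{t-1} = A Sigma A^T + Q
    return mu_pred, Sigma_pred
```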
Measurement update
We take advantage of a property of Gaussian distributions: if
x=\begin{bmatrix}x_a\\ x_b\end{bmatrix}\sim \mathcal{N}\left(\begin{bmatrix} \mu_a \\ \mu_b\end{bmatrix}, \begin{bmatrix}\Sigma_{aa}&\Sigma_{ab}\\ \Sigma_{ba}&\Sigma_{bb}\end{bmatrix}\right)
, then
P(x_a|x_b)
is Gaussian with
E(x_a|x_b) = \mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b-\mu_b)\\ cov(x_a|x_b) = \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}
Because
x_t = Cz_{t} + \sigma, \quad \sigma \sim \mathcal{N}(0, R)
, then
E[x_t|x_1,\dots,x_{t-1}]=E[Cz_{t} + \sigma|x_1,\dots,x_{t-1}]=C \mu_{t}^{t-1}
. Similarly
cov[x_t|x_1,\dots,x_{t-1}]=C \Sigma_{t}^{t-1}C^{\top} + R
cov[x_t, z_t|x_1,\dots,x_{t-1}] = C\Sigma_{t}^{t-1}
Now we have the joint distribution
P\left(\begin{bmatrix}x_t \\ z_t\end{bmatrix}\Big|x_1,\dots,x_{t-1}\right)\sim \mathcal{N}\left(\begin{bmatrix}C \mu_t^{t-1}\\ \mu_t^{t-1}\end{bmatrix}, \begin{bmatrix}C \Sigma_{t}^{t-1}C^{\top} + R & C\Sigma_{t}^{t-1}\\ \Sigma_{t}^{t-1}C^{\top} & \Sigma_{t}^{t-1}\end{bmatrix}\right)
We can write the measurement updates with the Kalman gain
\mu_t^t = \mu_{t}^{t-1} + K_t (x_t-C \mu_t^{t-1})\\ \Sigma_t^t = \Sigma_{t}^{t-1} - K_t C \Sigma_{t}^{t-1}
where
K_t = \Sigma_t^{t-1}C^{\top}(C\Sigma_{t}^{t-1}C^{\top}+R)^{-1}
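Putting the Kalman gain, measurement update, and one-step prediction together, a minimal filtering loop might look like the following NumPy sketch (function names and array shapes are assumptions for illustration):

```python
import numpy as np

def update(mu_pred, Sigma_pred, x, C, R):
    """Measurement update with the Kalman gain K_t."""
    S = C @ Sigma_pred @ C.T + R              # innovation covariance C Sigma C^T + R
    K = Sigma_pred @ C.T @ np.linalg.inv(S)   # Kalman gain K_t
    mu = mu_pred + K @ (x - C @ mu_pred)      # mu_t^t
    Sigma = Sigma_pred - K @ C @ Sigma_pred   # Sigma_t^t
    return mu, Sigma

def kalman_filter(X, A, Q, Pi, V, C, R):
    """Run predict/update over observations X of shape (T, d_x);
    returns the filtered means mu_t^t."""
    mu, Sigma = Pi, V
    mus = []
    for t, x in enumerate(X):
        if t > 0:                             # at t = 1 the prior is N(Pi, V)
            mu, Sigma = A @ mu, A @ Sigma @ A.T + Q
        mu, Sigma = update(mu, Sigma, x, C, R)
        mus.append(mu)
    return np.array(mus)
```

For a constant 1-D state with a unit-variance prior and unit observation noise, the filtered mean after t observations of the same value moves toward it as t/(t+1), which makes the loop easy to sanity-check.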
