Model of the Kalman Filter
We assume the current state $z_t$, given the previous state, can be modeled as a Gaussian distribution

$$P(z_t \mid z_{t-1}) \sim \mathcal{N}(A z_{t-1},\, Q)$$

We assume the neural observations $x_t$ can also be modeled as a Gaussian distribution

$$P(x_t \mid z_t) \sim \mathcal{N}(C z_t,\, R)$$

We also assume a base case for the initial state

$$P(z_1) \sim \mathcal{N}(\Pi,\, V)$$

Thus the model parameters are:

$$\Theta = \{A, Q, \Pi, V, C, R\}$$
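To make the model concrete, here is a minimal NumPy sketch that draws one trajectory of states and observations from the generative model above. The function name `simulate_lds` and its interface are illustrative assumptions, not something fixed by the derivation.

```python
import numpy as np

def simulate_lds(A, Q, C, R, Pi, V, T, rng=None):
    """Sample one trajectory (z_1..z_T, x_1..x_T) from the linear-Gaussian model."""
    rng = np.random.default_rng() if rng is None else rng
    state_dim, obs_dim = A.shape[0], C.shape[0]
    z = np.zeros((T, state_dim))
    x = np.zeros((T, obs_dim))
    # Base case: z_1 ~ N(Pi, V)
    z[0] = rng.multivariate_normal(Pi, V)
    x[0] = rng.multivariate_normal(C @ z[0], R)
    for t in range(1, T):
        # State transition: z_t ~ N(A z_{t-1}, Q)
        z[t] = rng.multivariate_normal(A @ z[t - 1], Q)
        # Observation: x_t ~ N(C z_t, R)
        x[t] = rng.multivariate_normal(C @ z[t], R)
    return z, x
```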
Model Training

We aim to maximize the joint likelihood of the states and the observed data

$$\mathcal{D} = \left\{\{x\}^n, \{z\}^n\right\}_{n=1}^{N} = \left\{\{x_1^n, \ldots, x_T^n\}, \{z_1^n, \ldots, z_T^n\}\right\}_{n=1}^{N}$$

$$
\begin{aligned}
\Theta^* &= \arg\max_{\Theta} P\!\left(\{x\}^n, \{z\}^n \mid \Theta\right) \\
&= \arg\max_{\Theta} \prod_{n=1}^{N} P(z_1^n) \left(\prod_{t=2}^{T} P(z_t^n \mid z_{t-1}^n)\right) \left(\prod_{t=1}^{T} P(x_t^n \mid z_t^n)\right) \\
&= \arg\max_{\Theta} \sum_{n=1}^{N} \log P(z_1^n) + \sum_{t=2}^{T} \log P(z_t^n \mid z_{t-1}^n) + \sum_{t=1}^{T} \log P(x_t^n \mid z_t^n) \\
&= \arg\max_{\Theta} \sum_{n=1}^{N} \Bigg[ -\tfrac{1}{2}\log|V| - \tfrac{1}{2}(z_1^n - \Pi)^\top V^{-1}(z_1^n - \Pi) \\
&\qquad + \sum_{t=2}^{T} \left(-\tfrac{1}{2}\log|Q| - \tfrac{1}{2}(z_t^n - A z_{t-1}^n)^\top Q^{-1}(z_t^n - A z_{t-1}^n)\right) \\
&\qquad + \sum_{t=1}^{T} \left(-\tfrac{1}{2}\log|R| - \tfrac{1}{2}(x_t^n - C z_t^n)^\top R^{-1}(x_t^n - C z_t^n)\right) \Bigg] \\
&= \arg\min_{\Theta} \sum_{n=1}^{N} \Bigg[ \log|V| + (z_1^n - \Pi)^\top V^{-1}(z_1^n - \Pi) \\
&\qquad + \sum_{t=2}^{T} \left(\log|Q| + (z_t^n - A z_{t-1}^n)^\top Q^{-1}(z_t^n - A z_{t-1}^n)\right) \\
&\qquad + \sum_{t=1}^{T} \left(\log|R| + (x_t^n - C z_t^n)^\top R^{-1}(x_t^n - C z_t^n)\right) \Bigg]
\end{aligned}
$$

where constant terms that do not depend on $\Theta$ have been dropped. Suppose
$$
\mathcal{L} = \sum_{n=1}^{N} \Bigg[ \log|V| + (z_1^n - \Pi)^\top V^{-1}(z_1^n - \Pi) + \sum_{t=2}^{T} \left(\log|Q| + (z_t^n - A z_{t-1}^n)^\top Q^{-1}(z_t^n - A z_{t-1}^n)\right) + \sum_{t=1}^{T} \left(\log|R| + (x_t^n - C z_t^n)^\top R^{-1}(x_t^n - C z_t^n)\right) \Bigg]
$$

The minimum is achieved when the gradients vanish:
$$
\begin{aligned}
\nabla_{\Pi} \mathcal{L} &= \sum_{n=1}^{N} -2 V^{-1}(z_1^n - \Pi) = 0
\;\Rightarrow\; \Pi^* = \frac{1}{N}\sum_{n=1}^{N} z_1^n \\
\nabla_{V^{-1}} \mathcal{L} &= \sum_{n=1}^{N} \left[-V + (z_1^n - \Pi)(z_1^n - \Pi)^\top\right] = 0
\;\Rightarrow\; V^* = \frac{1}{N}\sum_{n=1}^{N} (z_1^n - \Pi^*)(z_1^n - \Pi^*)^\top \\
\nabla_{A} \mathcal{L} &= \sum_{n=1}^{N}\sum_{t=2}^{T} -2 Q^{-1}(z_t^n - A z_{t-1}^n)(z_{t-1}^n)^\top = 0
\;\Rightarrow\; A^* = \left(\sum_{n=1}^{N}\sum_{t=2}^{T} z_t^n (z_{t-1}^n)^\top\right)\left(\sum_{n=1}^{N}\sum_{t=2}^{T} z_{t-1}^n (z_{t-1}^n)^\top\right)^{-1} \\
\nabla_{Q^{-1}} \mathcal{L} &= \sum_{n=1}^{N}\sum_{t=2}^{T} \left[-Q + (z_t^n - A z_{t-1}^n)(z_t^n - A z_{t-1}^n)^\top\right] = 0
\;\Rightarrow\; Q^* = \frac{1}{N(T-1)}\sum_{n=1}^{N}\sum_{t=2}^{T} (z_t^n - A^* z_{t-1}^n)(z_t^n - A^* z_{t-1}^n)^\top \\
\nabla_{C} \mathcal{L} &= \sum_{n=1}^{N}\sum_{t=1}^{T} -2 R^{-1}(x_t^n - C z_t^n)(z_t^n)^\top = 0
\;\Rightarrow\; C^* = \left(\sum_{n=1}^{N}\sum_{t=1}^{T} x_t^n (z_t^n)^\top\right)\left(\sum_{n=1}^{N}\sum_{t=1}^{T} z_t^n (z_t^n)^\top\right)^{-1} \\
\nabla_{R^{-1}} \mathcal{L} &= \sum_{n=1}^{N}\sum_{t=1}^{T} \left[-R + (x_t^n - C z_t^n)(x_t^n - C z_t^n)^\top\right] = 0
\;\Rightarrow\; R^* = \frac{1}{NT}\sum_{n=1}^{N}\sum_{t=1}^{T} (x_t^n - C^* z_t^n)(x_t^n - C^* z_t^n)^\top
\end{aligned}
$$
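Since the states $z_t^n$ are observed during training, each estimate above has a closed form. The following NumPy sketch evaluates those formulas; the function name `fit_kalman_params`, the convention of passing trials as lists of `(T, dim)` arrays, and the assumption that every trial has the same length $T$ (as in the derivation) are illustrative choices.

```python
import numpy as np

def fit_kalman_params(Z, X):
    """Closed-form ML estimates of Theta = {A, Q, Pi, V, C, R}.

    Z: list of N arrays, each (T, state_dim), the observed states per trial.
    X: list of N arrays, each (T, obs_dim), the observations per trial.
    """
    N = len(Z)
    T = Z[0].shape[0]

    # Pi* and V*: mean and covariance of the initial states
    z1 = np.stack([z[0] for z in Z])                      # (N, state_dim)
    Pi = z1.mean(axis=0)
    V = (z1 - Pi).T @ (z1 - Pi) / N

    # A* = (sum z_t z_{t-1}^T)(sum z_{t-1} z_{t-1}^T)^{-1}
    S_cur_prev = sum(z[1:].T @ z[:-1] for z in Z)
    S_prev_prev = sum(z[:-1].T @ z[:-1] for z in Z)
    A = S_cur_prev @ np.linalg.inv(S_prev_prev)

    # Q*: average outer product of the state-transition residuals
    Q = sum((z[1:] - z[:-1] @ A.T).T @ (z[1:] - z[:-1] @ A.T) for z in Z) / (N * (T - 1))

    # C* = (sum x_t z_t^T)(sum z_t z_t^T)^{-1}
    S_xz = sum(x.T @ z for x, z in zip(X, Z))
    S_zz = sum(z.T @ z for z in Z)
    C = S_xz @ np.linalg.inv(S_zz)

    # R*: average outer product of the observation residuals
    R = sum((x - z @ C.T).T @ (x - z @ C.T) for x, z in zip(X, Z)) / (N * T)
    return A, Q, Pi, V, C, R
```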
Testing the Model

In the testing phase, we aim to compute $P(z_t \mid x_1, \ldots, x_t)$. At each time step, we apply two sub-steps: a one-step prediction and then a measurement update.
One step Prediction
$$P(z_t \mid x_1, \ldots, x_{t-1}) = \int P(z_t \mid z_{t-1})\, P(z_{t-1} \mid x_1, \ldots, x_{t-1})\, dz_{t-1}$$

Measurement update

$$P(z_t \mid x_1, \ldots, x_t) = \frac{P(x_t \mid z_t)\, P(z_t \mid x_1, \ldots, x_{t-1})}{P(x_t \mid x_1, \ldots, x_{t-1})}$$

We'll use the following notation for the mean and covariance:
$$\mu_t^t = E[z_t \mid x_1, \ldots, x_t] \qquad \Sigma_t^t = \mathrm{cov}[z_t \mid x_1, \ldots, x_t]$$

One step Prediction
We assume that we have the mean $\mu_{t-1}^{t-1}$ and covariance $\Sigma_{t-1}^{t-1}$ from the previous iteration. We need to compute $\mu_t^{t-1}$ and $\Sigma_t^{t-1}$. Because $z_t = A z_{t-1} + \gamma$ with $\gamma \sim \mathcal{N}(0, Q)$,
$$
\begin{aligned}
\mu_t^{t-1} &= E[z_t \mid x_1, \ldots, x_{t-1}] \\
&= E[A z_{t-1} + \gamma \mid x_1, \ldots, x_{t-1}] \\
&= A\, E[z_{t-1} \mid x_1, \ldots, x_{t-1}] + E[\gamma \mid x_1, \ldots, x_{t-1}] \\
&= A \mu_{t-1}^{t-1}
\end{aligned}
$$
$$
\begin{aligned}
\Sigma_t^{t-1} &= \mathrm{cov}[z_t \mid x_1, \ldots, x_{t-1}] \\
&= \mathrm{cov}[A z_{t-1} + \gamma \mid x_1, \ldots, x_{t-1}] \\
&= E[(A z_{t-1} + \gamma - \mu_t^{t-1})(A z_{t-1} + \gamma - \mu_t^{t-1})^\top \mid x_1, \ldots, x_{t-1}] \\
&= E[(A z_{t-1} - \mu_t^{t-1})(A z_{t-1} - \mu_t^{t-1})^\top \mid x_1, \ldots, x_{t-1}] + E[\gamma \gamma^\top \mid x_1, \ldots, x_{t-1}] \\
&= E[(A z_{t-1} - A \mu_{t-1}^{t-1})(A z_{t-1} - A \mu_{t-1}^{t-1})^\top \mid x_1, \ldots, x_{t-1}] + Q \\
&= A\, E[(z_{t-1} - \mu_{t-1}^{t-1})(z_{t-1} - \mu_{t-1}^{t-1})^\top \mid x_1, \ldots, x_{t-1}]\, A^\top + Q \\
&= A \Sigma_{t-1}^{t-1} A^\top + Q
\end{aligned}
$$
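As a quick check, the one-step prediction amounts to two matrix operations. The sketch below is a direct transcription of the two results above; the function name and argument order are assumed for illustration.

```python
import numpy as np

def predict_step(mu_prev, Sigma_prev, A, Q):
    """One-step prediction: propagate the previous posterior through the dynamics.

    mu_prev, Sigma_prev: mean and covariance of P(z_{t-1} | x_1..x_{t-1}).
    Returns the mean and covariance of P(z_t | x_1..x_{t-1}).
    """
    mu_pred = A @ mu_prev                      # mu_t^{t-1} = A mu_{t-1}^{t-1}
    Sigma_pred = A @ Sigma_prev @ A.T + Q      # Sigma_t^{t-1} = A Sigma_{t-1}^{t-1} A^T + Q
    return mu_pred, Sigma_pred
```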
Measurement update

We take advantage of a property of Gaussian distributions. If

$$x = \begin{bmatrix} x_a \\ x_b \end{bmatrix} \sim \mathcal{N}\!\left(\begin{bmatrix} \mu_a \\ \mu_b \end{bmatrix}, \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix}\right),$$

then $P(x_a \mid x_b)$ is Gaussian with

$$E(x_a \mid x_b) = \mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b) \qquad \mathrm{cov}(x_a \mid x_b) = \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}$$

Because $x_t = C z_t + \sigma$ with $\sigma \sim \mathcal{N}(0, R)$, we have $E[x_t \mid x_1, \ldots, x_{t-1}] = E[C z_t + \sigma \mid x_1, \ldots, x_{t-1}] = C \mu_t^{t-1}$. Similarly, $\mathrm{cov}[x_t \mid x_1, \ldots, x_{t-1}] = C \Sigma_t^{t-1} C^\top + R$, and the cross-covariance is

$$\mathrm{cov}[x_t, z_t \mid x_1, \ldots, x_{t-1}] = C \Sigma_t^{t-1}$$

Now we have the joint distribution

$$P\!\left(\begin{bmatrix} x_t \\ z_t \end{bmatrix} \,\middle|\, x_1, \ldots, x_{t-1}\right) \sim \mathcal{N}\!\left(\begin{bmatrix} C \mu_t^{t-1} \\ \mu_t^{t-1} \end{bmatrix}, \begin{bmatrix} C \Sigma_t^{t-1} C^\top + R & C \Sigma_t^{t-1} \\ \Sigma_t^{t-1} C^\top & \Sigma_t^{t-1} \end{bmatrix}\right)$$
Applying the conditioning property with $x_a = z_t$ and $x_b = x_t$, we can write the measurement update in terms of the Kalman gain:

$$\mu_t^t = \mu_t^{t-1} + K_t\,(x_t - C \mu_t^{t-1}) \qquad \Sigma_t^t = \Sigma_t^{t-1} - K_t C \Sigma_t^{t-1}$$

where

$$K_t = \Sigma_t^{t-1} C^\top \left(C \Sigma_t^{t-1} C^\top + R\right)^{-1}$$
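Putting the two sub-steps together, a minimal sketch of the measurement update and of a full filtering pass might look as follows; the function names and the convention that observations are stacked row-wise in `X` are assumptions for illustration. At $t = 1$, the prior $\mathcal{N}(\Pi, V)$ plays the role of the one-step prediction.

```python
import numpy as np

def update_step(mu_pred, Sigma_pred, x_t, C, R):
    """Measurement update: condition the predicted state on the new observation x_t."""
    # Kalman gain: K_t = Sigma_t^{t-1} C^T (C Sigma_t^{t-1} C^T + R)^{-1}
    S = C @ Sigma_pred @ C.T + R
    K = Sigma_pred @ C.T @ np.linalg.inv(S)
    mu_post = mu_pred + K @ (x_t - C @ mu_pred)
    Sigma_post = Sigma_pred - K @ C @ Sigma_pred
    return mu_post, Sigma_post

def kalman_filter(X, A, Q, C, R, Pi, V):
    """Run the filter over observations X of shape (T, obs_dim); return posterior means."""
    mu, Sigma = Pi, V                          # prior on z_1 serves as the first "prediction"
    means = []
    for t, x_t in enumerate(X):
        if t > 0:
            mu = A @ mu                        # one-step prediction
            Sigma = A @ Sigma @ A.T + Q
        mu, Sigma = update_step(mu, Sigma, x_t, C, R)
        means.append(mu)
    return np.array(means)
```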