bayespy.inference.vmp.nodes.gaussian_markov_chain.GaussianMarkovChainDistribution

class bayespy.inference.vmp.nodes.gaussian_markov_chain.GaussianMarkovChainDistribution(N, D)

Implementation of VMP formulas for the Gaussian Markov chain.

The log probability density function of the prior:

Todo

Fix inputs and their weight matrix in the equations.

\log p(\mathbf{X} | \boldsymbol{\mu}, \mathbf{\Lambda},
\mathbf{A}, \mathbf{B}, \boldsymbol{\nu})
=& \log \mathcal{N}(\mathbf{x}_0|\boldsymbol{\mu}, \mathbf{\Lambda})
+ \sum^N_{n=1} \log \mathcal{N}(
  \mathbf{x}_n | \mathbf{Ax}_{n-1} + \mathbf{Bu}_n,
                 \mathrm{diag}(\boldsymbol{\nu}))
\\
=&
- \frac{1}{2} \mathbf{x}_0^T \mathbf{\Lambda} \mathbf{x}_0
+ \frac{1}{2} \mathbf{x}_0^T \mathbf{\Lambda} \boldsymbol{\mu}
+ \frac{1}{2} \boldsymbol{\mu}^T \mathbf{\Lambda} \mathbf{x}_0
- \frac{1}{2} \boldsymbol{\mu}^T \mathbf{\Lambda} \boldsymbol{\mu}
+ \frac{1}{2} \log|\mathbf{\Lambda}|
\\
&
- \frac{1}{2} \sum^N_{n=1} \mathbf{x}_n^T \mathrm{diag}(\boldsymbol{\nu}) \mathbf{x}_n
+ \frac{1}{2} \sum^N_{n=1} \mathbf{x}_n^T \mathrm{diag}(\boldsymbol{\nu}) \mathbf{A} \mathbf{x}_{n-1}
+ \frac{1}{2} \sum^N_{n=1} \mathbf{x}_{n-1}^T\mathbf{A}^T \mathrm{diag}(\boldsymbol{\nu}) \mathbf{x}_n
- \frac{1}{2} \sum^N_{n=1} \mathbf{x}_{n-1}^T\mathbf{A}^T \mathrm{diag}(\boldsymbol{\nu}) \mathbf{A} \mathbf{x}_{n-1}
\\ &
+ \frac{1}{2} \sum^N_{n=1} \sum^D_{d=1} \log\nu_d - \frac{1}{2} (N+1) D \log(2\pi)
\\
=&
\begin{bmatrix}
  \mathbf{x}_0 \\ \mathbf{x}_1 \\ \vdots \\ \mathbf{x}_{N-1} \\ \mathbf{x}_N
\end{bmatrix}^T
\begin{bmatrix}
  -\frac{1}{2}\mathbf{\Lambda} - \frac{1}{2}\mathbf{A}^T\mathrm{diag}(\boldsymbol{\nu})\mathbf{A}
  &
  \frac{1}{2} \mathbf{A}^T\mathrm{diag}(\boldsymbol{\nu})
  &
  &
  &
  \\
  \frac{1}{2} \mathrm{diag}(\boldsymbol{\nu}) \mathbf{A}
  &
  -\frac{1}{2} \mathrm{diag}(\boldsymbol{\nu})
  - \frac{1}{2}\mathbf{A}^T\mathrm{diag}(\boldsymbol{\nu})\mathbf{A}
  &
  \frac{1}{2} \mathbf{A}^T\mathrm{diag}(\boldsymbol{\nu})
  &
  &
  \\
  &
  \ddots
  &
  \ddots
  &
  \ddots
  &
  \\
  &
  &
  \frac{1}{2} \mathrm{diag}(\boldsymbol{\nu}) \mathbf{A}
  &
  -\frac{1}{2} \mathrm{diag}(\boldsymbol{\nu})
  - \frac{1}{2}\mathbf{A}^T\mathrm{diag}(\boldsymbol{\nu})\mathbf{A}
  &
  \frac{1}{2} \mathbf{A}^T\mathrm{diag}(\boldsymbol{\nu})
  \\
  &
  &
  &
  \frac{1}{2} \mathrm{diag}(\boldsymbol{\nu}) \mathbf{A}
  &
  -\frac{1}{2} \mathrm{diag}(\boldsymbol{\nu})
\end{bmatrix}
\begin{bmatrix}
  \mathbf{x}_0 \\ \mathbf{x}_1 \\ \vdots \\ \mathbf{x}_{N-1} \\ \mathbf{x}_N
\end{bmatrix}
\\
&
+ \frac{1}{2} \mathbf{x}_0^T \mathbf{\Lambda} \boldsymbol{\mu}
+ \frac{1}{2} \boldsymbol{\mu}^T \mathbf{\Lambda} \mathbf{x}_0
- \frac{1}{2} \boldsymbol{\mu}^T \mathbf{\Lambda} \boldsymbol{\mu}
+ \frac{1}{2} \log|\mathbf{\Lambda}|
+ \frac{1}{2} \sum^N_{n=1} \sum^D_{d=1} \log\nu_d - \frac{1}{2} (N+1) D \log(2\pi)
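The expansion can be checked numerically. The NumPy sketch below is not part of BayesPy, and its function names are hypothetical. It evaluates the log-density once as a sum of Gaussian terms and once through the block-tridiagonal quadratic form plus the remaining terms, treating Lambda and diag(nu) as precision matrices as in the equations. Like the expanded equations, it omits the input term B u_n.

```python
import numpy as np


def log_normal_prec(x, mean, prec):
    """log N(x | mean, prec), where prec is a precision (inverse covariance) matrix."""
    d = x - mean
    _, logdet = np.linalg.slogdet(prec)
    return -0.5 * d @ prec @ d + 0.5 * logdet - 0.5 * len(x) * np.log(2 * np.pi)


def chain_logpdf_direct(X, mu, Lambda, A, nu):
    """Initial-state term plus N transition terms (inputs B u_n omitted)."""
    V = np.diag(nu)
    lp = log_normal_prec(X[0], mu, Lambda)
    for n in range(1, X.shape[0]):
        lp += log_normal_prec(X[n], A @ X[n - 1], V)
    return lp


def chain_logpdf_blocks(X, mu, Lambda, A, nu):
    """Same quantity through the block-tridiagonal quadratic form."""
    N1, D = X.shape  # N1 = N + 1 states x_0, ..., x_N
    N = N1 - 1
    V = np.diag(nu)
    M = np.zeros((N1 * D, N1 * D))

    def blk(i, j):
        # Index of the (i, j) D-by-D block of M
        return slice(i * D, (i + 1) * D), slice(j * D, (j + 1) * D)

    for n in range(N1):
        block = -0.5 * (Lambda if n == 0 else V)
        if n < N:
            # x_n is also the parent of x_{n+1}
            block = block - 0.5 * A.T @ V @ A
            M[blk(n, n + 1)] = 0.5 * A.T @ V
            M[blk(n + 1, n)] = 0.5 * V @ A
        M[blk(n, n)] = block

    x = X.ravel()  # stacks x_0, ..., x_N into one vector
    lp = x @ M @ x
    lp += X[0] @ Lambda @ mu  # both (1/2) cross terms, since Lambda is symmetric
    lp -= 0.5 * mu @ Lambda @ mu
    lp += 0.5 * np.linalg.slogdet(Lambda)[1]
    lp += 0.5 * N * np.sum(np.log(nu))
    lp -= 0.5 * N1 * D * np.log(2 * np.pi)
    return lp
```

Because the dense matrix has ((N+1)D)^2 entries, this is only meant for checking small cases, not for inference.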

For simplicity, \boldsymbol{\nu} and \mathbf{A} are assumed above not to depend on n, but this distribution class supports that dependency. The general case follows by replacing \boldsymbol{\nu} \leftarrow \boldsymbol{\nu}_n and \mathbf{A} \leftarrow \mathbf{A}_n in the equations, where n=1,\ldots,N.
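Under that replacement the index bookkeeping is easy to get wrong: the diagonal block of \mathbf{x}_n collects \mathrm{diag}(\boldsymbol{\nu}_n) from the transition into \mathbf{x}_n and \mathbf{A}_{n+1}^T\mathrm{diag}(\boldsymbol{\nu}_{n+1})\mathbf{A}_{n+1} from the transition out of it. The NumPy sketch below is a hypothetical helper, not BayesPy's code. It assembles the blocks of the chain's precision matrix (that is, -2 times the matrix in the quadratic form), with A[n-1] and nu[n-1] standing for \mathbf{A}_n and \boldsymbol{\nu}_n.

```python
import numpy as np


def precision_blocks_time_varying(Lambda, A, nu):
    """Diagonal and upper off-diagonal blocks of the chain's precision matrix.

    A has shape (N, D, D) and nu has shape (N, D); A[n-1] and nu[n-1] play the
    roles of A_n and nu_n for the transition x_{n-1} -> x_n.
    """
    N, D = nu.shape
    V = np.einsum('nd,de->nde', nu, np.eye(D))    # diag(nu_n) for each n
    AVA = np.einsum('nki,nk,nkj->nij', A, nu, A)  # A_n^T diag(nu_n) A_n
    diag_blocks = np.empty((N + 1, D, D))
    diag_blocks[0] = Lambda
    diag_blocks[1:] = V            # x_n is the child in transition n
    diag_blocks[:N] += AVA         # x_{n-1} is the parent in transition n
    # Block (n-1, n) is -A_n^T diag(nu_n); the (n, n-1) block is its transpose
    upper_blocks = -np.einsum('nki,nk->nik', A, nu)
    return diag_blocks, upper_blocks
```

Setting every A[n-1] equal to \mathbf{A} and every nu[n-1] equal to \boldsymbol{\nu} recovers -2 times the blocks of the constant-parameter matrix above.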

u(\mathbf{X}) &=
\begin{bmatrix}
  \begin{bmatrix} \mathbf{x}_0 & \ldots & \mathbf{x}_N \end{bmatrix}
  \\
  \begin{bmatrix} \mathbf{x}_0\mathbf{x}_0^T & \ldots & \mathbf{x}_N\mathbf{x}_N^T \end{bmatrix}
  \\
  \begin{bmatrix} \mathbf{x}_0\mathbf{x}_1^T & \ldots & \mathbf{x}_{N-1}\mathbf{x}_N^T \end{bmatrix}
\end{bmatrix}
\\
\phi(\boldsymbol{\mu}, \mathbf{\Lambda}, \mathbf{A}, \boldsymbol{\nu}) &=
\begin{bmatrix}
  \begin{bmatrix}
    \mathbf{\Lambda} \boldsymbol{\mu} & \mathbf{0} & \ldots & \mathbf{0}
  \end{bmatrix}
  \\
  \begin{bmatrix}
    -\frac{1}{2}\mathbf{\Lambda} - \frac{1}{2} \mathbf{A}^T\mathrm{diag}(\boldsymbol{\nu})\mathbf{A} &
    -\frac{1}{2}\mathrm{diag}(\boldsymbol{\nu}) - \frac{1}{2} \mathbf{A}^T\mathrm{diag}(\boldsymbol{\nu})\mathbf{A} &
    \ldots &
    -\frac{1}{2}\mathrm{diag}(\boldsymbol{\nu}) - \frac{1}{2} \mathbf{A}^T\mathrm{diag}(\boldsymbol{\nu})\mathbf{A} &
    -\frac{1}{2}\mathrm{diag}(\boldsymbol{\nu})
  \end{bmatrix}
  \\
  \begin{bmatrix}
    \mathbf{A}^T \mathrm{diag}(\boldsymbol{\nu}) & \ldots & \mathbf{A}^T \mathrm{diag}(\boldsymbol{\nu})
  \end{bmatrix}
\end{bmatrix}
\\
g(\boldsymbol{\mu}, \mathbf{\Lambda}, \mathbf{A}, \boldsymbol{\nu}) &=
-\frac{1}{2}\boldsymbol{\mu}^T\mathbf{\Lambda}\boldsymbol{\mu} + \frac{1}{2}\log|\mathbf{\Lambda}| + \frac{1}{2} \sum^N_{n=1}\sum^D_{d=1}\log\nu_d
\\
f(\mathbf{X}) &= -\frac{1}{2} (N+1) D \log(2\pi)
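The exponential-family form \log p(\mathbf{X}) = \phi^T u(\mathbf{X}) + g + f can be verified with a small NumPy sketch. The array layout (phi1 of shape (N+1, D), phi2 of shape (N+1, D, D), phi3 of shape (N, D, D) paired with \mathbf{x}_{n-1}\mathbf{x}_n^T) and the function names are assumptions made for illustration; BayesPy's internal arrays may be organized differently. For the identity to hold, g has to include the constant -\frac{1}{2}\boldsymbol{\mu}^T\mathbf{\Lambda}\boldsymbol{\mu} from the expanded log-density.

```python
import numpy as np


def natural_parameters(mu, Lambda, A, nu, N):
    """Hypothetical layout of the three components of phi for the prior."""
    D = len(mu)
    V = np.diag(nu)
    phi1 = np.zeros((N + 1, D))
    phi1[0] = Lambda @ mu                  # only x_0 has a linear term
    phi2 = np.empty((N + 1, D, D))
    phi2[0] = -0.5 * Lambda
    phi2[1:] = -0.5 * V
    phi2[:N] -= 0.5 * A.T @ V @ A          # every state except x_N has a child
    phi3 = np.broadcast_to(A.T @ V, (N, D, D)).copy()
    return phi1, phi2, phi3


def sufficient_statistics(X):
    """u(X): states, their outer products, and consecutive cross products."""
    u1 = X
    u2 = np.einsum('ni,nj->nij', X, X)
    u3 = np.einsum('ni,nj->nij', X[:-1], X[1:])  # x_{n-1} x_n^T
    return u1, u2, u3


def g_and_f(mu, Lambda, nu, N):
    D = len(mu)
    g = (-0.5 * mu @ Lambda @ mu
         + 0.5 * np.linalg.slogdet(Lambda)[1]
         + 0.5 * N * np.sum(np.log(nu)))
    f = -0.5 * (N + 1) * D * np.log(2 * np.pi)
    return g, f


def exp_family_logpdf(X, mu, Lambda, A, nu):
    N = X.shape[0] - 1
    phis = natural_parameters(mu, Lambda, A, nu, N)
    us = sufficient_statistics(X)
    g, f = g_and_f(mu, Lambda, nu, N)
    # <phi, u> is the sum of elementwise products over all three components
    return sum(np.sum(p * u) for p, u in zip(phis, us)) + g + f
```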

The log probability density function of the posterior approximation:

\log q(\mathbf{X}) &=
\begin{bmatrix}
  \mathbf{x}_0
  \\
  \mathbf{x}_1
  \\
  \vdots
  \\
  \mathbf{x}_{N-1}
  \\
  \mathbf{x}_N
\end{bmatrix}^T
\begin{bmatrix}
  \mathbf{\Phi}_0^{(2)} & \frac{1}{2}\mathbf{\Phi}_1^{(3)} & & &
  \\
  \frac{1}{2}{\mathbf{\Phi}_1^{(3)}}^T & \mathbf{\Phi}_1^{(2)} & \frac{1}{2}\mathbf{\Phi}_2^{(3)} & &
  \\
  & \ddots & \ddots & \ddots &
  \\
  & & \frac{1}{2}{\mathbf{\Phi}_{N-1}^{(3)}}^T & \mathbf{\Phi}_{N-1}^{(2)} & \frac{1}{2}\mathbf{\Phi}_N^{(3)}
  \\
  & & & \frac{1}{2}{\mathbf{\Phi}_N^{(3)}}^T & \mathbf{\Phi}_N^{(2)}
\end{bmatrix}
\begin{bmatrix}
  \mathbf{x}_0
  \\
  \mathbf{x}_1
  \\
  \vdots
  \\
  \mathbf{x}_{N-1}
  \\
  \mathbf{x}_N
\end{bmatrix}
+ \ldots
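Given natural parameters of this form, the expected sufficient statistics follow from the stacked Gaussian: the precision is -2 times the block matrix, and the mean solves a linear system with the linear term. The NumPy sketch below assumes that linear term is \sum_n {\boldsymbol{\phi}_n^{(1)}}^T\mathbf{x}_n, matching the first component of u(\mathbf{X}); the function name is hypothetical. It uses a dense inverse costing O(((N+1)D)^3) for clarity. A practical implementation would exploit the block-tridiagonal structure, and this sketch is not a description of how BayesPy computes the moments.

```python
import numpy as np


def moments_from_natural_parameters(phi1, phi2, phi3):
    """Dense sketch of E[x_n], E[x_n x_n^T] and E[x_{n-1} x_n^T] under q(X).

    phi1: (N+1, D) linear terms; phi2: (N+1, D, D) diagonal blocks Phi_n^(2);
    phi3: (N, D, D) with phi3[n-1] = Phi_n^(3).
    """
    N1, D = phi1.shape
    M = np.zeros((N1 * D, N1 * D))
    for n in range(N1):
        s = slice(n * D, (n + 1) * D)
        M[s, s] = phi2[n]
        if n > 0:
            r = slice((n - 1) * D, n * D)
            M[r, s] = 0.5 * phi3[n - 1]
            M[s, r] = 0.5 * phi3[n - 1].T
    # log q = x^T M x + phi1^T x + const, so the precision is -2 M
    Cov = np.linalg.inv(-2.0 * M)
    m = (Cov @ phi1.ravel()).reshape(N1, D)
    C = Cov.reshape(N1, D, N1, D)  # C[i, :, j, :] = Cov(x_i, x_j)
    xx = np.array([C[n, :, n, :] + np.outer(m[n], m[n]) for n in range(N1)])
    xy = np.array([C[n - 1, :, n, :] + np.outer(m[n - 1], m[n])
                   for n in range(1, N1)]).reshape(N1 - 1, D, D)
    return m, xx, xy
```

Feeding in the prior's natural parameters should reproduce the prior moments, e.g. E[\mathbf{x}_n] = \mathbf{A}^n\boldsymbol{\mu}.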

__init__(N, D)

Methods

__init__(N, D)

compute_cgf_from_parents(u_mu_Lambda, ...)

Compute CGF using the moments of the parents.

compute_fixed_moments_and_f(x[, mask])

Compute u(x) and f(x) for given x.

compute_gradient(g, u, phi)

Compute the standard gradient with respect to the natural parameters.

compute_logpdf(u, phi, g, f, ndims)

Compute E[log p(X)] given E[u], E[phi], E[g] and E[f].

compute_message_to_parent(parent, index, u, ...)

Compute a message to a parent.

compute_moments_and_cgf(phi[, mask])

Compute the moments and the cumulant-generating function.

compute_phi_from_parents(u_mu_Lambda, ...[, ...])

Compute the natural parameters using parents' moments.

compute_rotation_bound(u, u_mu_Lambda, u_A_V, R)

compute_weights_to_parent(index, weights)

Maps the mask to the plates of a parent.

plates_from_parent(index, plates)

Compute the plates using information of a parent node.

plates_to_parent(index, plates)

Computes the plates of this node with respect to a parent.

random(*params[, plates])

Draw a random sample from the distribution.

rotate(u, phi, R[, inv, logdet])

squeeze(axis)

Squeeze a plate axis from the distribution.
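As a standalone illustration of what drawing a sample from this prior involves, the NumPy sketch below performs ancestral sampling: \mathbf{x}_0 \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{\Lambda}) and then \mathbf{x}_n \sim \mathcal{N}(\mathbf{A}\mathbf{x}_{n-1}, \mathrm{diag}(\boldsymbol{\nu})), with the second arguments read as precisions. It is not BayesPy's `random` implementation (which also handles plates), the function name `sample_chain` is hypothetical, and the input term \mathbf{B}\mathbf{u}_n is omitted.

```python
import numpy as np


def sample_chain(mu, Lambda, A, nu, N, rng=None):
    """Draw x_0, ..., x_N by ancestral sampling (inputs B u_n omitted)."""
    rng = np.random.default_rng() if rng is None else rng
    D = len(mu)
    X = np.empty((N + 1, D))
    # x_0 ~ N(mu, Lambda^{-1}): with Lambda = L L^T, L^{-T} z has covariance Lambda^{-1}
    L = np.linalg.cholesky(Lambda)
    X[0] = mu + np.linalg.solve(L.T, rng.standard_normal(D))
    for n in range(1, N + 1):
        # x_n ~ N(A x_{n-1}, diag(nu)^{-1}); nu holds precisions, so scale by 1/sqrt(nu)
        X[n] = A @ X[n - 1] + rng.standard_normal(D) / np.sqrt(nu)
    return X
```

Scaling by the inverse Cholesky factor, rather than the factor itself, is what makes the precision parameterization come out right.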