Implementing nodes

The main goal of BayesPy is to provide a package which enables easy and flexible construction of simple and complex models with efficient inference. However, users may sometimes be unable to construct their models because the built-in nodes do not implement some specific features. Thus, one may need to implement new nodes in order to construct the model. BayesPy aims to make the implementation of new nodes both simple and fast. Probably, a large complex model can be constructed almost completely with the built-in nodes and the user needs to implement only a few nodes.


In order to implement nodes, it is important to understand the messaging framework of the nodes. A node is a unit of calculation which communicates to its parent and child nodes using messages. These messages have types that need to match between nodes, that is, the child node needs to understand the messages its parents are sending and vice versa. Thus, a node defines which message type it requires from each of its parents, and only nodes that have that type of output message (i.e., the message to a child node) are valid parent nodes for that node.

The message type is defined by the moments of the parent node. The moments are a collection of expectations: \{ \langle f_1(X) \rangle, \ldots, \langle
f_N(X) \rangle \}. The functions f_1, \ldots, f_N (and the number of the functions) define the message type and they are the sufficient statistic as discussed in the previous section. Different message types are represented by Moments class hierarchy. For instance, GaussianMoments represents a message type with parent moments \{\langle \mathbf{x}
\rangle, \langle \mathbf{xx}^T \rangle \} and WishartMoments a message type with parent moments \{\langle \mathbf{\Lambda} \rangle, \langle \log
|\mathbf{\Lambda}| \rangle\}.

Let us give an example: Gaussian node outputs GaussianMoments messages and Wishart node outputs WishartMoments messages. Gaussian node requires that it receives GaussianMoments messages from the mean parent node and WishartMoments messages from the precision parent node. Thus, Gaussian and Wishart are valid node classes as the mean and precision parent nodes of Gaussian node.

Note that several nodes may have the same output message type and some message types can be transformed to other message types using deterministic converter nodes. For instance, Gaussian and GaussianARD nodes both output GaussianMoments messages, deterministic SumMultiply also outputs GaussianMoments messages, and deterministic converter _MarkovChainToGaussian converts GaussianMarkovChainMoments to GaussianMoments.

Each node specifies the message type requirements of its parents by Node._parent_moments attribute which is a list of Moments sub-class instances. These moments objects have a few purpose when creating the node: 1) check that parents are sending proper messages; 2) if parents use different message type, try to add a converter which converts the messages to the correct type if possible; 3) if given parents are not nodes but numeric arrays, convert them to constant nodes with correct output message type.

When implementing a new node, it is not always necessary to implement a new moments class. If another node has the same sufficient statistic vector, thus the same moments, that moments class can be used. Otherwise, one must implement a simple moments class which has the following methods:

  • Moments.compute_fixed_moments()

    Computes the moments for a known value. This is used to compute the moments of constant numeric arrays and wrap them into constant nodes.

  • Moments.compute_dims_from_values()

    Given a known value of the variable, return the shape of the variable dimensions in the moments. This is used to solve the shape of the moments array for constant nodes.


In order to implement a stochastic exponential family node, one must first write down the log probability density function of the node and derive the terms discussed in section Terms. These terms are implemented and collected as a class which is a subclass of Distribution. The main reason to implement these methods in another class instead of the node class itself is that these methods can be used without creating a node, for instance, in Mixture class.

For exponential family distributions, the distribution class is a subclass of ExponentialFamilyDistribution, and the relation between the terms in section Terms and the methods is as follows:

For each stochastic exponential family node, one must write a distribution class which implements these methods. After that, the node class is basically a simple wrapper and it also stores the moments and the natural parameters of the current posterior approximation. Note that the distribution classes do not store node-specific information, they are more like static collections of methods. However, sometimes the implementations depend on some information, such as the dimensionality of the variable, and this information must be provided, if needed, when constructing the distribution object.

In addition to the methods listed above, it is necessary to implement a few more methods in some cases. This happens when the plates of the parent do not map to the plates directly as discussed in section Irregular plates. Then, one must write methods that implement this plate mapping and apply the same mapping to the mask array:

It is important to understand when one must implement these methods, because the default implementations in the base class will lead to errors or weird results.

Stochastic exponential family nodes

After implementing the distribution class, the next task is to implement the node class. First, we need to explain a few important attributes before we can explain how to implement a node class.

Stochastic exponential family nodes have two attributes that store the state of the posterior distribution:

  • phi

    The natural parameter vector \tilde{\boldsymbol{\phi}} of the posterior approximation.

  • u

    The moments \langle \mathbf{u} \rangle of the posterior approximation.

Instead of storing these two variables as vectors (as in the mathematical formulas), they are stored as lists of arrays with convenient shapes. For instance, Gaussian node stores the moments as a list consisting of a vector \langle \mathbf{x} \rangle and a matrix \langle
\mathbf{xx}^T \rangle instead of reshaping and concatenating these into a single vector. The same applies for the natural parameters phi because it has the same shape as u.

The shapes of the arrays in the lists u and phi consist of the shape caused by the plates and the shape caused by the variable itself. For instance, the moments of Gaussian node have shape (D,) and (D, D), where D is the dimensionality of the Gaussian vector. In addition, if the node has plates, they are added to these shapes. Thus, for instance, if the Gaussian node has plates (3, 7) and D is 5, the shape of u[0] and phi[0] would be (3, 7, 5) and the shape of u[1] and phi[1] would be (3, 7, 5, 5). This shape information is stored in the following attributes:

  • plates : a tuple

    The plates of the node. In our example, (3, 7).

  • dims : a list of tuples

    The shape of each of the moments arrays (or natural parameter arrays) without plates. In our example, [ (5,), (5, 5) ].

Finally, three attributes define VMP for the node:

  • _moments : Moments sub-class instance

    An object defining the moments of the node.

  • _parent_moments : list of Moments sub-class instances

    A list defining the moments requirements for each parent.

  • _distribution : Distribution sub-class instance

    An object implementing the VMP formulas.

Basically, a node class is a collection of the above attributes. When a node is created, these attributes are defined. The base class for exponential family nodes, ExponentialFamily, provides a simple default constructor which does not need to be overwritten if dims, _moments, _parent_moments and _distribution can be provided as static class attributes. For instance, Gamma node defines these attributes statically. However, usually at least one of these attributes cannot be defined statically in the class. In that case, one must implement a class method which overloads ExponentialFamily._constructor(). The purpose of this method is to define all the attributes given the parent nodes. These are defined using a class method instead of __init__ method in order to be able to use the class constructors statically, for instance, in Mixture class. This construction allows users to create mixtures of any exponential family distribution with simple syntax.

The parents of a node must be converted so that they have a correct message type, because the user may have provided numeric arrays or nodes with incorrect message type. Numeric arrays should be converted to constant nodes with correct message type. Incorrect message type nodes should be converted to correct message type nodes if possible. Thus, the constructor should use Node._ensure_moments method to make sure the parent is a node with correct message type. Instead of calling this method for each parent node in the constructor, one can use ensureparents decorator to do this automatically. However, the decorator requires that _parent_moments attribute has already been defined statically. If this is not possible, the parent nodes must be converted manually in the constructor, because one should never assume that the parent nodes given to the constructor are nodes with correct message type or even nodes at all.

Deterministic nodes

Deterministic nodes are nodes that do not correspond to any probability distribution but rather a deterministic function. It does not have any moments or natural parameters to store. A deterministic node is implemented as a subclass of Deterministic base class. The new node class must implement the following methods:

  • Deterministic._compute_moments()

    Computes the moments given the moments of the parents.

  • Deterministic._compute_message_to_parent()

    Computes the message to a parent node given the message from the children and the moments of the other parents. In some cases, one may want to implement Deterministic._compute_message_and_mask_to_parent() or Deterministic._message_to_parent() instead in order to gain more control over efficient computation.

Similarly as in Distribution class, if the node handles plates irregularly, it is important to implement the following methods:

  • Deterministic._plates_from_parent()

    Given the plates of the parent, return the resulting plates of the child.

  • Deterministic._plates_to_parent()

    Given the plates of the child, return the plates of the parent that would have resulted them.

  • Deterministic._compute_weights_to_parent()

    Given the mask array, convert it to a plate mask of the parent.

Converter nodes

Sometimes a node has incorrect message type but the message can be converted into a correct type. For instance, GaussianMarkovChain has GaussianMarkovChainMoments message type, which means moments \{
\langle \mathbf{x}_n \rangle, \langle \mathbf{x}_n \mathbf{x}_n^T \rangle,
\langle \mathbf{x}_n \mathbf{x}_{n-1}^T \rangle \}^N_{n=1}. These moments can be converted to GaussianMoments by ignoring the third element and considering the time axis as a plate axis. Thus, if a node requires GaussianMoments message from its parent, GaussianMarkovChain is a valid parent if its messages are modified properly. This conversion is implemented in _MarkovChainToGaussian converter class. Converter nodes are simple deterministic nodes that have one parent node and they convert the messages to another message type.

For the user, it is not convenient if the exact message type has to be known and an explicit converter node needs to be created. Thus, the conversions are done automatically and the user will be unaware of them. In order to enable this automatization, when writing a converter node, one should register the converter to the moments class using Moments.add_converter(). For instance, a class X which converts moments A to moments B is registered as A.add_conveter(B, X). After that, Node._ensure_moments() and Node._convert() methods are used to perform the conversions automatically. The conversion can consist of several consecutive converter nodes, and the least number of conversions is used.