Introducing Paysage

Paysage is a new PyTorch-powered Python library for machine learning with Restricted Boltzmann Machines. We built Paysage from scratch at Unlearn.AI in order to bring the power of GPU acceleration, recent developments in machine learning, and our own new ideas to bear on the training of this model class.

We are excited to release this toolkit to the community as an open-source software library. This first release is version 0.2.0. We will be pushing regular updates from here on out, with many new features in the pipeline. 😉

The next release (0.3.0) is scheduled for July 10, 2018.

Paysage is a powerful library for training RBMs and, more generally, energy-based neural network models.

Paysage allows one to construct models by stacking layers of visible and hidden units (Gaussian, Bernoulli, and one-hot layers) connected by weight matrices.

Paysage implements a number of training methods, including Markov-chain Monte Carlo via block Gibbs sampling and a sampling-free scheme based on extended mean-field approximations (both described further below).

Paysage is built on top of two backends: numpy and PyTorch. The latter unlocks the power of the GPU for greatly accelerated training and sampling.

We at Unlearn are excited to release this project and let the community take it to new heights.

What is an energy-based model?

Energy-based models (EBMs) are machine learning models which arise from the specification of an energy function E(X; θ). This function yields a scalar value for each configuration of some specified variables X, and depends on some model parameters, θ, which can be adjusted. Therefore the energy function forms a parameterized family of surfaces over the variables X. Training such a model involves searching through the parameters to find the energy surface which closely matches some desired configuration. A probabilistic perspective on such models is available by virtue of the Boltzmann distribution,
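p(X; θ) = exp(-E(X; θ)) / Z(θ),   where   Z(θ) = Σ_X exp(-E(X; θ))

is the normalizing constant (the partition function).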

So an EBM also represents a probability distribution over X for each choice of θ. From this perspective, training can then be thought of as fitting the probability distribution p(X; θ) to some desired distribution — perhaps one which is measured by some given dataset.

Visible and hidden variables

It is often the case that only some subset of the configuration variables X are observed in some training set. In this case the observed variables are called visible and the rest hidden or latent variables of the model. So X splits into X = (V, H). In this case the family of probability distributions defined by the model which are relevant to training are the marginals over the hidden variables,
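p(V; θ) = Σ_H p(V, H; θ) = Σ_H exp(-E(V, H; θ)) / Z(θ).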

EBMs are natural tools in unsupervised learning

Suppose one observes a collection of samples of the variables V. These samples can be thought of as random draws from some hypothetical data distribution p_d(V). An EBM can be trained to approximate this hypothetical data distribution by fitting p(V; θ) to p_d(V). The resulting model allows one to, for example, draw new samples that resemble the training data and assign relative probabilities to configurations of V.

Furthermore, the fact that an EBM describes an explicit, parametrized probability distribution is a benefit compared to other kinds of generative models that only define their distributions implicitly.

Energy-based neural networks

An energy-based neural network is an EBM in which the energy function E(v, h; θ) arises from the architecture of an artificial neural network. It is probably best to understand this by way of example. Consider a two-layer Gaussian-Bernoulli Restricted Boltzmann Machine (GBRBM). Such a model has n real-valued visible units and m binary-valued hidden units, so a configuration of visible and hidden variables constitutes a pair of vectors (v, h), with v an n-dimensional real vector and h an m-dimensional binary vector. The visible and hidden units are separated into two layers: the visible layer is a Gaussian layer and the hidden layer is a Bernoulli layer.

The energy function for this model takes the form,
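E(v, h; m, s, b, W) = Σ_i (v_i - m_i)^2 / (2 s_i^2) - Σ_j b_j h_j - Σ_{i,j} (v_i / s_i^2) W_{ij} h_j,

where m and s are the means and scales of the Gaussian units, b are the biases of the Bernoulli units, and W is the weight matrix. (The exact rescaling of v by s in the coupling term varies between conventions.)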

This energy function has a contribution from each layer (a quadratic potential from the Gaussian layer and a linear potential from the Bernoulli layer) along with a quadratic coupling term, mediated by the weight matrix W, which connects the two layers.

From this one specification we can in principle write down the model probability distribution function p(v; m,s,b,W). The most popular training scheme is to maximize the log-likelihood of the data given the model. That is, we want to compute,
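θ* = argmax_θ Σ_{k=1}^{N} log p(v^(k); θ),

where θ = (m, s, b, W) and v^(1), …, v^(N) are the observed training samples.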

The goal of this optimization problem is to find the parameters θ that maximize the likelihood of having observed the training data under the model distribution. There are more details on training further down in this blog post.

Restricted Boltzmann Machines

Restricted Boltzmann Machines (RBMs) are an important special class of energy-based neural networks. By definition an RBM has at least one hidden layer and carries the systematic restriction that the weight matrices only connect units in adjacent layers. This restriction allows a means of sampling the model distribution (block Gibbs sampling) which is vastly more efficient than the alternatives in the unrestricted case. Paysage was primarily designed to facilitate training of RBMs.

What follows is some more in-depth commentary about a couple of the features of Paysage and an annotated example of training a 2-layer Gaussian-Bernoulli Boltzmann Machine on MNIST.

Generalities on model construction in Paysage

Paysage allows you to construct any model whose energy function takes the form:
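E(v, h_1, …, h_L) = Σ_{i=0}^{L} H_i(h_i) - Σ_{i=1}^{L} r_{i-1}(h_{i-1})^T W_i r_i(h_i),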

in which H_i() is layer i's energy function, r_i() is its rescale function, and W_i is the weight matrix connecting layers i and i-1. (Here we have conveniently set h_0 = v to compactify the formula.)

In Paysage, training and sampling such models rely on block Gibbs sampling, so the layer energy functions must lead to conditional probabilities p(v | rest) and p(h_i | rest) that are relatively efficient to compute.

Paysage provides three built-in layer types in the layers module, GaussianLayer, BernoulliLayer, and OneHotLayer. Building a model is as simple as stacking such layers together:
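Here is a minimal sketch of a Gaussian-Bernoulli RBM (the model constructor name and the layer sizes below are illustrative assumptions rather than the exact API):

```python
from paysage import layers
from paysage.models import Model  # constructor name assumed for illustration

# A real-valued visible layer (e.g., 784 pixels of an MNIST image)
vis_layer = layers.GaussianLayer(784)

# A binary-valued hidden layer
hid_layer = layers.BernoulliLayer(200)

# Stack the layers into a Gaussian-Bernoulli RBM
rbm = Model([vis_layer, hid_layer])
```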



How training works

Maximizing the log likelihood over the parameters can be attempted via stochastic gradient descent. The expectation of the gradient of the log likelihood is,
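∇_θ ⟨log p(v; θ)⟩_data = ⟨-∇_θ E(v, h; θ)⟩_{v ~ data, h ~ p(h|v; θ)} - ⟨-∇_θ E(v, h; θ)⟩_{(v, h) ~ p(v, h; θ)},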

which involves evaluating expectations over the model distribution. The primary difficulty in training these kinds of EBMs is the 'intractability' of the probability distribution p(v, h; θ). In particular, the normalizing constant Z in the Boltzmann distribution is often impossible to compute analytically. Fortunately, there are a number of numerical schemes for approximating expectations over distributions of this exponential type.

Paysage implements Markov-chain Monte Carlo via block Gibbs sampling to evaluate the gradient above (see [1] for the classic presentation of 'Boltzmann learning'). Alternatively, Paysage implements a sampling-free training scheme for RBMs arising from extended mean-field approximations to the free energy of the model (adapted from [2]).
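To make the "positive phase minus negative phase" structure of the gradient concrete, here is a minimal NumPy sketch of a single contrastive-divergence (CD-1) update for a Bernoulli-Bernoulli RBM, i.e. a block Gibbs chain truncated after one step. This is an illustration of the general technique, not Paysage's internal implementation, and the function and parameter names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v_data, W, a, b, lr=0.01, rng=None):
    """One CD-1 update for a Bernoulli-Bernoulli RBM.

    v_data: (batch, n_visible) binary data batch
    W: (n_visible, n_hidden) weights; a: visible biases; b: hidden biases
    """
    rng = rng or np.random.default_rng()

    # Positive phase: hidden units conditioned on the data
    p_h_data = sigmoid(v_data @ W + b)
    h_data = (rng.random(p_h_data.shape) < p_h_data).astype(float)

    # Negative phase: one block Gibbs step (h -> v -> h) from the model
    p_v_model = sigmoid(h_data @ W.T + a)
    v_model = (rng.random(p_v_model.shape) < p_v_model).astype(float)
    p_h_model = sigmoid(v_model @ W + b)

    # Gradient estimate: <positive phase> - <negative phase>
    batch = v_data.shape[0]
    dW = (v_data.T @ p_h_data - v_model.T @ p_h_model) / batch
    da = (v_data - v_model).mean(axis=0)
    db = (p_h_data - p_h_model).mean(axis=0)

    # Ascend the estimated log-likelihood gradient
    return W + lr * dW, a + lr * da, b + lr * db
```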

[1] Ackley, David H.; Hinton, Geoffrey E.; Sejnowski, Terrence J. (1985), "A learning algorithm for Boltzmann machines", Cognitive Science, 9 (1): 147–169.
