***Paysage** is a new **PyTorch**-powered Python library for machine learning with Restricted Boltzmann Machines. We built Paysage from scratch at **Unlearn.AI** in order to bring the power of GPU acceleration, recent developments in machine learning, and our own new ideas to bear on the training of this model class.*

*We are excited to release this toolkit to the community as an open-source software library. This first release is version 0.2.0. We will be pushing regular updates from here on out, with many new features in the pipeline.*

*The next release (0.3.0) is scheduled for July 10, 2018.*


**Paysage is a powerful library for training RBMs and, more generally, energy-based neural network models.**

Paysage allows one to construct models

- composed of several different kinds of layer types,
- of any depth.

Paysage implements a number of training methods, such as:

- layerwise training for deep models,
- (persistent) contrastive divergence with MCMC,
- TAP-based, sampling-free extended mean field methods.

Paysage is built on top of two backends: NumPy and PyTorch. The latter unlocks GPU acceleration for both training and sampling.

We at Unlearn are excited to release this project and let the community take it to new heights.

**What is an energy-based model?**

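Energy-based models (EBMs) are machine learning models which arise from the specification of an energy function **E(X; θ)**. This function yields a scalar value for each configuration of some specified variables **X**, given parameters **θ**, and it defines a probability distribution through the Boltzmann form

$$p(X;\theta) = \frac{e^{-E(X;\theta)}}{Z(\theta)}, \qquad Z(\theta) = \sum_X e^{-E(X;\theta)}.$$

So an EBM also represents a probability distribution over **X** for each choice of **θ**, with low-energy configurations being the most probable. From this perspective, training can be thought of as fitting the probability distribution **p(X; θ)** to data.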
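The following is a minimal illustration, assuming the standard Boltzmann form **p(X; θ) = exp(-E(X; θ)) / Z(θ)**. The sketch enumerates every configuration of three binary variables under a hypothetical quadratic energy and normalizes the results. The parameters **a** and **J** are made up for the example, and full enumeration is only possible because the example is tiny.

```python
import itertools
import numpy as np

def boltzmann_probs(energy_fn, n_vars):
    """Return every binary configuration X in {0,1}^n_vars, its energy, and
    p(X) = exp(-E(X)) / Z, where Z sums exp(-E) over all configurations."""
    configs = np.array(list(itertools.product([0, 1], repeat=n_vars)), dtype=float)
    energies = np.array([energy_fn(x) for x in configs])
    # Shifting by the minimum energy avoids overflow; the shift cancels in p(X).
    weights = np.exp(-(energies - energies.min()))
    return configs, energies, weights / weights.sum()

# Hypothetical parameters theta = (a, J) for E(X) = -a.X - X.J.X / 2.
rng = np.random.default_rng(0)
a = rng.normal(size=3)
J = rng.normal(size=(3, 3))
J = (J + J.T) / 2          # symmetric couplings
np.fill_diagonal(J, 0.0)   # no self-interaction

configs, energies, probs = boltzmann_probs(lambda x: -a @ x - 0.5 * x @ J @ x, n_vars=3)
```

The lowest-energy configuration is the most probable one. The ratio of two probabilities depends only on their energy difference, **p(X) / p(X') = exp(-(E(X) - E(X')))**, so **Z** cancels in such ratios.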

**Visible and hidden variables**

It is often the case that only a subset of the configuration variables **X** is observed in the training set. The observed variables are called *visible* and the rest *hidden* or *latent* variables of the model, so **X** splits into **X = (V, H)**. In this case, the family of probability distributions relevant to training consists of the marginals over the hidden variables,

$$p(V;\theta) = \sum_H p(V, H;\theta),$$

with the sum replaced by an integral when the hidden variables are continuous.
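The marginal over binary hidden units often has a closed form. Take a hypothetical binary RBM with energy **E(v, h) = -a·v - b·h - v^T W h** (an assumption chosen for illustration). Each **h_j** appears only linearly, so the sum over all hidden configurations factorizes into one softplus term per hidden unit. This closed form is the negative "free energy" of **v**. The sketch checks it against brute-force enumeration.

```python
import itertools
import numpy as np

def log_unnormalized_marginal(v, a, b, W):
    """log sum_h exp(-E(v, h)) for E(v,h) = -a.v - b.h - v.W.h with binary h.
    The sum over 2^n_hid states factorizes into a product over hidden units."""
    return a @ v + np.sum(np.logaddexp(0.0, b + v @ W))  # logaddexp(0, x) = softplus(x)

def log_unnormalized_marginal_brute(v, a, b, W):
    """The same quantity, computed by enumerating every hidden configuration."""
    hs = np.array(list(itertools.product([0, 1], repeat=len(b))), dtype=float)
    neg_energies = a @ v + hs @ b + hs @ (v @ W)
    return np.logaddexp.reduce(neg_energies)

rng = np.random.default_rng(1)
a, b, W = rng.normal(size=4), rng.normal(size=3), rng.normal(size=(4, 3))
v = np.array([1.0, 0.0, 1.0, 1.0])
fast = log_unnormalized_marginal(v, a, b, W)        # cost linear in n_hid
slow = log_unnormalized_marginal_brute(v, a, b, W)  # cost exponential in n_hid
```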

**EBMs are natural tools in unsupervised learning**

Suppose one observes a bunch of samples of the variables **V**. These samples can be thought of as random draws from some hypothetical *data distribution* **p_d(V)**. An EBM can be trained to approximate the hypothetical data distribution by fitting

$$p(V;\theta) \approx p_d(V).$$

Once trained, the model can be used to:

- draw new representative samples from the model distribution, thus simulating the processes which generate the training data,
- answer counterfactual questions about the data variables,
- make discriminative predictions about some data variables given others by conditioning the model distribution,
- impute missing data from the model distribution.

Furthermore, an EBM describes an explicit, parametrized probability distribution. This is an advantage over generative models, such as generative adversarial networks, that define their distribution only implicitly through a sampler.

**Energy-based neural networks**

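An energy-based neural network is an EBM in which the energy function **E(v, h; θ)** arises from the architecture of an artificial neural network. This is best understood by way of example. Consider a two-layer Gaussian-Bernoulli Restricted Boltzmann Machine (GBRBM). Such a model has

- a visible layer of Gaussian units **v** with means **m** and standard deviations **s**,
- a hidden layer of Bernoulli (binary) units **h** with biases **b**,
- a weight matrix **W** connecting the two layers.

The energy function for this model takes the form,

$$E(v, h; m, s, b, W) = \sum_i \frac{(v_i - m_i)^2}{2 s_i^2} - \sum_j b_j h_j - \sum_{i,j} \frac{v_i}{s_i^2} W_{ij} h_j.$$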

This energy function has a contribution from each layer (a quadratic potential from the Gaussian layer and a linear potential from the Bernoulli layer) along with a quadratic term which connects the layers mediated by the *weight matrix* **W**.
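Here is a minimal NumPy sketch of this energy for a batch of states. It assumes the convention in which the coupling term divides each visible unit by its variance **s_i^2**; some references divide by the standard deviation instead. The function name and argument layout are illustrative only and are not Paysage's API.

```python
import numpy as np

def gbrbm_energy(v, h, m, s, b, W):
    """Energy of a Gaussian-Bernoulli RBM for each row of a batch.

    v: (batch, n_vis) real-valued visible states
    h: (batch, n_hid) binary hidden states
    m, s: (n_vis,) Gaussian means and standard deviations
    b: (n_hid,) Bernoulli biases;  W: (n_vis, n_hid) weight matrix
    """
    gaussian_term = np.sum((v - m) ** 2 / (2 * s ** 2), axis=1)  # quadratic potential
    bernoulli_term = -h @ b                                      # linear potential
    coupling_term = -np.sum(((v / s ** 2) @ W) * h, axis=1)      # (v / s^2) . W . h
    return gaussian_term + bernoulli_term + coupling_term
```

Each of the three terms matches one contribution named in the paragraph above. Vectorizing over the batch dimension keeps the cost to a single matrix product per call.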

From this one specification we can, in principle, write down the model probability density **p(v; m, s, b, W)**. The most popular training scheme is to maximize the log-likelihood of the data under the model. That is, writing **θ = (m, s, b, W)**, we want to compute

$$\theta^* = \arg\max_\theta \; \mathbb{E}_{v \sim p_d}\left[\log p(v;\theta)\right].$$

The goal of this optimization problem is to find the parameters **θ\*** which maximize the likelihood of having observed the training data from the model distribution. There are more details on training further down in this blog post.
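For a model with only a few hidden units, this objective can be evaluated exactly. The sketch below assumes the energy **E(v, h) = Σ_i (v_i - m_i)^2 / (2 s_i^2) - b·h - Σ_ij (v_i / s_i^2) W_ij h_j**. It integrates the Gaussian visible units out analytically and enumerates all **2^n_hid** hidden states to get **log Z**. The function names are hypothetical, and the approach breaks down quickly as **n_hid** grows.

```python
import itertools
import numpy as np

def gbrbm_log_prob(V, m, s, b, W):
    """Exact log p(v) for each row of V under the assumed Gaussian-Bernoulli
    energy. Only feasible for a handful of hidden units."""
    var = s ** 2
    # log sum_h exp(-E(v, h)): the sum over binary h factorizes into softplus terms.
    log_unnorm = (-np.sum((V - m) ** 2 / (2 * var), axis=1)
                  + np.sum(np.logaddexp(0.0, b + (V / var) @ W), axis=1))
    # log Z: for each binary h the integral over v is Gaussian with mean m + W h.
    H = np.array(list(itertools.product([0, 1], repeat=len(b))), dtype=float)
    C = H @ W.T                                   # row k holds W h for the k-th h
    log_terms = H @ b + np.sum(0.5 * np.log(2 * np.pi * var)
                               + ((m + C) ** 2 - m ** 2) / (2 * var), axis=1)
    return log_unnorm - np.logaddexp.reduce(log_terms)

def mean_log_likelihood(V, m, s, b, W):
    """The training objective: average log-likelihood of the data rows in V."""
    return np.mean(gbrbm_log_prob(V, m, s, b, W))
```

Under this energy convention, the model's marginal over **v** is a mixture of **2^n_hid** Gaussians with means **m + W h**. That is why **log Z** reduces to a sum over hidden states.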

**Restricted Boltzmann Machines**

Restricted Boltzmann Machines (RBMs) are an important special class of energy-based neural networks. By definition, an RBM has at least one hidden layer and carries the systematic restriction that the weight matrices only connect units in adjacent layers. Because there are no connections within a layer, the units of a layer are conditionally independent given the states of the neighboring layers, so an entire layer can be sampled at once. This scheme (block Gibbs sampling) is vastly more efficient than the alternatives available in the unrestricted case. Paysage was primarily designed to facilitate training of RBMs.
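The conditional independence that block Gibbs sampling relies on can be checked numerically. Consider a hypothetical binary RBM with **E(v, h) = -a·v - b·h - v^T W h**. Once **v** is fixed, **-E** is a sum of terms that each involve a single **h_j**, so **p(h | v)** should equal a product of independent per-unit Bernoulli probabilities. The sketch compares that product with **p(h | v)** obtained by brute-force normalization.

```python
import itertools
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conditional_brute_force(v, a, b, W):
    """p(h | v) over every hidden configuration, obtained by normalizing
    exp(-E(v, h)) over h for E(v,h) = -a.v - b.h - v.W.h."""
    hs = np.array(list(itertools.product([0, 1], repeat=len(b))), dtype=float)
    neg_energy = a @ v + hs @ b + hs @ (v @ W)
    p = np.exp(neg_energy - neg_energy.max())
    return hs, p / p.sum()

def conditional_factorized(hs, v, b, W):
    """The same probabilities as a product of independent per-unit Bernoullis,
    p(h_j = 1 | v) = sigmoid(b_j + (v W)_j); no enumeration over h is needed."""
    q = sigmoid(b + v @ W)
    return np.prod(np.where(hs == 1, q, 1 - q), axis=1)

rng = np.random.default_rng(0)
a, b, W = rng.normal(size=3), rng.normal(size=4), rng.normal(size=(3, 4))
v = np.array([1.0, 0.0, 1.0])
hs, p_exact = conditional_brute_force(v, a, b, W)
p_factor = conditional_factorized(hs, v, b, W)
```

Because the factorized form needs only the vector **sigmoid(b + v W)**, drawing a whole hidden layer costs one matrix product and one vector of uniform random numbers.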

What follows is some more in-depth commentary about a couple of the features of Paysage and an annotated example of training a 2-layer Gaussian-Bernoulli Boltzmann Machine on MNIST.

**Generalities on model construction in Paysage**

Paysage allows you to construct any model whose energy function takes the form:

$$E(v, h_1, \ldots, h_L) = \sum_{i=0}^{L} H_i(h_i) - \sum_{i=1}^{L} r_{i-1}(h_{i-1})^\top W_i \, r_i(h_i),$$

in which **H_i()** is layer **i**'s *energy* function, **r_i()** its *rescale* function, and **W_i** the weight matrix connecting layers **i** and **i-1**. (Here we have set **h_0 = v** to compactify the formula.)
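As a worked instance, the two-layer Gaussian-Bernoulli RBM corresponds to one weight matrix (**L = 1**, **h_1 = h**, **W_1 = W**) and the layer choices below. This assumes the convention in which the Gaussian layer rescales its units by their variance; other references divide by the standard deviation instead.

$$
H_0(v) = \sum_i \frac{(v_i - m_i)^2}{2 s_i^2}, \qquad r_0(v)_i = \frac{v_i}{s_i^2}, \qquad H_1(h) = -\sum_j b_j h_j, \qquad r_1(h) = h,
$$

$$
E(v, h) = H_0(v) + H_1(h) - r_0(v)^\top W \, r_1(h) = \sum_i \frac{(v_i - m_i)^2}{2 s_i^2} - \sum_j b_j h_j - \sum_{i,j} \frac{v_i}{s_i^2} W_{ij} h_j.
$$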

In Paysage, training and sampling such models rely on block Gibbs sampling, so the layer energy functions must lead to conditional probabilities **p(v | rest)** and **p(h_i | rest)** that are efficient to compute.
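For the Gaussian-Bernoulli case, assume the energy **E(v, h) = Σ_i (v_i - m_i)^2 / (2 s_i^2) - b·h - Σ_ij (v_i / s_i^2) W_ij h_j**. Completing the square in **v** gives both conditionals in closed form: **p(h_j = 1 | v) = sigmoid(b_j + Σ_i (v_i / s_i^2) W_ij)**, and **v_i | h** is Gaussian with mean **m_i + (W h)_i** and standard deviation **s_i**. The sketch below builds one block Gibbs sweep from these conditionals. The function names are hypothetical and are not Paysage's layer methods.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_probs(v, s, b, W):
    """p(h_j = 1 | v) = sigmoid(b_j + sum_i (v_i / s_i^2) W_ij) for each row of v."""
    return sigmoid(b + (v / s ** 2) @ W)

def sample_hidden(v, s, b, W, rng):
    p = hidden_probs(v, s, b, W)
    return (rng.random(p.shape) < p).astype(float)

def sample_visible(h, m, s, W, rng):
    """Each v_i given h is Gaussian with mean m_i + (W h)_i and std s_i."""
    mean = m + h @ W.T
    return mean + s * rng.standard_normal(mean.shape)

def gibbs_sweep(v, m, s, b, W, rng):
    """One block Gibbs sweep: all hidden units at once, then all visible units."""
    h = sample_hidden(v, s, b, W, rng)
    return sample_visible(h, m, s, W, rng), h
```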

Paysage provides three built-in layer types in the layers module: GaussianLayer, BernoulliLayer, and OneHotLayer. Building a model is as simple as stacking such layers together.
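To illustrate the stacking idea, here is a hypothetical sketch. The classes below are NOT Paysage's GaussianLayer or BernoulliLayer and do not reproduce their API. Each layer supplies only an energy **H_i** and a rescale **r_i**, and a generic function sums the layer energies and subtracts the rescaled couplings **r_{i-1}(h_{i-1})^T W_i r_i(h_i)**. The Gaussian layer here assumes rescaling by the variance.

```python
import numpy as np

# Hypothetical minimal layer classes (not Paysage's API): each layer provides
# its own energy H_i and rescale function r_i.
class GaussianLayerSketch:
    def __init__(self, mean, std):
        self.mean, self.std = np.asarray(mean, float), np.asarray(std, float)
    def energy(self, x):                  # quadratic potential
        return np.sum((x - self.mean) ** 2 / (2 * self.std ** 2), axis=-1)
    def rescale(self, x):                 # divide each unit by its variance
        return x / self.std ** 2

class BernoulliLayerSketch:
    def __init__(self, bias):
        self.bias = np.asarray(bias, float)
    def energy(self, x):                  # linear potential
        return -x @ self.bias
    def rescale(self, x):                 # identity
        return x

def joint_energy(layers, weights, states):
    """E = sum_i H_i(h_i) - sum_{i>=1} r_{i-1}(h_{i-1}) . W_i . r_i(h_i).
    weights[i-1] is W_i (connecting layers i-1 and i); states[0] is v."""
    energy = sum(layer.energy(x) for layer, x in zip(layers, states))
    for i, W in enumerate(weights, start=1):
        left = layers[i - 1].rescale(states[i - 1])
        right = layers[i].rescale(states[i])
        energy = energy - np.sum((left @ W) * right, axis=-1)
    return energy
```

Adding a layer means appending one more layer object, one more weight matrix, and one more state array. The energy function itself does not change, which is what makes models of any depth expressible.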

**How training works**

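Maximizing the log-likelihood over the parameters can be attempted via stochastic gradient descent on its negative. The expectation, over the data distribution, of the gradient of the log-likelihood is

$$\mathbb{E}_{v \sim p_d}\left[\nabla_\theta \log p(v;\theta)\right] = -\mathbb{E}_{v \sim p_d,\; h \sim p(h \mid v;\theta)}\left[\nabla_\theta E(v,h;\theta)\right] + \mathbb{E}_{(v,h) \sim p(v,h;\theta)}\left[\nabla_\theta E(v,h;\theta)\right],$$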
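Contrastive divergence (one of the training methods listed earlier) estimates this gradient by replacing the model expectation with samples from a few block Gibbs sweeps. The sketch below assumes a hypothetical binary RBM with **E(v, h) = -a·v - b·h - v^T W h**, for which **∂E/∂W_ij = -v_i h_j**. The weight gradient is therefore the data average of **v h^T** minus the model average. This is a generic CD-k sketch and is not Paysage's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_weight_gradient(v_data, a, b, W, k, rng):
    """CD-k estimate of d/dW of the mean log-likelihood for a binary RBM.

    Positive phase: E_data[v h^T], averaging h analytically via p(h=1|v).
    Negative phase: E_model[v h^T], approximated after k block Gibbs sweeps
    started from the data.
    """
    positive = v_data.T @ sigmoid(b + v_data @ W) / len(v_data)
    v = v_data.copy()
    for _ in range(k):
        h = (rng.random((len(v), len(b))) < sigmoid(b + v @ W)).astype(float)
        v = (rng.random(v.shape) < sigmoid(a + h @ W.T)).astype(float)
    negative = v.T @ sigmoid(b + v @ W) / len(v)
    return positive - negative  # ascend this to increase the log-likelihood
```

Small **k** makes each update cheap but biases the negative phase toward the data. The persistent variant instead carries the chains over from one parameter update to the next rather than restarting them at the data.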

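which involves evaluating expectations over the model distribution. The primary difficulty in training these kinds of EBMs is the "intractability" of the probability distributions **p(v, h; θ)**. In particular, the denominator (the partition function)

$$Z(\theta) = \sum_{v, h} e^{-E(v,h;\theta)}$$

is a sum over exponentially many joint configurations (an integral for continuous units). Neither it nor the model expectation in the gradient can be computed exactly for models of realistic size.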
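To put a number on "exponentially many", take a hypothetical fully binary RBM on binarized 28×28 MNIST images. It has 784 visible units and, purely for illustration, 200 hidden units. The first line below counts joint configurations. The second counts the visible configurations that remain even after the hidden units are summed out analytically.

$$
\begin{aligned}
2^{784 + 200} &= 2^{984} = 10^{984 \log_{10} 2} \approx 10^{296}, \\
2^{784} &= 10^{784 \log_{10} 2} \approx 10^{236}.
\end{aligned}
$$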

Paysage implements Markov-chain Monte Carlo via block Gibbs sampling to estimate the gradient above (see [1] for the classic presentation of "Boltzmann learning"). Alternatively, Paysage implements a sampling-free training scheme for RBMs based on extended mean-field approximations to the free energy of the model (adapted from [2]).
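Sampling-free schemes replace the model expectation with the solution of deterministic self-consistency equations. The sketch below shows only the simplest member of that family: first-order ("naive") mean field for a hypothetical binary RBM with **E(v, h) = -a·v - b·h - v^T W h**. It is NOT the extended (TAP) mean-field scheme referenced above, which adds higher-order correction terms to these equations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def naive_mean_field(a, b, W, n_iter=200, damping=0.5):
    """First-order mean-field magnetizations of a binary RBM.

    Iterates the self-consistency equations
        m_v = sigmoid(a + W m_h),   m_h = sigmoid(b + W^T m_v)
    with damping to help convergence. No sampling is involved.
    """
    m_v = sigmoid(a)   # start from the uncoupled (W = 0) solution
    m_h = sigmoid(b)
    for _ in range(n_iter):
        m_v = damping * m_v + (1 - damping) * sigmoid(a + W @ m_h)
        m_h = damping * m_h + (1 - damping) * sigmoid(b + W.T @ m_v)
    return m_v, m_h
```

In such schemes the fixed point stands in for model averages. For example, **m_v m_h^T** approximates the negative-phase statistic for **W**. The approximation is exact only when the couplings vanish, which is what motivates higher-order corrections.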

[1] Ackley, David H.; Hinton, Geoffrey E.; Sejnowski, Terrence J. (1985), "A learning algorithm for Boltzmann machines", *Cognitive Science*, Elsevier, **9** (1): 147–169.
