### Basics of Integration using Monte Carlo

Recently, someone asked me about Monte Carlo. So, I thought I should write this post to provide a basic introduction of performing integration using Monte Carlo. "Why integration?", you asked. Well, this is because integration is one of the main operations done in computing the posterior probability distributions used in machine learning and probabilistic filtering (e.g. Bayes filter). For example, consider the typical posterior probability expression in the Bayes filtering context, $$p(x_{t} \mid y_{1:t})$$, i.e. the probability of hidden state $$x_{t}$$ given the observed sequence of measurements up to the current time point $$y_{1:t}$$, $$p(x_{t} \mid y_{1:t}) = \frac{p(y_{t} \mid x_{t}) p(x_{t} \mid y_{1:t-1})}{p(y_{t} \mid y_{1:t-1})}$$

The $$p(x_{t} \mid y_{1:t-1})$$ is of particular interest. It can be seen as the prediction of $$x_{t}$$ from previous observations $$y_{1:t-1}$$ and can be expressed as $$p(x_{t} \mid y_{1:t-1}) = \int p(x_{t} \mid x_{t-1}) p(x_{t-1} \mid y_{1:t-1}) dx_{t-1}$$ The tractibility of computing this integral depends on the exact form of the model.

The challenge is that, in some cases, the integration is intractable and there is no closed form expression. One of the ways would then be to approximate the integral using Monte Carlo sampling. Of course, if samples of high-dimensional random variable are involved or sampling from the target distribution could not be done easily, Markov Chain Monte Carlo (MCMC) needs to be used instead. However, that is a topic for another day.

Performing integration using Monte Carlo relies on the simple fact that the expectation of a random variable is simply the integral of random variable realisations weighted by the probability of seeing the corresponding realisations, e.g. $$E[X] = \int xp(x)dx$$ and the fact that this expectation can be approximated from samples, $$E[X] \approx \frac{1}{N}\sum_{i=1}^{N}x_{i}.$$

Now, consider the integral, $$g(x) = \int_{0}^{x} \exp(-y^{2}) dy,$$ which has no closed form expression. Recall that the expectation of function $$f(Y)$$, with $$Y$$ being the random variable distributed according to $$p(Y)$$ is $$E[f(Y)] = \int f(y)p(y) dy.$$

If we define $$p(y) = \frac{1}{x}$$ and $$f(y) = \exp(-y^{2})$$, $$g(x)$$ becomes $$g(x) = x\int_{0}^{x} f(y)p(y)dy = xE[f(Y)]$$ This means, to approximate $$g(x)$$, we generate realisations of $$Y$$ from $$p(y)$$ which in this case is a uniform distribution $$\frac{1}{x}$$ with support $$0 \le Y \le x$$. Then, $$E[f(Y)]$$ is approximated using Monte Carlo sampling as above before multiplying with the value of $$x$$.

Here is a python script showing a comparison between integration using numerical means and using Monte Carlo sampling:
#!/usr/bin/env python3
# Ricky Wong

# exploring "exact" integration vs integration using monte carlo
# example would revolve around integrating:
# g(x) = \int_{0}^{x} y dy
# with x = 10

import numpy as np
from scipy.stats import uniform
if __name__ == '__main__':
x_value = 10
# exact integration. exact here may not be exact as this
# is numerically computed.
exact_result = quad(lambda x: x, 0, x_value)[0]
print('Exact integration: {0}'.format(exact_result))

# integration using monte carlo.
# g(x) can be expressed as
# g(x) = x \int_{0}_{x} y (1/x) dy
#
# We can compare this against the typical expression for expectation
# E[y] = \int y p(y) dy where the integral is taken over the domain
# of p(y).
#
# In this case, we have p(y) = 1/x for 0 <= y <= x and 0 otherwise.
for N_samples in [10,100,1000,10000,100000,1000000]:
samples = uniform.rvs(scale=x_value, size=N_samples)
mc_result = np.mean(samples)*x_value
print('Integration using Monte Carlo with {} samples: {}'.format(N_samples, mc_result))

Exact integration: 50.0
Integration using Monte Carlo with 10 samples: 45.864460853541544
Integration using Monte Carlo with 100 samples: 47.47774415153338
Integration using Monte Carlo with 1000 samples: 49.52460520229164
Integration using Monte Carlo with 10000 samples: 50.13432005099408
Integration using Monte Carlo with 100000 samples: 50.03289975433659
Integration using Monte Carlo with 1000000 samples: 49.947325860483815