Marginal likelihood

A marginal likelihood is a likelihood function that has been integrated over the parameter space. In Bayesian statistics, it represents the probability of generating the observed sample when the parameters are averaged over their prior distribution, and is therefore often referred to as model evidence or simply evidence.

Concept

Given a set of independent identically distributed data points $\mathbf {X} =(x_{1},\ldots ,x_{n}),$ where each $x_{i}\sim p(x\mid \theta )$ follows a probability distribution parameterized by $\theta$ , and where $\theta$ is itself a random variable with distribution $\theta \sim p(\theta \mid \alpha ),$ the marginal likelihood is the probability $p(\mathbf {X} \mid \alpha )$ in which $\theta$ has been marginalized out (integrated out):

$$p(\mathbf {X} \mid \alpha )=\int _{\theta }p(\mathbf {X} \mid \theta )\,p(\theta \mid \alpha )\ \operatorname {d} \!\theta$$
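
As an illustrative sketch, consider a Beta-Bernoulli model. The model is an assumption chosen for this example and is not part of the definition above. Suppose each $x_{i}\in \{0,1\}$ is Bernoulli with success probability $\theta$ , and $\theta \sim \operatorname {Beta} (a,b)$ , so that $\alpha =(a,b)$ . Writing $k=\sum _{i}x_{i}$ for the number of successes, the integral can be evaluated in closed form:

$$p(\mathbf {X} \mid a,b)=\int _{0}^{1}\theta ^{k}(1-\theta )^{n-k}\,{\frac {\theta ^{a-1}(1-\theta )^{b-1}}{\mathrm {B} (a,b)}}\ \operatorname {d} \!\theta ={\frac {\mathrm {B} (a+k,\,b+n-k)}{\mathrm {B} (a,b)}}$$

where $\mathrm {B}$ denotes the beta function. The result is the probability of one specific sequence of outcomes, so no binomial coefficient appears.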

The above definition is phrased in the context of Bayesian statistics, in which case $p(\theta \mid \alpha )$ is called the prior density and $p(\mathbf {X} \mid \theta )$ is the likelihood. The marginal likelihood quantifies the agreement between data and prior in a geometric sense made precise in de Carvalho et al. (2019).

In classical (frequentist) statistics, the concept of marginal likelihood occurs instead in the context of a joint parameter $\theta =(\psi ,\lambda )$ , where $\psi$ is the parameter of interest and $\lambda$ is a nuisance parameter of no direct interest. If there exists a probability distribution for $\lambda$ , it is often desirable to consider the likelihood function only in terms of $\psi$ , by marginalizing out $\lambda$ :

$${\mathcal {L}}(\psi ;\mathbf {X} )=p(\mathbf {X} \mid \psi )=\int _{\lambda }p(\mathbf {X} \mid \lambda ,\psi )\,p(\lambda \mid \psi )\ \operatorname {d} \!\lambda$$

Marginal likelihoods are generally difficult to compute. Exact solutions are known for a small class of distributions, particularly when the prior on the marginalized-out parameter is conjugate to the distribution of the data. In other cases, some kind of numerical integration method is needed, either a general method such as Gaussian quadrature or a Monte Carlo method, or a method specialized to statistical problems such as the Laplace approximation, Gibbs/Metropolis sampling, or the EM algorithm.
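
The contrast between an exact conjugate solution and numerical integration can be sketched in a few lines of Python. The example below is a minimal illustration under stated assumptions, not a recommended implementation: it assumes Bernoulli data with a Beta(a, b) prior on the success probability, and the function names, the data (7 successes in 10 trials) and the prior parameters are all hypothetical. It compares the closed form with a simple midpoint rule (standing in for a general quadrature method) and with a plain Monte Carlo average over draws from the prior.

```python
import math
import random


def log_beta(a, b):
    """Logarithm of the beta function B(a, b), via log-gamma for stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def likelihood(theta, k, n):
    """Bernoulli likelihood of one specific sequence with k successes in n trials."""
    return theta**k * (1.0 - theta) ** (n - k)


def exact_marginal(k, n, a, b):
    """Closed form under the conjugate Beta(a, b) prior: B(a+k, b+n-k) / B(a, b)."""
    return math.exp(log_beta(a + k, b + n - k) - log_beta(a, b))


def midpoint_marginal(k, n, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of likelihood * prior over (0, 1)."""
    h = 1.0 / steps
    norm = math.exp(-log_beta(a, b))  # 1 / B(a, b)
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        prior = norm * t ** (a - 1) * (1.0 - t) ** (b - 1)
        total += likelihood(t, k, n) * prior
    return total * h


def monte_carlo_marginal(k, n, a, b, draws=200_000, seed=0):
    """Average the likelihood over parameter values drawn from the prior."""
    rng = random.Random(seed)
    total = sum(likelihood(rng.betavariate(a, b), k, n) for _ in range(draws))
    return total / draws


# Hypothetical data: 7 successes in 10 trials, with a Beta(2, 2) prior.
k, n, a, b = 7, 10, 2.0, 2.0
exact = exact_marginal(k, n, a, b)
approx_quad = midpoint_marginal(k, n, a, b)
approx_mc = monte_carlo_marginal(k, n, a, b)
print(exact, approx_quad, approx_mc)
```

In one dimension all three approaches are cheap. The difficulty described above arises mainly when the parameter is high-dimensional, where grid-based rules become infeasible and naive sampling from the prior can have very high variance.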

It is also possible to apply the above considerations to a single random variable (data point) $x$ , rather than a set of observations. In a Bayesian context, this is equivalent to the prior predictive distribution of a data point.
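
For instance, under the illustrative assumption of a single Bernoulli observation $x$ with success probability $\theta \sim \operatorname {Beta} (a,b)$ (a model chosen here only as an example), the prior predictive probability of a success is the prior mean of $\theta$ :

$$p(x=1\mid a,b)=\int _{0}^{1}\theta \,{\frac {\theta ^{a-1}(1-\theta )^{b-1}}{\mathrm {B} (a,b)}}\ \operatorname {d} \!\theta ={\frac {\mathrm {B} (a+1,b)}{\mathrm {B} (a,b)}}={\frac {a}{a+b}}$$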

Applications

Bayesian model comparison

In Bayesian model comparison, the marginalized variables are parameters for a particular type of model, and the remaining variable is the identity of the model itself. In this case, the marginal likelihood is the probability of the data given the model type, not assuming any particular model parameters. Writing $\theta$ for the model parameters, the marginal likelihood for the model $M$ is

$$p(\mathbf {X} \mid M)=\int p(\mathbf {X} \mid \theta ,M)\,p(\theta \mid M)\,\operatorname {d} \!\theta$$

It is in this context that the term model evidence is normally used. This quantity is important because the posterior odds ratio for a model $M_{1}$ against another model $M_{2}$ involves a ratio of marginal likelihoods, the so-called Bayes factor:

$${\frac {p(M_{1}\mid \mathbf {X} )}{p(M_{2}\mid \mathbf {X} )}}={\frac {p(M_{1})}{p(M_{2})}}\,{\frac {p(\mathbf {X} \mid M_{1})}{p(\mathbf {X} \mid M_{2})}}$$

which can be stated schematically as

posterior odds = prior odds × Bayes factor
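
A small numerical sketch may make the relation concrete. The setup is entirely hypothetical: coin-toss data with 9 heads in 10 tosses, a model $M_{1}$ that fixes the probability of heads at 0.5 (so there is nothing to integrate out), and a model $M_{2}$ that places a uniform Beta(1, 1) prior on that probability. The function names are illustrative and equal prior probabilities for the two models are assumed.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def evidence_fixed(k, n, theta=0.5):
    """Marginal likelihood of a model with no free parameters (theta is fixed)."""
    return theta**k * (1.0 - theta) ** (n - k)


def evidence_beta(k, n, a=1.0, b=1.0):
    """Marginal likelihood with theta ~ Beta(a, b) integrated out."""
    return math.exp(log_beta(a + k, b + n - k) - log_beta(a, b))


def posterior_odds(prior_odds, evidence_1, evidence_2):
    """posterior odds = prior odds * Bayes factor."""
    return prior_odds * (evidence_1 / evidence_2)


# Hypothetical data: 9 heads in 10 tosses; both models equally probable a priori.
k, n = 9, 10
e1 = evidence_fixed(k, n)   # p(X | M1)
e2 = evidence_beta(k, n)    # p(X | M2)
bayes_factor = e1 / e2
odds = posterior_odds(1.0, e1, e2)
print(bayes_factor, odds)
```

With these assumed inputs the Bayes factor is 110/1024, about 0.11, so the data shift the odds toward $M_{2}$ by roughly a factor of nine. With equal prior odds, the posterior odds equal the Bayes factor.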

References

Bos, Charles S. (2002). "A comparison of marginal likelihood computation methods". In Härdle, W.; Ronz, B. (eds.). COMPSTAT 2002: Proceedings in Computational Statistics. pp. 111–117.
de Carvalho, Miguel; Page, Garritt; Barney, Bradley (2019). "On the geometry of Bayesian inference". Bayesian Analysis. 14 (4): 1013–1036.
Lambert, Ben (2018). "The devil is in the denominator". A Student's Guide to Bayesian Statistics. Sage. pp. 109–120. ISBN 978-1-4739-1636-4.
MacKay, David J. C. Information Theory, Inference, and Learning Algorithms. On-line textbook.
