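Let us consider some data $\mathbf{y}$ (a vector of dimension $N$) and try to explain them with a model $\mathbf{M}$, which may be parameterized in some way (for the purpose of this post we don’t need to write it explicitly).
We allow for an additive and a multiplicative scaling of the model, described by two scalar parameters $a$ and $b$, so that the full model we fit to $\mathbf{y}$ is $a\mathbf{1} + b\mathbf{M}$, where $\mathbf{1}$ denotes the $N$-dimensional vector of ones.
We make the strong assumption that our data uncertainties are Gaussian and described by a covariance matrix $\mathbf{C}$. As a result, the likelihood function is
$$p(\mathbf{y}\mid a, b, \mathbf{M}) = \mathcal{N}(\mathbf{y} - a\mathbf{1} - b\mathbf{M};\,\mathbf{0},\,\mathbf{C}) = \frac{1}{\sqrt{(2\pi)^N|\mathbf{C}|}}\exp\left[-\frac{1}{2}(\mathbf{y} - a\mathbf{1} - b\mathbf{M})^T\mathbf{C}^{-1}(\mathbf{y} - a\mathbf{1} - b\mathbf{M})\right].$$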
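As a sketch of how this likelihood might be evaluated numerically, the NumPy function below assumes `y` and `M` are 1-D arrays of length $N$ and `C` is a symmetric positive-definite `N x N` array; the name `log_likelihood` is hypothetical. It uses a Cholesky factorization instead of an explicit inverse and determinant, a common numerical choice rather than something the derivation requires.

```python
import numpy as np


def log_likelihood(y, M, C, a, b):
    """Hypothetical helper: log p(y | a, b, M) for the model a*1 + b*M with noise covariance C."""
    r = y - a - b * M                           # residual; the scalar a broadcasts as a*1
    L = np.linalg.cholesky(C)                   # C = L L^T (C must be positive definite)
    z = np.linalg.solve(L, r)                   # whitened residual, so z @ z = r^T C^{-1} r
    logdet = 2.0 * np.sum(np.log(np.diag(L)))   # log |C|
    return -0.5 * (z @ z + logdet + y.size * np.log(2.0 * np.pi))
```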
We will also adopt Gaussian priors on the additive and multiplicative scaling parameters:
$$p(a) = \mathcal{N}(a;\,\mu_a,\,\sigma_a^2),\qquad p(b) = \mathcal{N}(b;\,\mu_b,\,\sigma_b^2).$$
Whether those are broad or narrow is irrelevant; all that matters is that $a$ and $b$ are real numbers which may be constrained by prior or external information.
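In any Bayesian analysis of our data (hierarchical or not), we will have to compute terms like
$$p(\mathbf{y}, a, b\mid\mathbf{M}) = p(\mathbf{y}\mid a, b, \mathbf{M})\,p(a)\,p(b),$$
which will show up as soon as we try to infer the parameters in $\mathbf{M}$ while correctly dealing with $a$ and $b$.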
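This joint term is, for instance, what an MCMC sampler over $a$, $b$ and the parameters of $\mathbf{M}$ would evaluate at each step. A minimal NumPy sketch, with hypothetical function names and the prior means and standard deviations passed as the hypothetical arguments `mu_a`, `sig_a`, `mu_b`, `sig_b`:

```python
import numpy as np


def log_gauss1d(x, mu, sig):
    """log N(x; mu, sig^2) for a scalar x."""
    return -0.5 * ((x - mu) / sig) ** 2 - np.log(sig) - 0.5 * np.log(2.0 * np.pi)


def log_mvn0(r, C):
    """log N(r; 0, C) for a residual vector r, via a Cholesky factorization of C."""
    L = np.linalg.cholesky(C)
    z = np.linalg.solve(L, r)
    return -0.5 * (z @ z + 2.0 * np.sum(np.log(np.diag(L))) + r.size * np.log(2.0 * np.pi))


def log_joint(y, M, C, a, b, mu_a, sig_a, mu_b, sig_b):
    """log p(y, a, b | M) = log p(y | a, b, M) + log p(a) + log p(b)."""
    return (log_mvn0(y - a - b * M, C)
            + log_gauss1d(a, mu_a, sig_a)
            + log_gauss1d(b, mu_b, sig_b))
```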
The typical approach to this problem is to perform parameter inference (e.g., via MCMC sampling) for all the parameters, including $a$ and $b$. However, given how simple these two are, we might wonder whether we can get rid of them analytically. This is especially relevant if we don’t particularly care about them, i.e., if they are nuisance parameters that let us fit the data better while we are only truly interested in the parameters of $\mathbf{M}$.
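We will make use of the following identity, valid for any vector $\mathbf{z}$, fixed vector $\mathbf{v}$, covariance $\mathbf{C}$ and scalar $x$ with a Gaussian prior of mean $\mu$ and variance $\sigma^2$,
$$\mathcal{N}(\mathbf{z} - x\mathbf{v};\,\mathbf{0},\,\mathbf{C})\;\mathcal{N}(x;\,\mu,\,\sigma^2) = \mathcal{N}(x;\,\hat x,\,\sigma_{\hat x}^2)\;\mathcal{N}(\mathbf{z} - \mu\mathbf{v};\,\mathbf{0},\,\mathbf{C} + \sigma^2\mathbf{v}\mathbf{v}^T),$$
$$\sigma_{\hat x}^{-2} = \mathbf{v}^T\mathbf{C}^{-1}\mathbf{v} + \sigma^{-2},\qquad \hat x = \sigma_{\hat x}^2\left(\mathbf{v}^T\mathbf{C}^{-1}\mathbf{z} + \mu\,\sigma^{-2}\right),$$
and of the analytic marginalization
$$\int \mathcal{N}(x;\,\hat x,\,\sigma_{\hat x}^2)\,\mathrm{d}x = 1.$$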
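Since everything below rests on this identity, a quick numerical check can be reassuring. The sketch below (NumPy, hypothetical helper names, random illustrative inputs) evaluates both sides for a generic vector `z`, direction `v`, covariance `C`, and a scalar `x` with prior mean `mu` and standard deviation `sig`:

```python
import numpy as np


def log_gauss1d(x, mu, sig):
    """log N(x; mu, sig^2) for a scalar x."""
    return -0.5 * ((x - mu) / sig) ** 2 - np.log(sig) - 0.5 * np.log(2.0 * np.pi)


def log_mvn0(r, C):
    """log N(r; 0, C), via a Cholesky factorization of C."""
    L = np.linalg.cholesky(C)
    w = np.linalg.solve(L, r)
    return -0.5 * (w @ w + 2.0 * np.sum(np.log(np.diag(L))) + r.size * np.log(2.0 * np.pi))


def split_scalar(z, v, C, mu, sig):
    """Return (x_hat, sig_hat, C_marg) for
    N(z - x v; 0, C) N(x; mu, sig^2) = N(x; x_hat, sig_hat^2) N(z - mu v; 0, C_marg)."""
    Cinv_v = np.linalg.solve(C, v)                # C^{-1} v
    prec = v @ Cinv_v + 1.0 / sig**2              # 1 / sig_hat^2
    x_hat = (Cinv_v @ z + mu / sig**2) / prec     # Cinv_v @ z = v^T C^{-1} z since C is symmetric
    return x_hat, 1.0 / np.sqrt(prec), C + sig**2 * np.outer(v, v)


# Illustrative random inputs; both sides should agree for every x up to round-off.
rng = np.random.default_rng(1)
N = 5
z, v = rng.normal(size=N), rng.normal(size=N)
B = rng.normal(size=(N, N))
C = B @ B.T + N * np.eye(N)                       # symmetric positive definite
mu, sig = 0.7, 1.3
x_hat, sig_hat, C_marg = split_scalar(z, v, C, mu, sig)
max_err = max(
    abs(log_mvn0(z - x * v, C) + log_gauss1d(x, mu, sig)
        - log_gauss1d(x, x_hat, sig_hat) - log_mvn0(z - mu * v, C_marg))
    for x in (-2.0, 0.0, 0.5, 3.0)
)
```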
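This allows us to perform a first simplification in $a$ (applying the identity to the residual $\mathbf{y} - b\mathbf{M}$ and the vector of ones):
$$p(\mathbf{y}\mid a, b, \mathbf{M})\,p(a) = \mathcal{N}(a;\,\hat a,\,\sigma_{\hat a}^2)\;\mathcal{N}(\tilde{\mathbf{y}} - b\mathbf{M};\,\mathbf{0},\,\tilde{\mathbf{C}}),$$
$$\tilde{\mathbf{y}} = \mathbf{y} - \mu_a\mathbf{1},\quad \tilde{\mathbf{C}} = \mathbf{C} + \sigma_a^2\mathbf{1}\mathbf{1}^T,\quad \sigma_{\hat a}^{-2} = \mathbf{1}^T\mathbf{C}^{-1}\mathbf{1} + \sigma_a^{-2},\quad \hat a = \sigma_{\hat a}^2\left(\mathbf{1}^T\mathbf{C}^{-1}(\mathbf{y} - b\mathbf{M}) + \mu_a\sigma_a^{-2}\right).$$
We see that this distribution is Gaussian in $a$, and the maximum a posteriori value (at fixed $b$) is $\hat a$. I have introduced $\tilde{\mathbf{y}}$ and $\tilde{\mathbf{C}}$ to shorten the equations below.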
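A sketch of this first step in NumPy, assuming $b$ is held fixed and the prior on $a$ has mean `mu_a` and standard deviation `sig_a`; `marginalize_a` is a hypothetical name, not a library function:

```python
import numpy as np


def marginalize_a(y, M, C, b, mu_a, sig_a):
    """First step at fixed b: return (a_hat, sig_a_hat, y_tilde, C_tilde)."""
    ones = np.ones(y.size)
    Cinv_1 = np.linalg.solve(C, ones)                        # C^{-1} 1
    prec_a = ones @ Cinv_1 + 1.0 / sig_a**2                  # 1 / sig_a_hat^2
    a_hat = (Cinv_1 @ (y - b * M) + mu_a / sig_a**2) / prec_a
    y_tilde = y - mu_a                                       # y - mu_a * 1
    C_tilde = C + sig_a**2 * np.outer(ones, ones)            # C + sig_a^2 1 1^T
    return a_hat, 1.0 / np.sqrt(prec_a), y_tilde, C_tilde
```

The returned `y_tilde` and `C_tilde` are the only pieces of the $a$ step that the $b$ step needs.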
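The second simplification, over $b$ this time, is slightly less trivial, but leads us to
$$\mathcal{N}(\tilde{\mathbf{y}} - b\mathbf{M};\,\mathbf{0},\,\tilde{\mathbf{C}})\;p(b) = \mathcal{N}(b;\,\hat b,\,\sigma_{\hat b}^2)\;\mathcal{N}(\tilde{\mathbf{y}} - \mu_b\mathbf{M};\,\mathbf{0},\,\tilde{\mathbf{C}} + \sigma_b^2\mathbf{M}\mathbf{M}^T)$$
with the terms
$$\sigma_{\hat b}^{-2} = \mathbf{M}^T\tilde{\mathbf{C}}^{-1}\mathbf{M} + \sigma_b^{-2},\qquad \hat b = \sigma_{\hat b}^2\left(\mathbf{M}^T\tilde{\mathbf{C}}^{-1}\tilde{\mathbf{y}} + \mu_b\sigma_b^{-2}\right).$$
Again, we see that this distribution is Gaussian in $b$, and the maximum a posteriori value is $\hat b$.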
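One possible NumPy implementation of this step, with hypothetical argument names `mu_a`, `sig_a`, `mu_b`, `sig_b` for the prior means and standard deviations. As a design choice not made in the text, it applies the inverse of $\mathbf{C} + \sigma_a^2\mathbf{1}\mathbf{1}^T$ through the Sherman-Morrison formula, $(\mathbf{C} + \sigma_a^2\mathbf{1}\mathbf{1}^T)^{-1} = \mathbf{C}^{-1} - \sigma_a^2\,\mathbf{C}^{-1}\mathbf{1}\mathbf{1}^T\mathbf{C}^{-1}/(1 + \sigma_a^2\,\mathbf{1}^T\mathbf{C}^{-1}\mathbf{1})$, so only solves with the original $\mathbf{C}$ are needed:

```python
import numpy as np


def marginalize_b(y, M, C, mu_a, sig_a, mu_b, sig_b):
    """Second step: return (b_hat, sig_b_hat), with C_tilde = C + sig_a^2 1 1^T
    applied through the Sherman-Morrison formula."""
    ones = np.ones(y.size)
    y_tilde = y - mu_a
    Cinv_1 = np.linalg.solve(C, ones)                  # C^{-1} 1
    denom = 1.0 + sig_a**2 * (ones @ Cinv_1)

    def tilde_dot(u, w):
        # u^T C_tilde^{-1} w = u^T C^{-1} w - sig_a^2 (u^T C^{-1} 1)(1^T C^{-1} w) / denom
        Cinv_w = np.linalg.solve(C, w)
        return u @ Cinv_w - sig_a**2 * (u @ Cinv_1) * (Cinv_1 @ w) / denom

    prec_b = tilde_dot(M, M) + 1.0 / sig_b**2          # 1 / sig_b_hat^2
    b_hat = (tilde_dot(M, y_tilde) + mu_b / sig_b**2) / prec_b
    return b_hat, 1.0 / np.sqrt(prec_b)
```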
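What does it tell us? We have rewritten our target distribution as
$$p(\mathbf{y}, a, b\mid\mathbf{M}) = \mathcal{N}(a;\,\hat a,\,\ldots)\;\mathcal{N}(b;\,\hat b,\,\ldots)\;\mathcal{N}(\mathbf{y} - \mu_a\mathbf{1} - \mu_b\mathbf{M};\,\mathbf{0},\,\ldots),$$
where $\hat a$ still depends on $b$; this is harmless, because the first factor integrates to one over $a$ whatever the value of $b$.
I didn’t write the covariances to save a little bit of space.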
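To see the factorization concretely, the NumPy sketch below (hypothetical function names) evaluates the three factors for a given pair $(a, b)$, recomputing $\hat a$ at that $b$. The result should match the log of likelihood times priors for any $a$ and $b$:

```python
import numpy as np


def log_gauss1d(x, mu, sig):
    """log N(x; mu, sig^2) for a scalar x."""
    return -0.5 * ((x - mu) / sig) ** 2 - np.log(sig) - 0.5 * np.log(2.0 * np.pi)


def log_mvn0(r, C):
    """log N(r; 0, C), via a Cholesky factorization of C."""
    L = np.linalg.cholesky(C)
    z = np.linalg.solve(L, r)
    return -0.5 * (z @ z + 2.0 * np.sum(np.log(np.diag(L))) + r.size * np.log(2.0 * np.pi))


def factorized_log_joint(y, M, C, a, b, mu_a, sig_a, mu_b, sig_b):
    """log N(a; a_hat(b), .) + log N(b; b_hat, .) + log N(y - mu_a 1 - mu_b M; 0, C_tot)."""
    ones = np.ones(y.size)
    # Step over a: a_hat depends on b.
    Cinv_1 = np.linalg.solve(C, ones)
    prec_a = ones @ Cinv_1 + 1.0 / sig_a**2
    a_hat = (Cinv_1 @ (y - b * M) + mu_a / sig_a**2) / prec_a
    # Step over b, using y_tilde and C_tilde.
    y_t = y - mu_a
    C_t = C + sig_a**2 * np.outer(ones, ones)
    Ct_inv_M = np.linalg.solve(C_t, M)
    prec_b = M @ Ct_inv_M + 1.0 / sig_b**2
    b_hat = (Ct_inv_M @ y_t + mu_b / sig_b**2) / prec_b
    # The remaining factor depends only on the data, M and the priors.
    C_tot = C_t + sig_b**2 * np.outer(M, M)
    return (log_gauss1d(a, a_hat, 1.0 / np.sqrt(prec_a))
            + log_gauss1d(b, b_hat, 1.0 / np.sqrt(prec_b))
            + log_mvn0(y_t - mu_b * M, C_tot))
```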
This is great, because we now have two elegant ways to deal with our nuisance parameters $a$ and $b$. First, we can set them to their maximum a posteriori values $\hat a$ and $\hat b$ (with $\hat a$ evaluated at $b = \hat b$), and compute $p(\mathbf{y}, \hat a, \hat b\mid\mathbf{M})$ with those values. This is equivalent to directly fitting for $a$ and $b$ at fixed $\mathbf{M}$, which is useful. (In this case, one needs to compute the covariance terms which I have omitted above.)
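Fitting $a$ and $b$ at fixed $\mathbf{M}$ is a two-parameter linear least-squares problem with Gaussian priors, so one way to sketch it is to solve the $2\times 2$ normal equations directly. The function name `fit_ab` and its arguments are hypothetical; the returned $2\times 2$ posterior covariance is one way to obtain the covariance terms mentioned above:

```python
import numpy as np


def fit_ab(y, M, C, mu_a, sig_a, mu_b, sig_b):
    """Joint MAP of (a, b) at fixed M and its 2x2 posterior covariance.

    Linear model y = A @ [a, b] + noise, with design matrix A = [1, M]
    and independent Gaussian priors on a and b.
    """
    A = np.column_stack([np.ones(y.size), M])            # N x 2 design matrix
    prior_mean = np.array([mu_a, mu_b])
    prior_prec = np.diag([1.0 / sig_a**2, 1.0 / sig_b**2])
    F = A.T @ np.linalg.solve(C, A) + prior_prec         # posterior precision matrix
    rhs = A.T @ np.linalg.solve(C, y) + prior_prec @ prior_mean
    theta = np.linalg.solve(F, rhs)                      # [a_MAP, b_MAP]
    return theta, np.linalg.inv(F)
```

Its solution coincides with $\hat b$ and with $\hat a$ evaluated at $b = \hat b$, and the `[1, 1]` entry of the covariance is the posterior variance of $b$.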
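Second, we can marginalize over $a$ and $b$, since we have isolated their contributions and those are Gaussians! In other words, we can write
$$p(\mathbf{y}\mid\mathbf{M}) = \iint p(\mathbf{y}, a, b\mid\mathbf{M})\,\mathrm{d}a\,\mathrm{d}b = \mathcal{N}(\mathbf{y} - \mu_a\mathbf{1} - \mu_b\mathbf{M};\,\mathbf{0},\,\mathbf{C} + \sigma_a^2\mathbf{1}\mathbf{1}^T + \sigma_b^2\mathbf{M}\mathbf{M}^T).$$
This is very useful; as I said previously, those terms unavoidably appear in any Bayesian analysis, hierarchical or not. We can now focus on $\mathbf{M}$ and analytically marginalize over $a$ and $b$ when fitting $\mathbf{M}$, for example at each step of an MCMC algorithm constraining the parameters of $\mathbf{M}$. Sweet!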
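A sketch of the marginalized likelihood $p(\mathbf{y}\mid\mathbf{M})$ in NumPy, with `log_marginal` a hypothetical name and the same Gaussian priors on $a$ and $b$. Within an MCMC over the parameters of $\mathbf{M}$, one would recompute `M` for each proposal and call this function in place of the joint term, so $a$ and $b$ never need to be sampled:

```python
import numpy as np


def log_marginal(y, M, C, mu_a, sig_a, mu_b, sig_b):
    """log p(y | M) = log N(y - mu_a 1 - mu_b M; 0, C + sig_a^2 1 1^T + sig_b^2 M M^T)."""
    C_tot = C + sig_a**2 * np.ones_like(C) + sig_b**2 * np.outer(M, M)
    r = y - mu_a - mu_b * M                       # residual w.r.t. the prior-mean model
    L = np.linalg.cholesky(C_tot)
    z = np.linalg.solve(L, r)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (z @ z + logdet + y.size * np.log(2.0 * np.pi))
```

Note that `C_tot` depends on `M`, so its factorization has to be redone whenever the parameters of $\mathbf{M}$ change.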