
outputscale and noise prior in SingleTaskGP

shalijiang opened this issue on Jan 31 '20 · 3 comments

I have been using SingleTaskGP with the default priors, but today, out of curiosity, I decided to take a look at them. The lengthscale prior seems fine assuming the inputs have been normalized to [0, 1], but I'm a bit concerned that the noise-variance and outputscale priors are too large, especially when the user has standardized y to zero mean and unit variance.

import torch
import matplotlib.pyplot as plt
from gpytorch.priors.torch_priors import GammaPrior

# default priors in SingleTaskGP
priors = {"lengthscale": GammaPrior(3.0, 6.0),
          "outputscale": GammaPrior(2.0, 0.15),
          "noise_var": GammaPrior(1.1, 0.05)}
# start slightly above zero -- the Gamma log_prob is undefined at 0
x = torch.linspace(1e-3, 20.0, 1000)
fig, ax = plt.subplots(1, 3, figsize=(18, 6))
for i, (name, prior) in enumerate(priors.items()):
    d = torch.exp(prior.log_prob(x))
    ax[i].plot(x.numpy(), d.numpy())
    ax[i].set_title(f"{name}: mean={prior.mean:.2f}, "
                    f"var={prior.variance:.2f}, "
                    f"mode={(prior.concentration - 1) / prior.rate:.2f}")

[figure: three density plots, one per prior, each panel titled with that prior's mean, variance, and mode]
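To quantify the concern, here is a quick back-of-the-envelope check (a sketch using plain torch.distributions, with the parameter values copied from above): how much prior mass lies above 1, i.e. above the total variance of standardized outcomes?

import torch
from torch.distributions import Gamma

# prior mass above 1 for the two priors in question
for name, (c, r) in {"outputscale": (2.0, 0.15), "noise_var": (1.1, 0.05)}.items():
    prior = Gamma(concentration=torch.tensor(c), rate=torch.tensor(r))
    mass_above_one = 1.0 - prior.cdf(torch.tensor(1.0))
    print(f"{name}: mean={c / r:.2f}, P(value > 1) = {mass_above_one:.2f}")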

Or did I misunderstand something?

shalijiang · Jan 31 '20 21:01

We work with a lot of pretty noisy data, so our default noise prior reflects that (though it may be a little overzealous given that we usually assume normalized data). If you are in a setting where observation noise may be very low or possibly zero, you'll want to use a different prior (people are using horseshoe priors in such cases). The outputscale prior does indeed look a little over the top, though. Let me take a closer look.

In general, if you have information about the setting you're in, you should always include it in the prior. You can either make a custom model class, or just update the priors after instantiating:

# assuming `model` is an already-instantiated SingleTaskGP
new_noise_prior = GammaPrior(concentration=0.5, rate=0.1)

noise_covar = model.likelihood.noise_covar
noise_covar.register_prior(
    "noise_prior",
    new_noise_prior,
    lambda: noise_covar.noise,
    lambda v: noise_covar._set_noise(v),
)

and similar for the outputscale.
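For example, a sketch of the analogous outputscale update, plus a horseshoe noise prior for the low-noise case mentioned above (the parameter values here are placeholders, not recommendations):

from gpytorch.priors import GammaPrior, HorseshoePrior

# analogous update for the outputscale
covar_module = model.covar_module
covar_module.register_prior(
    "outputscale_prior",
    GammaPrior(concentration=2.0, rate=4.0),
    lambda: covar_module.outputscale,
    lambda v: covar_module._set_outputscale(v),
)

# for (near-)noiseless settings: a horseshoe prior on the noise
noise_covar = model.likelihood.noise_covar
noise_covar.register_prior(
    "noise_prior",
    HorseshoePrior(scale=0.1),
    lambda: noise_covar.noise,
    lambda v: noise_covar._set_noise(v),
)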

Balandat · Feb 03 '20 16:02

I just want to comment that in my field we often have n_observations / n_parameters ~ 1 (with ARD, i.e. one lengthscale per input dimension). In that setting the default lengthscale priors are too tight and often lead a SingleTaskGP to overfit. I would therefore argue for different default priors.

For completeness, here's an example that also sets the lengthscale and kernel-variance priors.

# observation noise variance
model.likelihood.noise_covar.register_prior(
    "noise_prior",
    GammaPrior(concentration=2.0, rate=4.0),
    lambda: model.likelihood.noise_covar.noise,
    lambda v: model.likelihood.noise_covar._set_noise(v),
)
# kernel variance (outputscale)
model.covar_module.register_prior(
    "outputscale_prior",
    GammaPrior(concentration=2.0, rate=4.0),
    lambda: model.covar_module.outputscale,
    lambda v: model.covar_module._set_outputscale(v),
)
# (ARD) lengthscales
model.covar_module.base_kernel.register_prior(
    "lengthscale_prior",
    GammaPrior(concentration=2.0, rate=0.2),
    lambda: model.covar_module.base_kernel.lengthscale,
    lambda v: model.covar_module.base_kernel._set_lengthscale(v),
)
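Alternatively, a sketch of the "custom model" route mentioned above, passing the same priors in at construction time (assuming existing training tensors train_X and train_Y; the Matern-5/2 ARD kernel mirrors the SingleTaskGP default):

from botorch.models import SingleTaskGP
from gpytorch.kernels import MaternKernel, ScaleKernel
from gpytorch.likelihoods import GaussianLikelihood
from gpytorch.priors import GammaPrior

d = train_X.shape[-1]  # number of input dimensions
model = SingleTaskGP(
    train_X,
    train_Y,
    likelihood=GaussianLikelihood(noise_prior=GammaPrior(2.0, 4.0)),
    covar_module=ScaleKernel(
        MaternKernel(
            nu=2.5,
            ard_num_dims=d,
            lengthscale_prior=GammaPrior(2.0, 0.2),
        ),
        outputscale_prior=GammaPrior(2.0, 4.0),
    ),
)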

DavidWalz · Feb 03 '21 12:02

It's certainly true that the default priors may not work particularly well in some situations. Generally, it won't be possible to choose default priors that work uniformly well across all application domains, but I'd be open to moving to a set of priors that works better across the board.

It may also make sense to start some kind of repository of priors that people have found to work well in settings with different characteristics; this is pretty arcane knowledge, and it would be nice to gather it centrally in some way.

cc @dme65

Balandat · Feb 03 '21 15:02