Implementing the most-likely heteroskedastic GP described in #180
Hi @Balandat, here is a very initial draft of the PR related to #180 .
A couple of things:

- Normalization and standardization are really important for this function, but I am having trouble checking these requirements without the added batch dimension (`check_standardization` requires it). I'll probably need your help on this because it is unclear to me how I should enforce the batch shape, since it is not a class extending a model with `_set_dimensions`.
- I have not yet implemented test functions, and may again need your expertise on this one. It's also a bit confusing since it is written as a function and not a test case for a model. I should also mention that none of the test cases I used for benchmarking in #180 had multidimensional outputs, so that may need to be checked.
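For context, the invariant that `check_standardization` enforces is roughly that the outcomes have zero mean and unit variance. A minimal numpy sketch of that preprocessing step, with illustrative names and toy data rather than anything from this PR:

```python
import numpy as np

def standardize(Y):
    """Rescale outcomes to zero mean and unit (sample) variance per output."""
    return (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)

rng = np.random.default_rng(0)
Y = rng.normal(loc=3.0, scale=2.0, size=(20, 1))  # toy single-output outcomes
Y_std = standardize(Y)  # now mean ~ 0 and std ~ 1 per output dimension
```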
Sorry this is such an incomplete PR, but hopefully it gives a draft function to work with, which you all can then extend. Happy to help any way I can. Also, I'll start trying to draft the text for a notebook tutorial, as suggested in #180.
Thanks @mc-robinson, this is a great start. Will take a closer look in the next couple of days and help out with the shape challenges.
Re the test: This should be pretty straightforward, since the model is just a standard `HeteroskedasticGP`. So we can just write test cases for the model fitting and then do a very basic sanity check on the model. From a purely mechanical point of view this is the same as testing any other function; see e.g. this test case.
@Balandat I'm digging up a really old PR here but it seemed like the right place to ask after doing a bit of searching for the right topic. It appears that this heteroskedastic noise GP is actually implemented, and as such this PR can be closed (`botorch.models.gp_regression.HeteroskedasticSingleTaskGP`). Is this correct? Just trying to make sure I'm using the right model. Thanks!
Not @Balandat, but I have been looking at this today as well, so I'll chime in. Short answer: no. The `HeteroskedasticSingleTaskGP` in BoTorch requires a y variance for each input, whereas this PR attempts to implement the most-likely heteroskedastic GP model, which takes just the inputs x and y and fits the noise GP using an EM loop. If you have uncertainty or precision measurements, the class that is implemented is what you want; if not, this PR implements a model that doesn't require the variances as inputs, although I believe there is some bug/API issue preventing it being merged.
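Since the distinction keeps coming up, here is a minimal framework-agnostic numpy sketch of that EM-style loop (the idea from the Kersting et al. "most likely heteroscedastic GP" paper): fixed kernel hyperparameters, illustrative function names, and no marginal-likelihood optimization, so treat it as a sketch of the concept rather than this PR's implementation:

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=0.2):
    """Squared-exponential kernel matrix (fixed lengthscale for simplicity)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior_mean(X, y, noise):
    """Exact GP posterior mean at the training inputs, given
    per-point observation noise variances `noise` on the diagonal."""
    K = rbf_kernel(X, X)
    return K @ np.linalg.solve(K + np.diag(noise), y)

def most_likely_noise_estimates(X, y, num_iters=3, jitter=1e-6):
    """EM-style loop: alternately estimate per-point noise from the
    residuals of the current fit, smooth the log-noise estimates with
    a second ("noise") GP, and refit the mean GP with the new noise."""
    noise = np.full(len(y), y.var())  # start from a homoskedastic guess
    for _ in range(num_iters):
        mu = gp_posterior_mean(X, y, noise)      # current mean fit
        z = np.log((y - mu) ** 2 + jitter)       # empirical log-variances
        z_smooth = gp_posterior_mean(X, z - z.mean(), np.ones(len(y))) + z.mean()
        noise = np.exp(z_smooth) + jitter        # back to variance scale
    return noise

rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 40)[:, None]
y = np.sin(6.0 * X[:, 0]) + rng.normal(scale=0.05 + 0.3 * X[:, 0])  # noise grows with x
noise = most_likely_noise_estimates(X, y)  # per-point noise variance estimates
```

In botorch terms, the returned per-point variances play the role of the `train_Yvar` that `HeteroskedasticSingleTaskGP` expects; this PR's helper produces them automatically instead of requiring them as inputs.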
@CompRhys ah of course. This is what happens when I comment on PRs after just waking up. Thanks for clarifying!
I believe there is some bug/API issue preventing it being merged
I don't think there is an API issue; basically there were some issues with the fitting stability (and there may be bugs). We'd basically need to get back to this and spend some time on it but it has been hard to find that time given our other priorities. If anyone needs this model it would be great if they could take a stab at it; we're very happy to provide feedback and input on it!
https://github.com/pytorch/botorch/issues/861 - I think this is the related API issue: when trying to piece this together, it links to issues where the noise is still stored in log space internally, and this has some impact because the inverse transform is not happening.
If anyone needs this model it would be great if they could take a stab at it; we're very happy to provide feedback and input on it!
Would you want it, like this PR, as a utility function as opposed to a class with an overridden fit method?
I'm trying to get approval to contribute upstream to BoTorch (re: CLA) so if I end up getting the green light this would definitely be something I'd be interested in trying to help with. Will see what happens 👍
Hey @CompRhys! Great to see you here! I was reading through a recent article from Milad Abolhasani and Keith Brown (DOI: 10.1557/s43577-023-00482-y), which led me back to heteroskedastic noise. Here's a brief snippet:
While [Bayesian optimization] has gained a great deal of traction in guiding experiments, Noack and Reyes discuss the limitations of these off-the-shelf tools and how they should be modified to improve progress. Specifically, key simplifying assumptions of most GPR approaches are that points that are equidistant in parameter space are equally correlated and each location in parameter space is expected to feature the same amount of noise.
The article from Noack and Reyes was an interesting read. My main takeaways were that non-stationary kernels are especially useful for materials science problems but that finding optimal solutions is non-trivial due to the large number of hyperparameters, and (relevant to this PR) that shifting towards a heteroskedastic treatment of noise is useful.
@sgbaird If you'd like, I can ask Kris [Reyes] about this next time I meet with him, we chat ~weekly 😁
I'm slowly being convinced that kernel flexibility is important. That said, I don't know how broadly useful very flexible or niche kernels are for the larger community. Definitely going to read that paper though, thanks for sharing.
Would you want it, like this PR, as a utility function as opposed to a class with an overridden fit method?
Hmm, so our classes don't actually have a fit method themselves, but we have moved to multiple-dispatch-based fitting. So you could register a custom fitting method for this model as is done e.g. here: https://github.com/pytorch/botorch/blob/main/botorch/fit.py#LL286-L286C2
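The multiple-dispatch pattern being described can be illustrated with the standard library's `functools.singledispatch` (botorch uses its own `Dispatcher`, so this is just the shape of the idea, with stand-in class names, not the actual API):

```python
from functools import singledispatch

class Model:
    """Stand-in for a generic model class."""

class MostLikelyHeteroskedasticGP(Model):
    """Stand-in for the proposed model class."""

@singledispatch
def fit(model):
    # default fitting path used for any model type
    return "generic fit"

@fit.register
def _(model: MostLikelyHeteroskedasticGP):
    # a custom EM-style fitting routine would be registered here
    return "most-likely heteroskedastic fit"
```

Registering on the model type means a single `fit` entry point can select the custom routine without the model class itself needing a `fit` method.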
So there are two options here:

- Implement a `MostLikelyHeteroskedasticGP` model class and hook it up to the custom fitting strategy using multiple dispatch. This is a bit of a weird setup though - currently, models that are fit are modified in-place during fitting, and they don't change type. The most likely heteroskedastic GP is really just a way of generating a `HeteroskedasticSingleTaskGP` object; it's just generated somewhat differently. I guess the way you could do this is subclass `MostLikelyHeteroskedasticGP` from `HeteroskedasticSingleTaskGP`, change the constructor to not take in `train_Yvar`, and then in the custom fitting method just generate a sequence of intermediate models to estimate the variances and load the (estimated) `train_Yvar` into the final model.
- Just use a helper function as in this PR. This should be quite a bit simpler and not cause any issues / challenges with the class structure. I suggest going with this approach as a first pass (we can always rethink that when we have a working model).
I'm slowly being convinced that kernel flexibility is important. That said, I don't know how broadly useful very flexible or niche kernels are for the larger community. Definitely going to read that paper though, thanks for sharing.
Yeah, I think this is a good point. If you're a domain expert and understand the problem well, customizing kernels, priors, etc., can be very helpful for improving performance on the problem at hand. But this increases complexity and the number of failure modes, so it's a tradeoff how much we want general-purpose software to support more specialized / flexible kernels.