
[Feature Request] Derivative Enabled GPs

Open yyexela opened this issue 2 years ago • 15 comments

🚀 Feature Request

Hello, I'm interested in studying GPs with derivatives, similar to the example in the notebook provided in this issue. However, that notebook uses LogEI on noisy data, which isn't optimal. As such, I want to be able to use both qNEI and KG on noisy data with this derivative-enabled GP in BoTorch.

Motivation

Derivative-enabled GPs allow for faster convergence rates. This is well studied in the literature and makes intuitive sense, since each objective-function evaluation provides much more information. I want code to be available for myself and other researchers who want to explore derivative-enabled GPs.

Pitch

Ideally, I would want to see a BoTorch model for GPs with derivative information that works without any issues when used with qNEI and KG. I want this implemented for my research, and I have a significant amount of time I can contribute to implementing it myself. However, I'm not sure where to even begin, since I'm not familiar with the BoTorch codebase. I don't want to spend lots of time looking in the wrong place, so I would appreciate any and all help! My current issue is getting qNEI to work. I've added self._num_outputs = 1 + d to my derivative-enabled GP class, but I'm getting the following error:

torch._C._LinAlgError: linalg.cholesky: (Batch element 0): The factorization could not be completed because the input is not positive-definite (the leading minor of order 6 is not positive-definite).

which I'm trying to resolve. As I've mentioned earlier, however, I'm not sure this is the right place to look. Thanks!
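For reference, here's a minimal sketch of the kind of model class I'm describing (adapted from the GPyTorch derivative-GP tutorial; the `_num_outputs` line is the addition I mentioned, and the names here are just illustrative):

```python
import gpytorch
from botorch.models.gpytorch import GPyTorchModel


class GPWithDerivatives(gpytorch.models.ExactGP, GPyTorchModel):
    """Derivative-enabled GP sketch: jointly models f and its d partials."""

    def __init__(self, train_X, train_Y, likelihood):
        # train_Y has shape (n, 1 + d): function value plus d derivatives.
        super().__init__(train_X, train_Y, likelihood)
        d = train_X.shape[-1]
        self.mean_module = gpytorch.means.ConstantMeanGrad()
        base_kernel = gpytorch.kernels.RBFKernelGrad(ard_num_dims=d)
        self.covar_module = gpytorch.kernels.ScaleKernel(base_kernel)
        # The addition in question: tell BoTorch the model has 1 + d outputs.
        self._num_outputs = 1 + d

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultitaskMultivariateNormal(mean_x, covar_x)
```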

Are you willing to open a pull request? (See CONTRIBUTING) Yes

Additional context

I also briefly looked into subclassing BatchedMultiOutputGPyTorchModel for my model, but the documentation says the outputs are independent, which I don't think applies to derivative-enabled GPs. When I subclassed it anyway, I got another error:

RuntimeError: Cannot yet add fantasy observations to multitask GPs, but this is coming soon!

which led me to this issue from 4 years ago describing a similar problem to the one I'm having.

Thanks again!

yyexela avatar Feb 15 '23 17:02 yyexela

Hi @yyexela, great to see you interested in using derivative-enabled GPs with other acquisition functions.

The error you're getting suggests that there may be numerical problems unrelated to the use of derivative-enabled GPs. On the other hand, we haven't really battle-tested derivative-enabled GPs, so there may also be something else going on.

Could you provide a simple repro of the issue you're running into?

Balandat avatar Feb 15 '23 18:02 Balandat

Also cc @pabloprf re derivative-enabled GPs.

Balandat avatar Feb 15 '23 18:02 Balandat

Sure! I have two examples ready, one with derivative information (Simple_Repro_D.py.txt) and one without (Simple_Repro_non_D.py.txt). The first fails at torch._C._LinAlgError: linalg.cholesky and the second doesn't. Let me know if there is anything else I can help with! EDIT: and let me know what my next steps should be ;)
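For convenience, here's roughly the shape of the failing script (a pared-down sketch rather than the attached file verbatim; it assumes the GPWithDerivatives class sketched earlier and a posterior transform that picks out the function value):

```python
import torch
from botorch.acquisition import qNoisyExpectedImprovement
from botorch.acquisition.objective import ScalarizedPosteriorTransform
from botorch.fit import fit_gpytorch_mll
from gpytorch.likelihoods import MultitaskGaussianLikelihood
from gpytorch.mlls import ExactMarginalLogLikelihood

d = 2
train_X = torch.rand(5, d, dtype=torch.double)
train_Y = torch.randn(5, 1 + d, dtype=torch.double)  # value + d derivatives

likelihood = MultitaskGaussianLikelihood(num_tasks=1 + d)
model = GPWithDerivatives(train_X, train_Y, likelihood)
fit_gpytorch_mll(ExactMarginalLogLikelihood(likelihood, model))

# Scalarize the (1 + d)-dimensional output down to the function value.
weights = torch.zeros(1 + d, dtype=torch.double)
weights[0] = 1.0
acqf = qNoisyExpectedImprovement(
    model=model,
    X_baseline=train_X,
    posterior_transform=ScalarizedPosteriorTransform(weights=weights),
)
# Evaluating the acquisition is where linalg.cholesky blows up in the
# derivative-enabled case.
acqf(train_X[:1].unsqueeze(0))
```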

yyexela avatar Feb 15 '23 18:02 yyexela

I think I'm getting close...

I have three files I'm using for testing: one has shared functions, and the other two define the models. One model is without derivatives and the other is with derivatives, but both use GPyTorch without calling a BoTorch model directly (though they both subclass GPyTorchModel).

With this (and lots of print statements in the source code), I can see that the non-derivative version (I call it non-D) eventually calls sample_cached_cholesky in botorch/utils. Examining the covariance matrix, I see that two of its rows are identical. This is because I'm computing the posterior at X_base (the points already sampled) concatenated with X_test (just the zero point); since X_base also contains zero, the first and last rows of the covariance matrix coincide. This is why the following warning appears:

/home/alexey/.conda/envs/thesis/lib/python3.10/site-packages/linear_operator/utils/cholesky.py:40: NumericalWarning: A not p.d., added jitter of 1.0e-08 to the diagonal
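The effect is easy to reproduce in isolation (a toy sketch, not my actual covariance matrix):

```python
import torch
from linear_operator.utils.cholesky import psd_safe_cholesky

# Kernel matrix over inputs whose first and last points coincide, so two
# rows (and columns) are identical and the matrix is exactly singular.
X = torch.tensor([[0.0], [0.5], [1.0], [0.0]], dtype=torch.double)
K = torch.exp(-torch.cdist(X, X) ** 2)

try:
    torch.linalg.cholesky(K)
except RuntimeError as e:  # torch._C._LinAlgError subclasses RuntimeError
    print(e)  # leading minor of order 4 is not positive-definite

# linear_operator's helper retries with diagonal jitter instead, emitting
# the NumericalWarning quoted above.
L = psd_safe_cholesky(K)
```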

Tracing the code path for the derivative-enabled case (I call it yes-D), I see that the same function is never called. That function is only reached from MultivariateNormal's __init__ in gpytorch/distributions when either the mean or the covariance used to construct the distribution is a LinearOperator. For the non-D case, the covariance matrix is:

<linear_operator.operators.constant_mul_linear_operator.ConstantMulLinearOperator object at 0x7f33fe4f36a0>

while for the yes-D case it is not. This results in a call to torch.linalg.cholesky(covariance_matrix), which throws the error:

torch._C._LinAlgError: linalg.cholesky: (Batch element 0): The factorization could not be completed because the input is not positive-definite (the leading minor of order 6 is not positive-definite).

which makes sense: in my test example from the code above there are six rows, and the last one is identical to the first!
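The two code paths are easy to see directly (toy sketch again, rebuilding the singular K from above):

```python
import torch
from gpytorch.distributions import MultivariateNormal
from linear_operator import to_linear_operator

X = torch.tensor([[0.0], [0.5], [1.0], [0.0]], dtype=torch.double)
K = torch.exp(-torch.cdist(X, X) ** 2)  # singular: rows 0 and 3 identical
mean = torch.zeros(4, dtype=torch.double)

# Plain Tensor covariance (the yes-D case): falls through to the
# torch.distributions constructor, which eagerly calls torch.linalg.cholesky.
try:
    MultivariateNormal(mean, K)
except RuntimeError as e:
    print(e)  # not positive-definite

# LinearOperator covariance (the non-D case): takes gpytorch's lazy path,
# deferring the factorization to linear_operator's jitter-tolerant Cholesky.
mvn = MultivariateNormal(mean, to_linear_operator(K))
```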

So, I have the following questions:

  • Should I try to figure out how to make the yes-D case have a LinearOperator covariance matrix? It seems that for both the yes-D and non-D cases the covariance matrix starts as <linear_operator.operators.dense_linear_operator.DenseLinearOperator object at 0x7fc3403fe5f0>; for the non-D case it turns into the ConstantMulLinearOperator above, while for the yes-D case it loses its LinearOperator class. Or
  • Should I try to add functionality similar to the jitter in .../site-packages/linear_operator/utils/cholesky.py?

Thanks again and let me know if something wasn't clear!

yyexela avatar Feb 24 '23 02:02 yyexela

Sweet!

So I've opened a PR that resolves the issue I was having with the Cholesky decomposition. The simple fix makes the Cholesky decomposition get computed via the linear_operator package instead of PyTorch, which allows jitter to be added to the diagonal of a matrix that is not positive-definite.

The matrix was not positive-definite because, as I mentioned above, two rows of the covariance matrix are identical whenever the acquisition function is evaluated at a point that was previously evaluated on the objective function.
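In other words, the gist of the change (a sketch of the idea, not the literal diff) is that the factorization goes through linear_operator, which can fall back to jitter:

```python
import torch
from linear_operator import to_linear_operator

X = torch.tensor([[0.0], [0.5], [1.0], [0.0]], dtype=torch.double)
K = torch.exp(-torch.cdist(X, X) ** 2)  # singular, as before

# torch.linalg.cholesky(K) raises, but linear_operator's Cholesky retries
# with jitter added to the diagonal and succeeds (with a NumericalWarning).
L = to_linear_operator(K).cholesky()
```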

I'll go ahead and start working on getting KG working too. Let me know what else I can do, for example if there are any issues in the PR :)

yyexela avatar Mar 01 '23 23:03 yyexela

Nice work tracking down this issue! It seems a bit weird to me to fix this by wrapping the covariance in a DenseLinearOperator though, especially since it represents the same object - and ideally the handling of numerical issues / singularity should be happening at a higher level than at the level of the covariance object.

I think we can consider merging this if it doesn't cause any issues on our end either, since the overhead should be very small. But we should also take a closer look at this - ideally we'd be able to use torch Tensors and linear_operator LinearOperators interchangeably in all places without affecting the outcomes / robustness of the code, especially when the underlying representation is the same.

@SebastianAment - curious if you have any thoughts on this.

Balandat avatar Mar 05 '23 10:03 Balandat

@yyexela with the fix landed, is there anything outstanding on this feature request or can this be closed out?

Balandat avatar Mar 19 '23 17:03 Balandat

Great! Yes, I also want to get KG working with derivative-enabled GPs; after that I can make a tutorial notebook, and then I think this could be closed out. The current fix only gets qNEI working.

yyexela avatar Mar 19 '23 17:03 yyexela

I'll get to work on a tutorial notebook now that #2317 has merged :)

yyexela avatar May 19 '23 15:05 yyexela

Amazing!

Balandat avatar May 19 '23 16:05 Balandat

@yyexela checking in if you had a chance to work on this tutorial?

Balandat avatar Dec 02 '23 21:12 Balandat

Hey! Sorry, yes, I had something working but life got in the way. I'll try to get something pushed in the next week or so. Thanks for checking in!

yyexela avatar Dec 02 '23 22:12 yyexela

@Balandat Found my tutorial and it still works, so I was able to get it in sooner than planned :)

yyexela avatar Dec 03 '23 20:12 yyexela

Where can I see the versions of the packages installed for the smoke test? I was able to run my notebook just fine locally with a fresh conda environment and latest package versions, but got an error in my PR...

yyexela avatar Dec 03 '23 21:12 yyexela

Let's move this discussion onto the PR, will respond there.

Balandat avatar Dec 03 '23 21:12 Balandat