
NP Regression Model w/ LIG Acquisition

eibarolle opened this issue 10 months ago • 32 comments

Motivation

This pull request adds a Neural Process Regression model with a Latent Information Gain acquisition function to BoTorch.

Have you read the Contributing Guidelines on pull requests?

Yes, and I've followed all the steps, including testing.

Test Plan

I wrote unit tests for both the model and the acquisition function and placed the test files in the appropriate folder. All of them pass under pytest.

Related

I created a repository with the pushed files at https://github.com/eibarolle/np_regression, including the appropriate API documentation.

eibarolle avatar Jan 18 '25 11:01 eibarolle

Hi @eibarolle! Thanks for the PR! I'll review it shortly.

sdaulton avatar Jan 22 '25 17:01 sdaulton

I applied the suggested changes to my new code. Note that the other functions are designed around the decoder's current state.

eibarolle avatar Jan 25 '25 14:01 eibarolle

@hvarfner curious if you have any thoughts on this PR re the information gain aspects

Balandat avatar Jan 27 '25 17:01 Balandat

Interesting! I'll check out the paper quickly and get back to you

hvarfner avatar Jan 27 '25 18:01 hvarfner

The Latent Information Gain forward function has been updated with the correct dimensions, with the test cases adjusted as needed.

eibarolle avatar Feb 02 '25 13:02 eibarolle

Please let me know when you have a chance to review the new Latent Information Gain function. @hvarfner

eibarolle avatar Feb 10 '25 09:02 eibarolle

Any updates @hvarfner?

eibarolle avatar Feb 17 '25 22:02 eibarolle

Hi @eibarolle,

Thanks for pinging, and sorry for being slow on this. I've now taken a good look at it.

Some issues pointed out by @sdaulton (MAE etc.) remain unaddressed, so it would be good to resolve them. The code also needs to be formatted (see CONTRIBUTING.md).

Regarding my previous point: the acquisition function should return a tensor of shape N unless the model is an ensemble, so the q-dimension should be reduced over. In this case, that should probably amount to a sum(dim=-1), but that is just my educated guess.
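A toy sketch of the shape convention described above (the per-point score here is a placeholder, not the actual information-gain computation):

```python
import torch

# optimize_acqf evaluates an acquisition function on candidate
# batches of shape N x q x d; the return must have shape N (one
# value per candidate set) unless the model is an ensemble.
N, q, d = 5, 3, 2
X = torch.rand(N, q, d)

# Hypothetical per-point score of shape N x q. In the real LIG this
# would come from the latent-space information-gain computation.
per_point_score = X.sum(dim=-1)

# Reduce over the q-dimension, e.g. with a sum, to get shape N.
acq_values = per_point_score.sum(dim=-1)
assert acq_values.shape == torch.Size([N])
```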

Moreover, it is not obvious how the code deviates substantially from other BoTorch models (e.g. SingleTaskGP). Thus, I think this PR would have to come with an example (e.g. a tutorial) of the intended usage. The code should also adhere as much as possible to the standard BoTorch interfaces that are in place, ideally so that one can swap an STGP for an NPR with mostly the same arguments. As far as I can tell, this should be doable. The same goes for the training of the model, which you aptly pointed out in a previous commit here! Specifically, here are the obvious deviations from the convention that should be addressed:

  • The model should take train_X and (optionally) train_Y, but not much more than that
  • The acquisition function should take the model, but not the training data
  • The model should be trained with a BoTorch optimizer (probably fit_gpytorch_mll_torch in your case)

It seems like you have done it differently: the model takes NN parameters (these should, if anything, be kwargs with sensible defaults, like here), and the acquisition function takes train_X and train_Y, if I am not mistaken. The idea is to have NPR work with other acquisition functions, and to have LIG work with other models, as long as they expose a similar method for latent-space predictions.
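A minimal sketch of the convention being described. The class names, the hidden_dim kwarg, and the latent_posterior method are illustrative stand-ins, not the actual PR code:

```python
import torch

class NPRegressionModel:
    # Hypothetical sketch: per the BoTorch convention, the constructor
    # takes train_X and (optionally) train_Y; network hyperparameters
    # are keyword arguments with sensible defaults.
    def __init__(self, train_X, train_Y=None, hidden_dim=32):
        self.train_X = train_X
        self.train_Y = train_Y
        self.hidden_dim = hidden_dim

    def latent_posterior(self, X):
        # Stand-in for a latent-space prediction method; any model
        # exposing something like this could be paired with LIG.
        return X.mean(dim=-1)

class LatentInformationGain:
    # The acquisition function takes the model, not the training data.
    def __init__(self, model):
        self.model = model

    def __call__(self, X):  # X has shape N x q x d
        # Score each point via the latent space, then reduce over the
        # q-dimension so the result has shape N.
        return self.model.latent_posterior(X).sum(dim=-1)

model = NPRegressionModel(torch.rand(10, 2), torch.rand(10, 1))
acqf = LatentInformationGain(model)
values = acqf(torch.rand(4, 3, 2))
assert values.shape == torch.Size([4])
```

With this shape contract, the model could be swapped for a SingleTaskGP (or the acquisition for another one) without changing the surrounding loop.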

I recognize that this is a lot of work, but adhering to this standard ultimately ensures the usefulness of your code.

Now, I am not quite sure what the bar is for community contributions, so it would be good to get e.g. @sdaulton's or @Balandat's take on this as well.

hvarfner avatar Feb 18 '25 09:02 hvarfner

Got it, thanks for the advice and pointers. I'll make the needed additions and examples, and I'll reach out with any questions.

eibarolle avatar Feb 18 '25 22:02 eibarolle

@hvarfner I made some adjustments to the NPR model and LIG acquisition function over the last two weeks for generalizability, and I'll soon update the testing, examples, and documentation. Of note, I've updated the NPR posterior, forward, and init functions to take the expected parameters (including changing forward's output to a MultivariateNormal, like other models, for compatibility with fit_gpytorch_mll_torch). I also looked over the contributing document and ran ufmt and flake8 over my code. I'm still slightly confused about the training itself, but my main focus right now is finishing and uploading the tests and examples.

eibarolle avatar Mar 06 '25 10:03 eibarolle

@hvarfner The test cases have been uploaded, and to my knowledge the main points you requested have been addressed. For the training, optimize_acqf can be used. I can update the documentation with the proper methods and an example soon.

eibarolle avatar Mar 23 '25 11:03 eibarolle

The intended usage is laid out in the NP Regression repo's README, which also references the original study. Its documentation has been updated with the correct parameters as well.

eibarolle avatar Apr 01 '25 11:04 eibarolle

@sdaulton When you have the time, can you look over my changes?

eibarolle avatar Apr 03 '25 09:04 eibarolle

Hi @eibarolle,

Sorry for taking a long time to get back on this on my end.

It seems like the forward pass is still not in line with the rest of the BoTorch acquisition functions, so my comments from Jan 28 still apply. As such, optimize_acqf still does not work with LatentInformationGain. I pushed a new branch to your repo with comments in LIG to provide some pointers.

I still think a tutorial would be really nice to show the intended use. It would also help with debugging on your end, since the contribution currently does not fit within the existing BoTorch ecosystem. As mentioned previously, this is due to the shapes in the forward pass.

I tried setting up what I believe is the intended use in the form of a notebook, using optimize_acqf and fit_gpytorch_mll_torch. Here is that notebook, which builds on your commits. Note the comment on the acquisition function's forward pass, because this is key to get right! Ask away if you have any additional questions, and I will address them more quickly.

hvarfner avatar Apr 11 '25 12:04 hvarfner

@eibarolle This unfortunately means that some test code, like the forward pass for the acquisition function, also needs fixing to account for the required changes.

hvarfner avatar Apr 11 '25 12:04 hvarfner

Thank you for the responses. I've already made some code changes addressing @sdaulton's comments, and I'm working on @hvarfner's now. I'll try to upload my changes soon.

eibarolle avatar Apr 14 '25 23:04 eibarolle

I finished debugging the errors in the notebook, model, and acquisition function, along with the test cases. I'll write a tutorial and update the documentation when I have the time. For @sdaulton's issues, I'll comment on his posts. For @hvarfner's, I rewrote the model so that it no longer has the ExactGP/GP error, and the acquisition function so that it handles the dimensions correctly with optimize_acqf.

eibarolle avatar Apr 30 '25 12:04 eibarolle

@hvarfner @sdaulton I forgot to notify you both directly earlier, but my recent changes were posted.

eibarolle avatar May 04 '25 12:05 eibarolle

Looks like there is a merge conflict in docs/acquisition.md and docs/models.md too

sdaulton avatar May 05 '25 19:05 sdaulton

@sdaulton I managed to implement most of your suggestions on the model, acquisition function, and docs. However, I'm having trouble with the batch computation for the acquisition function due to dimensional mismatches, so I'm holding off on it for now.

eibarolle avatar May 17 '25 01:05 eibarolle

@sdaulton Sending a reminder about the newest adjustment to the NP regression pull request.

eibarolle avatar May 21 '25 11:05 eibarolle

@sdaulton has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar May 21 '25 20:05 facebook-github-bot

Thanks! Running tests now

sdaulton avatar May 21 '25 20:05 sdaulton

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 100.00%. Comparing base (76c27bd) to head (6d0de3a).
:warning: Report is 54 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##              main     #2683    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files          211       213     +2     
  Lines        19319     19767   +448     
==========================================
+ Hits         19319     19767   +448     

:umbrella: View full report in Codecov by Sentry.


codecov[bot] avatar May 21 '25 20:05 codecov[bot]

Looks like some more lines require test coverage, and there appear to be some non-Unix newline characters in the code files.

You must use ONLY Unix linebreaks \n in source code.
Lint code: TXT1
Lint name: dos-newlines

sdaulton avatar May 21 '25 21:05 sdaulton

I managed to get the test coverage to 100% by covering the missing lines, and I removed the reference comment. @sdaulton @esantorella

eibarolle avatar May 28 '25 12:05 eibarolle

@sdaulton The related smoke tests went mostly smoothly; the issues encountered involve authentication and files I did not create or modify.

eibarolle avatar May 31 '25 02:05 eibarolle

@sdaulton @esantorella Sending another reminder about my most recent commit.

eibarolle avatar Jun 05 '25 22:06 eibarolle

@sdaulton @esantorella Bumping another reminder about the pull request/merge.

eibarolle avatar Jun 18 '25 19:06 eibarolle

@eibarolle looks like there are some lines still missing test coverage. Could you add tests for those gaps please? Thanks!

sdaulton avatar Jun 20 '25 22:06 sdaulton