
Refactor botorch/sampling/pathwise and add support for product kernels

Open seashoo opened this issue 7 months ago • 3 comments

Motivation

Hi! I'm Sahran Ashoor, an undergraduate research assistant in the Uncertainty Quantification Lab at the University of Houston. I work under Dr. Ruda Zhang and Taiwo Adebiyi, both of whom have already spoken with Max Balandat about incorporating a rebase of botorch/sampling/pathwise (largely written by James T. Wilson). The changes in this pull request are my best attempt at faithfully implementing the change logs I was provided (product_kernel_diff.txt).

Have you read the Contributing Guidelines on pull requests?

Yes!

Project Overview

The primary goal was to make Wilson's original codebase compatible with the latest BoTorch version. To achieve this, we started from the original source code and test suites, which initially revealed several incompatibilities. Our main contribution was carefully rebasing Wilson's code while preserving the pathwise sampling logic. Importantly, all changes were confined to the botorch/sampling/pathwise directory to ensure seamless integration, passing both the local pathwise test suites and BoTorch's global test suites via GitHub workflows.

In terms of code logic, we relied on Wilson's unit tests for prior, updates, and posterior sampling, which we believe are sufficient to validate the correctness of the implementation. However, we welcome your feedback on this approach, and would appreciate any suggestions for additional tests or example scripts to further confirm the robustness of the changes. We are open to collaborating further on this effort.

Test Plan


The entire testing suite was run through pytest. Through additional verification we've found that the logic may be offset, but we're hoping to work with you all to further validate these changes under the guidance of Dr. Zhang. Expect further communication directly from my lab that will provide more insight into the rebase.

Related PRs


N/A

seashoo avatar May 05 '25 11:05 seashoo

Codecov Report

:x: Patch coverage is 99.62217% with 3 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 99.93%. Comparing base (0290e3d) to head (ebe03af).
:warning: Report is 12 commits behind head on main.

Files with missing lines Patch % Lines
botorch/utils/types.py 66.66% 3 Missing :warning:
Additional details and impacted files
@@             Coverage Diff             @@
##              main    #2838      +/-   ##
===========================================
- Coverage   100.00%   99.93%   -0.07%     
===========================================
  Files          216      219       +3     
  Lines        20211    20722     +511     
===========================================
+ Hits         20211    20709     +498     
- Misses           0       13      +13     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.


codecov[bot] avatar May 05 '25 19:05 codecov[bot]

Thanks @seashoo for the PR - this is a big one! It'll take me a bit of time to review this in detail, I plan to do a first higher-level pass this week.

> Through additional verification we've found that the logic may be offset

What exactly does this mean?

Balandat avatar May 06 '25 05:05 Balandat

> Thanks @seashoo for the PR - this is a big one! It'll take me a bit of time to review this in detail, I plan to do a first higher-level pass this week.
>
> > Through additional verification we've found that the logic may be offset
>
> What exactly does this mean?

Hi @Balandat,

Thanks for the response! We've included a more detailed Project Overview section in the pull request description to clarify our validation approach. Specifically, we utilized the existing unit test files, which cover prior, updates, and posterior sampling, and ensured that all tests passed as part of this rebase. While these tests are comprehensive, we welcome any additional guidance you might have on further validating the code's robustness.

TaiwoAdebiyi23 avatar May 08 '25 18:05 TaiwoAdebiyi23

@Balandat, thank you for the comprehensive review and detailed feedback on Wilson's product_kernel.diff implementation! My apologies for the time it's taken to get back to you; some of the implementations took quite some time to fully realize, and I've been balancing the work alongside my current internship.

I'm excited to collaborate with you at a faster pace now that I've freed up time! If you have questions about any of my specific implementations, feel free to ask in a reply to any of the comments here; I'll be able to respond much more swiftly now. I've gone ahead and resolved some of the upstream merge conflicts that appeared while I was away, and I've also filled in the code coverage gaps as you asked.

Here's a quick summary of the major changes implemented to address your concerns:

Mathematical Issues Resolved

Product Kernel Implementation

Completely redesigned the product_feature_generator, which was failing with relative errors around 0.75 against a tolerance of ~0.094

  • Separated finite-dimensional from infinite-dimensional sub-kernels
  • Used Hadamard products for infinite-dimensional kernel combinations
  • Applied outer products to merge finite-dimensional with combined infinite-dimensional kernels
  • Automatically enabled cosine_only=True for multiple infinite-dimensional kernels to avoid problematic tensor products
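The combination rules above can be sketched in plain NumPy. The helper names `hadamard_combine` and `outer_combine` are illustrations, not BoTorch APIs: the Hadamard product composes two random-feature (infinite-dimensional) maps up to a normalization constant, while the flattened outer product exactly composes a finite-dimensional map with any other map, since ⟨a⊗b, c⊗d⟩ = ⟨a, c⟩⟨b, d⟩.

```python
import numpy as np

def hadamard_combine(phi_a, phi_b):
    # Elementwise (Hadamard) product of two feature vectors: for
    # independent random-feature maps, the inner products multiply
    # in expectation (up to a normalization constant).
    return phi_a * phi_b

def outer_combine(phi_finite, phi_infinite):
    # Flattened outer product: an exact composition, because
    # <a (x) b, c (x) d> = <a, c> * <b, d>.
    return np.einsum("...i,...j->...ij", phi_finite, phi_infinite).reshape(
        *phi_finite.shape[:-1], -1
    )

rng = np.random.default_rng(0)
phi_fin = rng.standard_normal((5, 3))    # finite-dimensional features
phi_inf1 = rng.standard_normal((5, 64))  # random features, sub-kernel 1
phi_inf2 = rng.standard_normal((5, 64))  # random features, sub-kernel 2

combined_inf = hadamard_combine(phi_inf1, phi_inf2)  # shape (5, 64)
phi_prod = outer_combine(phi_fin, combined_inf)      # shape (5, 192)
```

This is only a sketch of the composition structure; the actual generator also has to track input indices per sub-kernel and the cosine_only switch mentioned above.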

Transform System Overhaul

Fixed scaling and coordination issues across the transform pipeline

  • SineCosineTransform: Added conditional logic instead of rigid constant scaling
  • Parameter Passing: Replaced complex kwargs manipulation with explicit num_ambient_inputs handling
  • Transform Chaining: Enhanced append_transform utility to properly handle None cases
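A minimal sketch of the None-aware chaining described for `append_transform`, simplified to plain callables (BoTorch's actual utility operates on transform modules, so this only illustrates the None semantics):

```python
def append_transform(transform, other):
    """Chain `other` after `transform`, treating None as the identity."""
    if transform is None:
        return other
    if other is None:
        return transform
    return lambda x: other(transform(x))

scale = lambda x: 2 * x
shift = lambda x: x + 1

assert append_transform(None, scale)(3) == 6   # None acts as identity
assert append_transform(scale, None)(3) == 6
assert append_transform(scale, shift)(3) == 7  # shift applied after scale
```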

Architectural Improvements

Feature Map Redesign

Built comprehensive feature map architecture

  • DirectSumFeatureMap: Rewrote raw_output_shape method to handle mixed dimensionality without MagicMock detection
  • SparseDirectSumFeatureMap: Implemented for completeness, available for manual use
  • HadamardProductFeatureMap/OuterProductFeatureMap: Enhanced for proper kernel composition
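The direct-sum shape logic can be illustrated with a small sketch. Both classes here are hypothetical stand-ins (the real `DirectSumFeatureMap` lives in the pathwise package and also handles batch shapes); the point is just that the output width is the sum of the sub-maps' widths:

```python
import numpy as np

class LinearMap:
    # Toy sub-map: projects inputs with a fixed weight matrix.
    def __init__(self, weight):
        self.weight = np.asarray(weight)

    @property
    def raw_output_shape(self):
        return (self.weight.shape[0],)

    def __call__(self, x):
        return x @ self.weight.T

class DirectSumFeatureMap:
    # Concatenates sub-map outputs along the last dimension, so the
    # raw output width is the sum of the sub-maps' widths.
    def __init__(self, feature_maps):
        self.feature_maps = feature_maps

    @property
    def raw_output_shape(self):
        return (sum(fm.raw_output_shape[-1] for fm in self.feature_maps),)

    def __call__(self, x):
        return np.concatenate([fm(x) for fm in self.feature_maps], axis=-1)

rng = np.random.default_rng(0)
fmap = DirectSumFeatureMap([
    LinearMap(rng.standard_normal((2, 4))),   # width-2 sub-map
    LinearMap(rng.standard_normal((5, 4))),   # width-5 sub-map
])
x = rng.standard_normal((10, 4))
assert fmap.raw_output_shape == (7,)
assert fmap(x).shape == (10, 7)
```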

Code Organization

Restructured from monolithic utils.py into modular package

  • Split into dedicated helpers, mixins, and transforms modules
  • Improved dispatcher system with specific return types (MultitaskKernelFeatureMap, DirectSumFeatureMap, etc.)
  • Fixed sub_kernels parameter usage for LCM kernel compatibility

Code Quality Enhancements

Modern Python Standards

  • Adopted PEP 604 union syntax (A | B vs Union[A, B]) throughout pathwise directory
  • Replaced large comment blocks with Google-style docstrings
  • Cleaned up redundant imports and unused attributes

Type Safety

Enhanced type annotations and return type specificity for better IDE support

Technical details regarding the issues + approaches taken in response to your suggestions are further addressed in my replies!

seashoo avatar Jul 29 '25 12:07 seashoo

Thanks for all the updates, @seashoo. Sorry for the delay on the review; I've been traveling for a conference and taking some time off. I should be able to provide a detailed review next week.

Balandat avatar Aug 08 '25 08:08 Balandat

@hvarfner has imported this pull request. If you are a Meta employee, you can view this in D80169068.

facebook-github-bot avatar Aug 13 '25 12:08 facebook-github-bot

Hello! I just wanted to follow up on the progress of this PR. I have added a few responses to @hvarfner's comments, clarifying and adjusting our logic with respect to the MTGP's use of ProductKernel as a definition.

There's a small merge conflict listed on the PR, but merging with BoTorch's upstream is causing dependency issues with GPyTorch imports on my local. I will investigate this issue further and commit the working upstream in the coming days.

seashoo avatar Oct 15 '25 23:10 seashoo

Thanks!

> merging with BoTorch's upstream is causing dependency issues with GPyTorch imports on my local. I will investigate this issue further and commit the working upstream in the coming days.

@seashoo when you say dependency issues, do you mean in terms of installing the package or in terms of causing (e.g. import) failures? There has been a new gpytorch release just recently that you may need to update to: https://github.com/cornellius-gp/gpytorch/releases/tag/v1.14.2

Balandat avatar Oct 15 '25 23:10 Balandat

> This looks great, extraordinary work. Seems like the branch is not entirely in sync, given the changes in optim. Thanks a lot.
>
> I think the get_train_inputs and get_train_targets methods and dispatching logic is helpful in general, and should probably be located in models/utils/helpers.py. If it's not too much work, it would be nice to have it there instead.

Thank you! I've moved the requested logic to models/utils/helpers.py, and I've also committed a merge with upstream which should address the sync with optim.
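For context, the dispatching pattern described here can be sketched with `functools.singledispatch`. This is a simplified illustration with a hypothetical `SimpleGP` class, not the actual helpers (which also handle input transforms and model containers):

```python
from dataclasses import dataclass
from functools import singledispatch

@dataclass
class SimpleGP:
    # Hypothetical stand-in for a GP model holding training data.
    train_inputs: tuple
    train_targets: tuple

@singledispatch
def get_train_inputs(model):
    # Fallback for unregistered model types.
    raise TypeError(f"Unsupported model type: {type(model).__name__}")

@get_train_inputs.register
def _(model: SimpleGP):
    # Registered overload: dispatch on the model's runtime type.
    return model.train_inputs

gp = SimpleGP(train_inputs=(1.0, 2.0), train_targets=(0.5,))
assert get_train_inputs(gp) == (1.0, 2.0)
```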

seashoo avatar Oct 27 '25 07:10 seashoo

> Thanks!
>
> > merging with BoTorch's upstream is causing dependency issues with GPyTorch imports on my local. I will investigate this issue further and commit the working upstream in the coming days.
>
> @seashoo when you say dependency issues, do you mean in terms of installing the package or in terms of causing (e.g. import) failures? There has been a new gpytorch release just recently that you may need to update to: https://github.com/cornellius-gp/gpytorch/releases/tag/v1.14.2

My local and virtual environments weren't picking up the new release at the time. I ran a fresh install of gpytorch plus all the necessary development tools in a new Python venv; should be good now!

seashoo avatar Oct 27 '25 07:10 seashoo

@seashoo Thanks once again for taking the time with this refactor - great work!

Let's make sure the tests pass, and that we don't have any changes in unrelated files (e.g. knowledge_gradient.py, optim.py). Then, we can go ahead and merge!

hvarfner avatar Oct 27 '25 14:10 hvarfner

@hvarfner Thank you for your guidance! My apologies for the delayed response; I've been caught up in midterm exams these past few weeks.

I made sure upstream was up to date, ran pytest -ra on both this rebase commit and the official repo, and all test cases for pathwise are passing! There are 33 failures that appear on both ends, so presumably those are out of scope relative to the changes in this PR. All out-of-scope files should match upstream as well.

Are there any additional considerations I should be making? I've followed through with the ufmt format ., flake8 ., and build checks per the contribution guide. I recall @Balandat initially requesting thorough code coverage and a quick glance over our new unit tests. Just let me know and I'll act on it as soon as I'm available!

seashoo avatar Nov 06 '25 18:11 seashoo

@Balandat, @hvarfner Hello! Just checking in since my last comment: are there any final validations I can cover before we work on a conclusive pass? I'll be free throughout the next few weeks to ensure merging goes smoothly.

Thanks for your time!

seashoo avatar Nov 18 '25 01:11 seashoo

Hi @seashoo ,

Thanks again for the massive job with this! We have no further comments on the code itself - the quality is great, and when/if all of the comments are addressed, we're good to merge.

I've started all the tests again. I'll help check what is in and out of scope of the PR, but if you think something is out of scope, you're most likely right and I will be happy to help out with that!

hvarfner avatar Nov 18 '25 20:11 hvarfner

Hi @hvarfner @Balandat,

I apologize for the confusion with this PR. Here's what happened:

Issue: This PR is currently showing "0 commits" but all my work is still intact on my fork.

What occurred: I was trying to address the CI test failures, particularly a code mismatch in test_gp_regression_mixed.py, and sync with the latest upstream changes. In the process, I reset my main branch to match upstream without carrying over the changes from this PR, which temporarily broke the PR's commit display on GitHub. I've since restored my branch to its original state (commit d85a6ccf7 from Oct 26), but GitHub is still showing the PR as empty.

Current status:

  • All my code is on my fork: https://github.com/seashoo/botorch_rebase_fork/commits/main
  • The PR has ~17 commits of work
  • Full diff available: https://github.com/meta-pytorch/botorch/compare/main...seashoo:botorch_rebase_fork:main

The work includes:

  • Refactoring botorch/sampling/pathwise for product kernel support
  • Comprehensive test coverage improvements for pathwise sampling
  • ProductKernel MTGP adjustments (October commits)

Request: Could you help me understand the best way to proceed? Options I'm considering:

  1. Can you reopen this PR to see if GitHub refreshes the commit display? To my knowledge, I can't view/adjust the PR cache locally.
  2. Should I close this and open a new PR with the current state?
  3. Is there another approach you'd recommend?

I'm expecting that once the PR is reopened and the cache cycles, the history will be restored from the current state of my fork. If we run into further issues, please let me know if there's a way I can help resolve this. If an entirely new PR is necessary, the upside is that the previous CI/CD pipeline failures should be eliminated once we match d85a6ccf7 to upstream, and we'll be able to perform a clean final merge at the appropriate scope. I know history and documentation during this process are vital, so I'm hoping to restore what was here if possible!

seashoo avatar Dec 11 '25 22:12 seashoo