
Improve handling of Gaussian distributions

Open cisprague opened this issue 6 months ago • 7 comments

This pull request is intended to improve the handling of Gaussians in AutoEmulate and to standardize the outputs that active learners can expect from emulators.

Implemented covariance structures:

  • [x] Full covariance
  • [x] Block-diagonal
  • [x] Diagonal
  • [x] Separable
  • [x] Dirac
  • [x] Empirical
  • [x] Ensemble
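As a quick illustration of why these structures matter (a minimal sketch in plain torch, independent of the PR's actual classes): for a diagonal covariance the log-determinant reduces to a sum of log-variances, avoiding the dense O((nd)^3) factorisation that a full covariance would require.

```python
import torch

n, d = 50, 3
var = torch.rand(n * d) + 0.1  # per-dimension variances, kept positive

# Diagonal structure: logdet is a cheap reduction over the variances.
logdet_diag = var.log().sum()

# Equivalent dense computation for comparison (cubic in n*d).
dense = torch.diag(var)
logdet_dense = torch.logdet(dense)

assert torch.allclose(logdet_diag, logdet_dense, atol=1e-4)
```

The same argument applies to trace and quadratic forms, which is what makes the specialised classes worthwhile for active-learning metrics.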

cisprague avatar May 15 '25 00:05 cisprague


Need to discuss how to handle empirically constructed Gaussians. Currently we have explicit specialised classes for them, e.g. Empirical_Block_Diagonal, but we could instead add a from_samples class method to each class, e.g. Block_Diagonal.from_samples. Either way, there needs to be a way to convert the classes to Dense in order to build ensembles with Ensemble; the to_dense method accomplishes this.
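To make the from_samples alternative concrete, here is a minimal sketch (the class here is hypothetical and only mirrors the naming discussed above, not the PR's actual implementation):

```python
import torch


class Diagonal:
    """Hypothetical diagonal-covariance Gaussian; a sketch, not the PR's API."""

    def __init__(self, mean: torch.Tensor, var: torch.Tensor):
        self.mean, self.var = mean, var

    @classmethod
    def from_samples(cls, samples: torch.Tensor) -> "Diagonal":
        # samples: (k, n, d) -> empirical mean and per-dimension variance
        return cls(samples.mean(dim=0), samples.var(dim=0))


k, n, d = 1000, 50, 3
dist = Diagonal.from_samples(torch.rand(k, n, d))
print(dist.mean.shape, dist.var.shape)  # torch.Size([50, 3]) torch.Size([50, 3])
```

This keeps the empirical-construction logic next to each structure rather than in parallel Empirical_* classes.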

Lastly, regarding compatibility with GPyTorch: it internally uses LinearOperator to handle different kernel specialisations, which looks good but may be more than we need.

cisprague avatar May 15 '25 18:05 cisprague

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 80.40%. Comparing base (365fc39) to head (14aafeb). Report is 187 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main     #471       +/-   ##
===========================================
- Coverage   90.53%   80.40%   -10.14%     
===========================================
  Files          96      104        +8     
  Lines        5983     7057     +1074     
===========================================
+ Hits         5417     5674      +257     
- Misses        566     1383      +817     

:umbrella: View full report in Codecov by Sentry.

codecov-commenter avatar May 17 '25 02:05 codecov-commenter

Coverage report

This PR does not seem to contain any modification to coverable code.

github-actions[bot] avatar May 17 '25 02:05 github-actions[bot]


We now have .to_dense() and .from_dense() for most of the structured classes. These methods allow for the aggregation of structured distributions into an Ensemble class.

import torch  # Empirical, Diagonal, and Ensemble are the classes added in this PR

# Anisotropic empirical distribution from k samples at n sampling locations, each with d dimensions
k, n, d = 1000, 50, 3
samples = torch.rand(k, n, d)
dist0 = Empirical(samples)

# Isotropic empirical distribution (set off-diagonal elements to zero)
samples = torch.rand(k, n, d)
dist1 = Diagonal.from_dense(Empirical(samples))

# Just to demonstrate the .to_dense() method
dist1 = Diagonal.from_dense(dist1.to_dense())

# We can combine them into an ensemble
dist2 = Ensemble([dist0, dist1])
print(dist2.logdet(), dist2.trace(), dist2.max_eig())
tensor(-375.8140) tensor(12.4797) tensor(0.1201)

cisprague avatar May 19 '25 16:05 cisprague

This is looking great! I just want to confirm we all agree with where this code is meant to sit and how it is meant to be used. My understanding is that it will be within the active learning module and these classes are used to reshape the covariance matrix of the torch distribution returned by AutoEmulate (GaussianLike) to enable efficient metrics computation. Is this correct?

radka-j avatar May 20 '25 10:05 radka-j

It would also help me if we had some examples to illustrate when/how the different covariance structures emerge.

radka-j avatar May 20 '25 10:05 radka-j

I'd move the tutorial to the experimental directory and then merge as is.

radka-j avatar Oct 09 '25 09:10 radka-j