autoemulate
Improve handling of Gaussian distributions
This pull request is intended to improve the handling of Gaussians in AutoEmulate and standardize the outputs that active learners can expect from emulators.
Implemented covariance structures:
- [x] Full covariance
- [x] Block-diagonal
- [x] Diagonal
- [x] Separable
- [x] Dirac
- [x] Empirical
- [x] Ensemble
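To make the differences concrete, here is a minimal sketch (plain torch tensors, not this PR's API) of what each structure stores for a Gaussian over n sampling locations with d output dimensions; the variable names and identity-matrix placeholders are purely illustrative:
import torch

n, d = 50, 3
full = torch.eye(n * d)                    # Full: one dense (n*d, n*d) matrix
block_diag = torch.eye(d).expand(n, d, d)  # Block-diagonal: n independent (d, d) blocks
diag = torch.ones(n * d)                   # Diagonal: only the n*d variances
K_loc, K_out = torch.eye(n), torch.eye(d)  # Separable: Kronecker factors over locations and outputs
separable = torch.kron(K_loc, K_out)       # materialised (n*d, n*d) form, if ever needed
# Dirac is the zero-covariance (point mass) case; Empirical estimates one of the
# structures above from samples; Ensemble aggregates several distributions.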
Need to discuss how to handle empirically constructed Gaussians. Currently, we have explicit specialised classes for them, e.g. Empirical_Block_Diagonal, but we could instead add a from_samples class method to each structured class, e.g. Block_Diagonal.from_samples. Either way, there needs to be a way to convert the classes to Dense to construct ensembles with Ensemble; the to_dense method accomplishes this.
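For concreteness, a hypothetical sketch of the two options (class and method names are taken from the paragraph above; the exact signatures are assumptions):
# Option A: explicit specialised class for empirically constructed Gaussians
dist_a = Empirical_Block_Diagonal(samples)     # samples: a (k, n, d) tensor of draws
# Option B: a from_samples class method on each structured class
dist_b = Block_Diagonal.from_samples(samples)
# Either way, to_dense() provides the Dense form needed when building an Ensemble
dense = dist_b.to_dense()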
Lastly, regarding compatibility with GPyTorch: they internally use LinearOperator to handle different kernel specialisations, which looks good but might be a bit more than we need.
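For reference, a minimal example of the GPyTorch machinery mentioned above; this assumes the standalone linear_operator package that recent GPyTorch versions build on (older versions expose similar classes under gpytorch.lazy):
import torch
from linear_operator.operators import DiagLinearOperator

variances = torch.rand(5)
op = DiagLinearOperator(variances)   # represents a 5x5 diagonal matrix without materialising it
rhs = torch.rand(5, 2)
print(op @ rhs)                      # structure-aware matmul that exploits the diagonal
print(op.to_dense())                 # explicit dense matrix, only when required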
Codecov Report
All modified and coverable lines are covered by tests ✅
Project coverage is 80.40%. Comparing base (365fc39) to head (14aafeb). Report is 187 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #471 +/- ##
===========================================
- Coverage 90.53% 80.40% -10.14%
===========================================
Files 96 104 +8
Lines 5983 7057 +1074
===========================================
+ Hits 5417 5674 +257
- Misses 566 1383 +817
Coverage report
This PR does not seem to contain any modification to coverable code.
We now have .to_dense() and .from_dense() for most of the structured classes. These methods allow for the aggregation of structured distributions into an Ensemble class.
import torch
# Empirical, Diagonal and Ensemble are the covariance classes introduced in this PR
# (import path omitted here)

# Anisotropic empirical distribution from k samples at n sampling locations, each with d dimensions
k, n, d = 1000, 50, 3
samples = torch.rand(k, n, d)
dist0 = Empirical(samples)
# Diagonal empirical distribution (off-diagonal elements set to zero)
samples = torch.rand(k, n, d)
dist1 = Diagonal.from_dense(Empirical(samples))
# Just to demonstrate the .to_dense() method
dist1 = Diagonal.from_dense(dist1.to_dense())
# We can combine them into an ensemble
dist2 = Ensemble([dist0, dist1])
print(dist2.logdet(), dist2.trace(), dist2.max_eig())
tensor(-375.8140) tensor(12.4797) tensor(0.1201)
This is looking great! I just want to confirm we all agree on where this code is meant to sit and how it is meant to be used. My understanding is that it will live within the active learning module, and that these classes are used to reshape the covariance matrix of the torch distribution returned by AutoEmulate (GaussianLike) to enable efficient metric computation. Is this correct?
It would also help me if we had some examples to illustrate when/how the different covariance structures emerge.
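If that reading is correct, the intended flow might look roughly like the sketch below; emulator.predict, the Dense wrapper around a raw covariance matrix, and the metric call are assumptions rather than the actual API:
# Hypothetical active-learning usage (names and signatures are assumptions)
gaussian = emulator.predict(x)                 # GaussianLike returned by AutoEmulate
cov = gaussian.covariance_matrix               # dense covariance of the prediction
structured = Diagonal.from_dense(Dense(cov))   # impose a cheaper structure on it
criterion = structured.logdet()                # e.g. a design criterion from the structured form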
I'd move the tutorial to the experimental directory and then merge as is.