Surrogates.jl icon indicating copy to clipboard operation
Surrogates.jl copied to clipboard

More Kriging improvements (log likelihood, noise variance, more tests)

Open archermarx opened this issue 2 years ago • 5 comments

This addresses https://github.com/SciML/Surrogates.jl/issues/375 as well as adds some more testing utilities to move us toward more rigorous and comprehensive property-based testing of surrogates.

1. Noise variance

The Kriging object now has a new field, noise_variance, which defaults to zero. This is a number added to the main diagonal of the covariance matrix which allows Kriging to model noisy functions

Here's what including noise variance looks like:

Random.seed!(1234)
  lb = 0.0
  ub = 10.0
  func = x -> sin(x)
  n_samples = 50
  x = sample(n_samples, lb, ub, SobolSample())

  # Add some random noise to the function
  y = func.(x) .+ 0.1 * randn(n_samples)

  # Build a kriging surrogate without noise
  my_k = Kriging(x, y, lb, ub, noise_variance = 0.0)

  x_fine = LinRange(lb, ub, 10000)
  y_fine = func.(x_fine)
  pred_fine = my_k.(x_fine)
  errs = std_error_at_point.(my_k, x_fine)
  p = Plots.plot(x_fine, y_fine, label = "True function", legend = :outertop)
  Plots.plot!(x_fine, pred_fine, label = "surrogate", ribbon= 3 * errs, ylims = (-2, 2))
  Plots.scatter!(my_k.x, my_k.y, mc = :black, label = "Data")

plot_258

This interpolates the data but doesn't tell us much about how the signal varies

If we now instead include a noise variance, we get a much more accurate picture of the underlying function

my_k = Kriging(x, y, lb, ub, noise_variance = 0.1)

plot_257

2. Log likelihood function for kriging

There is now a Surrogates.kriging_log_likelihood(x, y, p, theta, noise_variance) function which is differentiable and can be used for hyperparameter optimization.

3. Property-based testing

I have created a new file, test_utils.jl with two functions: _random_surrogate and _check_interpolation. The first generates a random surrogate (random dimension, number of points) of the given type and optionally using a provided function. The second checks that the input surrogate correctly interpolates its input data. Interpolation is an important property to check for most surrogates, and in cases when the surrogate regresses rather than interpolates (kriging with noise_variance not equal to zero, linearSurrogate, SecondOrderPolynomialSurrogate), we can check that that behavior also holds. I have added interpolation-checking tests to all surrogates except for Wendland, GEK, GEKPLS, and Earth. In testing, I found that Wendland doesn't seem to work at all, so that should be fixed.

4. Miscellaneous

I changed Kriging's default p from 2.0 to 1.99 to help numerical stability

Still to do

  • [ ] Add examples for kriging hyperparameter optimization to docs
  • [ ] Add example for kriging noise variance to docs
  • [ ] Fix Wendland surrogate
  • [ ] Many doc examples are broken, would be good to fix those.

archermarx avatar Jul 14 '22 20:07 archermarx

Codecov Report

Merging #379 (28a6c45) into master (6eb339e) will decrease coverage by 0.27%. The diff coverage is 68.29%.

@@            Coverage Diff             @@
##           master     #379      +/-   ##
==========================================
- Coverage   79.18%   78.91%   -0.28%     
==========================================
  Files          16       16              
  Lines        2306     2319      +13     
==========================================
+ Hits         1826     1830       +4     
- Misses        480      489       +9     
Impacted Files Coverage Δ
src/LinearSurrogate.jl 100.00% <ø> (ø)
src/Kriging.jl 87.80% <64.86%> (-6.84%) :arrow_down:
src/Radials.jl 90.99% <100.00%> (+2.70%) :arrow_up:
src/Sampling.jl 100.00% <100.00%> (ø)
src/Optimization.jl 71.97% <0.00%> (-0.28%) :arrow_down:

:mega: Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

codecov[bot] avatar Jul 14 '22 20:07 codecov[bot]

@archermarx - One of the tasks you've listed is:

Many doc examples are broken, would be good to fix those.

Is this related to #363? If yes, are you working on this?

I'm studying the issue and want to make sure that we both don't work on the same thing :)

Thanks!

vikram-s-narayan avatar Jul 21 '22 07:07 vikram-s-narayan

I'm not sure yet, I haven't dug too deeply down into the issue, but most of the examples in the documentation display no results.

archermarx avatar Jul 21 '22 17:07 archermarx

I'm not sure yet, I haven't dug too deeply down into the issue, but most of the examples in the documentation display no results.

Okay thanks. I'll work on the documentation examples.

vikram-s-narayan avatar Jul 22 '22 05:07 vikram-s-narayan

Is this still needed?

ChrisRackauckas avatar Sep 22 '23 14:09 ChrisRackauckas