
Test VFE with the naive implementation

sharanry opened this issue on Mar 31 '22 • 6 comments

Summary

We currently lack any test confirming that the predictive distribution matches what is prescribed by the original papers:

  • VFE: M. K. Titsias. "Variational learning of inducing variables in sparse Gaussian processes". In: Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics. 2009.
  • DTC: M. Seeger, C. K. I. Williams and N. D. Lawrence. "Fast Forward Selection to Speed Up Sparse Gaussian Process Regression". In: Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics. 2003.

This is an attempt at fixing that.

The predictive distribution for both DTC and VFE should be the same (the projected process, PP?). This PR currently checks it against the DTC predictive distribution defined in Eq. (20b) of Quinonero-Candela and Rasmussen (2005).
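For reference, that equation gives (as I recall, with $\sigma^2$ the observation noise variance and $Q_{**} = K_{*u} K_{uu}^{-1} K_{u*}$):

$$q(\mathbf{f}_* \mid \mathbf{y}) = \mathcal{N}\!\left(\sigma^{-2} K_{*u} \Sigma K_{uf} \mathbf{y},\; K_{**} - Q_{**} + K_{*u} \Sigma K_{u*}\right), \qquad \Sigma = \left(\sigma^{-2} K_{uf} K_{fu} + K_{uu}\right)^{-1}.$$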

sharanry • Mar 31 '22

These tests reveal a small but consistent discrepancy between my computationally naive test implementation and the package's implementation. I'm not sure whether it's a bug in the test.
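For concreteness, the snippets below assume a setup roughly like the following (a sketch: the kernel, the inputs, and the Σ / q helpers are my stand-ins for the Eq. (20b) building blocks, not the exact code in this PR):

    using AbstractGPs, KernelFunctions, LinearAlgebra

    # Hypothetical setup matching the snippets below (names assumed, not from the PR):
    k = SEKernel()                        # kernel
    x = rand(10)                          # training inputs
    u = rand(4)                           # pseudo-inputs (inducing points)
    x_test = rand(5)                      # test inputs
    y = map(sin, x) .+ 0.1 .* randn(10)   # synthetic targets; the prior mean is sin
    σ² = 1e-12                            # observation noise variance (the 1e-12 below)

    # Eq. (20b) building blocks: Σ = (σ⁻² Kux Kxu + Kuu)⁻¹ and q(a, b) = Kau Kuu⁻¹ Kub.
    Σ(x, u) = inv(kernelmatrix(k, u, x) * kernelmatrix(k, x, u) / σ² + kernelmatrix(k, u, u))
    q(a, b) = kernelmatrix(k, a, u) * (kernelmatrix(k, u, u) \ kernelmatrix(k, u, b))

    # f_approx_post (used below) is the package's VFE posterior; construction omitted.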

julia> inv(LinearAlgebra.Diagonal(1e-12 * ones(5))) *
                kernelmatrix(k, x_test, u) *
                Σ(x, u) *
                kernelmatrix(k, u, x) *
                y
5-element Vector{Float64}:
  2.2822711905989306
  1.619419315942234
  2.4886061448883043
  0.13716866592873223
 -1.4776904851124237

julia> mean(f_approx_post, x_test)
5-element Vector{Float64}:
  2.2632581110918366
  1.620565927921467
  2.540609342124296
  0.16377806765055059
 -1.6899264788845625

julia> kernelmatrix(k, x_test, x_test) - q(x_test, x_test) +
                 kernelmatrix(k, x_test, u) * Σ(x, u) * transpose(kernelmatrix(k, x_test, u))          
5×5 Matrix{Float64}:
  0.148276     0.0509091   -0.00913296  -0.045995    0.0149131
  0.0506096    0.0490361   -0.00308661  -0.0382136   0.0114484
 -0.00913856  -0.00307898   0.0389262    0.0023531  -0.000731468
 -0.046091    -0.038105     0.00236565   0.131079   -0.058772
  0.0148782    0.0114382   -0.00071384  -0.0587383   0.308568

julia> cov(f_approx_post, x_test)
5×5 Matrix{Float64}:
  0.150214     0.0511708   -0.00932485   -0.0470627    0.0133652
  0.0511708    0.0493259   -0.00309447   -0.0388128    0.0106505
 -0.00932485  -0.00309447   0.0389497     0.00241128  -0.000594857
 -0.0470627   -0.0388128    0.00241128    0.132985    -0.0552457
  0.0133652    0.0106505   -0.000594857  -0.0552457    0.300714
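Quantified directly (a sketch; this just binds the expression above to a name), the largest entry-wise difference between the two covariance matrices printed above is about 8e-3:

julia> naive_cov = kernelmatrix(k, x_test, x_test) - q(x_test, x_test) +
                kernelmatrix(k, x_test, u) * Σ(x, u) * transpose(kernelmatrix(k, x_test, u));

julia> maximum(abs, naive_cov - cov(f_approx_post, x_test))  # ≈ 8e-3 for the matrices above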

sharanry • Mar 31 '22

> These tests reveal a small but consistent discrepancy between my computationally naive test implementation and the package's implementation. I'm not sure whether it's a bug in the test.

I'm pretty sure that your mean calculation doesn't take into account the fact that the prior has a non-zero mean. I think you need something like

    @test map(sin, x_test) + inv(LinearAlgebra.Diagonal(1e-12 * ones(5))) *
          kernelmatrix(k, x_test, u) *
          Σ(x, u) *
          kernelmatrix(k, u, x) *
          (y - map(sin, x)) ≈ mean(f_approx_post, x_test)

instead.

Regarding the covariance, note that the predictive covariances at locations other than the pseudo-inputs differ between VFE and DTC. Could you check that your expressions agree with what the package currently does when x_test = z? I think they should also agree if you swap out VFE for DTC in the tests.
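In other words, something like this (a sketch; naive_mean / naive_cov are hypothetical wrappers around the expressions in your tests):

    # At the pseudo-inputs, the VFE and DTC predictive distributions should
    # coincide, so the naive expressions and the package can be compared there.
    z = u  # evaluate at the pseudo-inputs
    @test naive_mean(z) ≈ mean(f_approx_post, z)
    @test naive_cov(z, z) ≈ cov(f_approx_post, z)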

I would advise checking that what you've implemented for the covariance in your tests lines up with equation 6 in Titsias' 2009 paper -- a quick glance on my part suggests that they're not quite the same, but I've not checked it in detail.
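For reference, my recollection of that predictive distribution, written in terms of the optimal variational distribution q(u) = N(m_u, A) over the inducing values (a sketch assuming a zero-mean prior; m_u and A are the quantities from the paper, not package internals):

    # Eq. (6)-style predictive: mean = K*u Kuu⁻¹ m_u,
    # cov = K** - K*u Kuu⁻¹ Ku* + K*u Kuu⁻¹ A Kuu⁻¹ Ku*
    Kuu = kernelmatrix(k, u, u)
    Ktu = kernelmatrix(k, x_test, u)
    titsias_mean = Ktu * (Kuu \ m_u)
    titsias_cov = kernelmatrix(k, x_test, x_test) - Ktu * (Kuu \ Ktu') +
        Ktu * (Kuu \ (A * (Kuu \ Ktu')))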

willtebbutt • Mar 31 '22

@willtebbutt Also, are the derivations behind the current VFE/DTC predictive-distribution code available somewhere? I still have the write-up you gave me a couple of years back while implementing these, but that document doesn't seem to contain any derivations.

sharanry • Mar 31 '22

Hmmm I actually don't think that we do have that lying around. Could you open an issue about it so that we don't forget it?

willtebbutt • Mar 31 '22

> Hmmm I actually don't think that we do have that lying around. Could you open an issue about it so that we don't forget it?

Oh okay. Please let me know if you come across them. I was hoping to use them as a reference for implementing other sparse techniques like FITC.

sharanry • Mar 31 '22

In the new tests, could you separate out the terms and give them names, e.g. dtc_posterior_mean = ... # see (ref), (eq. X)? That'd be really helpful to make it easier to follow :)
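For example (illustrative names only, not prescriptive):

    Σuu = Σ(x, u)  # see Quinonero-Candela & Rasmussen (2005), Eq. (20b)
    dtc_posterior_mean = map(sin, x_test) +
        kernelmatrix(k, x_test, u) * Σuu * kernelmatrix(k, u, x) * (y - map(sin, x)) / σ²
    @test dtc_posterior_mean ≈ mean(f_approx_post, x_test)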

st-- • Apr 01 '22

This appears to have gone stale. @sharanry, please feel free to re-open if you wish to finish it off.

willtebbutt • Sep 15 '23