
Add HSGP reference and example nb

Open bwengals opened this issue 1 year ago • 1 comments

HSGP reference and examples

Corresponding to issue #646. WIP as there are still some kinks and bugs to work out. Particularly on prediction in example 3.

  • [ ] Notebook follows style guide https://docs.pymc.io/en/latest/contributing/jupyter_style.html
  • [ ] PR description contains a link to the relevant issue:
    • a tracker one for existing notebooks (tracker issues have the "tracker id" label)
    • or a proposal one for new notebooks
  • [ ] Check the notebook is not excluded from any pre-commit check: https://github.com/pymc-devs/pymc-examples/blob/main/.pre-commit-config.yaml

Helpful links

  • https://github.com/pymc-devs/pymc-examples/blob/main/CONTRIBUTING.md

bwengals avatar Mar 12 '24 01:03 bwengals


I've started a review that I'll push directly to your branch when ready

AlexAndorra avatar Mar 13 '24 16:03 AlexAndorra

thanks @AlexAndorra! Also very open to splitting this up -- it is pretty long, sorry. That might make it easier to review.

bwengals avatar Mar 13 '24 19:03 bwengals

Would be cool to have approx_params available as a utility function.

fonnesbeck avatar Mar 18 '24 12:03 fonnesbeck

Perhaps emphasize in the text what you are looking for in the gram plots (the bright yellow diagonal)?

fonnesbeck avatar Mar 18 '24 12:03 fonnesbeck

Would it be worth having an alias for _m_star called something like num_basis_vectors as a convenience?

fonnesbeck avatar Mar 18 '24 12:03 fonnesbeck

Perhaps emphasize in the text what you are looking for in the gram plots (the bright yellow diagonal)?

Added this text:

The plots above compare the approximate Gram matrices to the unapproximated Gram matrix in the top left panel. Qualitatively, the more similar they look to the true one (upper left), the better the approximation. Also, these results are only relevant in the context of the particular domain defined by X and the chosen lengthscale, $\ell = 3$ -- just because the approximation looks good for $\ell = 3$ doesn't mean it will look good for, for instance, $\ell = 10$.
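For readers who want a numerical handle alongside the visual comparison, here is a minimal sketch (not the notebook's code) of building the HSGP approximation to an ExpQuad Gram matrix via the standard Solin & Särkkä construction and measuring its relative Frobenius error; the values of `m` and `L` are illustrative:

```python
import numpy as np

# Hedged sketch: approximate Gram matrix for a unit-variance ExpQuad
# kernel on 1-D inputs, compared against the exact one.
ell = 3.0                  # lengthscale, matching the text
m, L = 30, 10.0            # number of basis vectors, half-width of expanded domain
X = np.linspace(-5, 5, 100)[:, None]

# exact Gram matrix
K = np.exp(-0.5 * (X - X.T) ** 2 / ell**2)

# Laplacian eigenfunctions on [-L, L] and the ExpQuad spectral density
j = np.arange(1, m + 1)
sqrt_lam = np.pi * j / (2 * L)                        # sqrt of eigenvalues
phi = np.sqrt(1 / L) * np.sin(sqrt_lam * (X + L))     # (100, m) basis
S = np.sqrt(2 * np.pi) * ell * np.exp(-0.5 * (ell * sqrt_lam) ** 2)

K_approx = phi @ np.diag(S) @ phi.T
rel_err = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
```

With `m` this large relative to the lengthscale, the relative error is tiny; shrinking `m` or `L` makes the degradation seen in the gram plots show up numerically.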

Would be cool to have approx_params available as a utility function.

Yeah, I agree! It'd clean the nb up a bit too. Any thoughts on a name? I really don't like how vague "approx_params" is, but couldn't think of anything particularly better.
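A rough sketch of what such a utility might look like. The constants `a` and `b` are kernel-dependent and loosely based on the heuristics in the Practical HSGP paper; the names and values below are placeholders, not an actual PyMC API:

```python
import numpy as np

# Hypothetical "approx_params"-style helper: given the input range and the
# smallest plausible lengthscale, suggest m (basis size) and c (boundary
# factor). The constants a and b are illustrative placeholders.
def approx_params(x_range, min_lengthscale, a=1.75, b=3.2):
    S = 0.5 * (x_range[1] - x_range[0])    # half-width of the data domain
    ls_hat = min_lengthscale / S           # smallest lengthscale, rescaled
    c = max(b * ls_hat, 1.2)               # boundary expansion factor
    m = int(np.ceil(a * c / ls_hat))       # number of basis vectors
    return m, c

m, c = approx_params(x_range=(-5, 5), min_lengthscale=3.0)
```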

Would it be worth having an alias for _m_star called something like num_basis_vectors as a convenience?

Definitely, especially now that I think it's a good idea for users to access it directly sometimes.
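The alias could be as simple as a property. Sketched here with a stand-in class (not the real PyMC class); `_m_star` is the existing private attribute holding the total number of basis vectors:

```python
import numpy as np

# Stand-in class to illustrate the proposed alias; in the HSGP
# implementation, _m_star is the product of m across input dimensions.
class HSGPSketch:
    def __init__(self, m):
        self.m = list(m)
        self._m_star = int(np.prod(self.m))

    @property
    def num_basis_vectors(self):
        # friendlier public name for the private _m_star attribute
        return self._m_star

total = HSGPSketch(m=[30, 25]).num_basis_vectors  # 750
```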

bwengals avatar Mar 18 '24 22:03 bwengals

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-03-25T20:22:20Z ----------------------------------------------------------------

In https://github.com/pymc-devs/pymc-examples/pull/627 I added the BibTeX references for the main articles, so you can use them here ;)

Also, it might be nice to add reference links to the classes in the PyMC docs (also see https://github.com/pymc-devs/pymc-examples/pull/627)


View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-03-25T20:22:21Z ----------------------------------------------------------------

https://github.com/arviz-devs/preliz is a nice package to do these plots ;)


View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-03-25T20:22:22Z ----------------------------------------------------------------

It would be nice to add the true values via ref_val (and remove overlapping titles XD)


AlexAndorra commented on 2024-05-21T15:37:59Z ----------------------------------------------------------------

Done

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-03-25T20:22:23Z ----------------------------------------------------------------

What about adding the HDO of the likelihood? In many applications one is interested in both the GP and the whole fit. The reason I bring this up is that in many PyMC examples this is not done, so this could be a good place to have it ;)


bwengals commented on 2024-03-29T00:43:23Z ----------------------------------------------------------------

What's HDO stand for (though I think I know what you mean)? Yeah, that's a good idea

juanitorduz commented on 2024-03-29T09:35:50Z ----------------------------------------------------------------

HDI 🤪

AlexAndorra commented on 2024-05-21T15:58:16Z ----------------------------------------------------------------

Done
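What this suggestion amounts to, sketched with synthetic draws (in the notebook this would use `idata.posterior_predictive` and typically arviz's `az.plot_hdi`; the names and numbers below are illustrative):

```python
import numpy as np

# Synthetic posterior-predictive draws standing in for
# idata.posterior_predictive["y"]; shape (draw, observation).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
true_mean = np.sin(x)
draws = true_mean + rng.normal(0, 0.3, size=(1000, 50))

# A simple equal-tailed 94% interval of the likelihood ("whole fit");
# az.plot_hdi computes a proper HDI and plots it in one call.
lower, upper = np.percentile(draws, [3, 97], axis=0)
coverage = np.mean((true_mean >= lower) & (true_mean <= upper))
```

The point of the suggestion: plot this band over the observations in addition to the band for the latent GP, so readers see both the GP uncertainty and the full predictive uncertainty.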

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-03-25T20:22:24Z ----------------------------------------------------------------

"Additive processes result in a product of the power spectral densities." I think I know what you mean by this statement because I have read the papers carefully, but maybe you want to explain a bit more?


bwengals commented on 2024-03-29T00:47:21Z ----------------------------------------------------------------

you're right this isn't well written. How about:

Instead of constructing and then directly adding them, the sum of two HSGPs can be computed more efficiently by first taking the product of their power spectral densities, and then creating a single GP from the combined power spectral density. This reduces the number of unknown parameters because the two GPs can share the same basis set.

Still feels like a mouthful, but lmk what you think. I don't want to explain what a PSD is here, or any theory really, so maybe this is a good place to cite something too.

juanitorduz commented on 2024-03-29T09:38:37Z ----------------------------------------------------------------

I think this version is clearer. You can think about adding this into a little box so that users see it as a side remark

```{admonition} Adding HSGPs
Instead of constructing and then directly adding them, the sum of two HSGPs can be computed more efficiently by first taking the product of their power spectral densities, and then creating a single GP from the combined power spectral density. This reduces the number of unknown parameters because the two GPs can share the same basis set.
```

See https://jupyterbook.org/en/stable/content/content-blocks.html

AlexAndorra commented on 2024-05-21T16:46:11Z ----------------------------------------------------------------

Done

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-03-25T20:22:25Z ----------------------------------------------------------------

Line #18.    m_options = [3, 3, 5]

Can you add a legend for m? At the moment it is not clear which curve has a larger m.


bwengals commented on 2024-03-29T00:50:58Z ----------------------------------------------------------------

hmm, maybe I could say something in text? A legend for each panel clutters things quite a bit. The first panel has 3 basis functions, the middle has 3, and the rightmost has 5.

This actually doesn't line up with the text, which just says there are 3 basis functions per panel. I'll change it to 3 for all panels.

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-03-25T20:22:25Z ----------------------------------------------------------------

I think (if I have not missed something) this is the first time you talk about the "first eigenvector". Maybe it's worth commenting above that the basis functions arise as eigenvectors of the Dirichlet Laplacian?


bwengals commented on 2024-03-29T00:54:20Z ----------------------------------------------------------------

Oh good point. What if I restrict myself to saying just "basis vector"? I feel like saying eigenvectors of the Dirichlet Laplacian will only be meaningful to some people. And if you wanna actually know what that means, the papers are the source for that info.

juanitorduz commented on 2024-03-29T09:39:50Z ----------------------------------------------------------------

basis vector seems good!

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-03-25T20:22:26Z ----------------------------------------------------------------

geek nitpick: do not use $\Delta$, as it might be confused with the Laplacian in this context.


bwengals commented on 2024-03-29T00:56:23Z ----------------------------------------------------------------

I would agree... But, I'm using the notation from the "Practical HSGP" paper, so blame them

juanitorduz commented on 2024-03-29T09:40:37Z ----------------------------------------------------------------

ahaha ok XD

https://media.giphy.com/media/v1.Y2lkPTc5MGI3NjExNXRneGZwNGRrMXZwMGh4Mjd3MHVhY3lndnA5aDA3b2FhN2lsNHg0ciZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/fqtyYcXoDV0X6ss8Mf/giphy.gif

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-03-25T20:22:27Z ----------------------------------------------------------------

I get the point but to be honest, to the naked eye all these plots look very similar and are hard to use as a way of validating the approximation. Is there a numerical metric to compare these matrices? maybe a simple matrix norm?


AlexAndorra commented on 2024-03-26T15:41:51Z ----------------------------------------------------------------

What do you mean by "a lower fidelity HSGP approximation" here?

bwengals commented on 2024-03-29T01:03:34Z ----------------------------------------------------------------

You're right @juanitorduz, the differences don't jump out, except for maybe the top right. You can also see it in the upper left and bottom right blocks on the bottom row. I could lower m and c a touch to make them worse, or maybe add better explanation in the text?

I agree that matrix norm makes sense, but I don't want to propose any kind of approach here... since I have no idea what norm would be a good cutoff. And I also don't think this is a widely useful approach anyway because maybe K is too big to calculate directly.

The goal was more to get a feel for what's going on that makes a bad HSGP approximation bad, and to show how someone might start doing some detective work to see whether the approximation is good enough for their particular use case.

@AlexAndorra, I just meant low fidelity = bad approximation. The GP approximation is too "compressed" in a sense, so it can't reproduce the true GP.

juanitorduz commented on 2024-03-29T09:43:23Z ----------------------------------------------------------------

I think both suggestions would improve this :

- lower m and c a touch to make them worse

- Add explanation in the text where to focus concretely

AlexAndorra commented on 2024-05-21T19:35:33Z ----------------------------------------------------------------

Done

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-03-25T20:22:28Z ----------------------------------------------------------------

Maybe comment a bit on the meaning of the two regions?


AlexAndorra commented on 2024-03-26T19:44:15Z ----------------------------------------------------------------

I think it's just that the test set is in the region where x1 > 2 and x1 < 4

bwengals commented on 2024-03-29T01:05:42Z ----------------------------------------------------------------

Yup, what Alex said. This is written nowhere though so I'll add some text for this

AlexAndorra commented on 2024-05-21T15:30:04Z ----------------------------------------------------------------

Done
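The split being described, sketched with illustrative variable names and synthetic inputs:

```python
import numpy as np

# Illustrative 2-D inputs standing in for the notebook's data
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 2))
x1 = X[:, 0]

# hold out the band 2 < x1 < 4 as the test region discussed above
test_mask = (x1 > 2) & (x1 < 4)
X_train, X_test = X[~test_mask], X[test_mask]
```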

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-03-25T20:22:29Z ----------------------------------------------------------------

I suggest adding an intro sentence such as: Here is the model structure. Below we describe its main components ...


AlexAndorra commented on 2024-05-21T15:30:15Z ----------------------------------------------------------------

Done

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-03-25T20:22:30Z ----------------------------------------------------------------

fix overlapping titles


AlexAndorra commented on 2024-03-28T15:14:39Z ----------------------------------------------------------------

Done on my version, I'll push it to Bill's branch ASAP

bwengals commented on 2024-03-29T01:07:35Z ----------------------------------------------------------------

@juanitorduz, how do you do that with arviz? It drives me crazy... is there a trick or is it super manual?

juanitorduz commented on 2024-03-29T09:45:47Z ----------------------------------------------------------------

Yes: I always use this code to plot traces / rank plots:

```python
axes = az.plot_trace(
    data=idata_1,
    var_names=var_names,
    compact=True,
    kind="rank_bars",
    backend_kwargs={"figsize": (15, 17), "layout": "constrained"},
)
plt.gcf().suptitle("Trace - Model 1", fontsize=16)
```

AlexAndorra commented on 2024-05-21T15:32:52Z ----------------------------------------------------------------

Done

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-03-25T20:22:31Z ----------------------------------------------------------------

Is there any metric to evaluate the fits which is not purely visual? I have not worked much with 2-dim GPs

AlexAndorra commented on 2024-03-28T15:15:14Z ----------------------------------------------------------------

These plots are absolutely beautiful!!

bwengals commented on 2024-03-29T01:09:34Z ----------------------------------------------------------------

thanks Alex! the qmc thing is what made them actually look nice. I think I should probably use a diverging colormap that's white at zero...

RE metrics, I don't know of anything specific to GPs. Any lack-of-fit metric?

juanitorduz commented on 2024-03-29T09:47:50Z ----------------------------------------------------------------

This was more of a question (outside the scope of the notebook) XD

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-03-25T20:22:31Z ----------------------------------------------------------------

add title and axis labels ;)


bwengals commented on 2024-03-29T01:11:16Z ----------------------------------------------------------------

good call, i'll add these.

Also, this figure is rendering absolutely massive in reviewnb, I hope it doesn't look like this in the docs. Do you know of recommendations for figure sizes in pymc-examples?

juanitorduz commented on 2024-03-29T09:49:31Z ----------------------------------------------------------------

I usually create figures as:

```python
s = pm.draw(pm.Lognormal.dist(mu=0.5, sigma=0.5), draws=10_000)

fig, ax = plt.subplots(figsize=(10, 6))
ax.hist(s, np.linspace(0, 5, 100));
```

juanitorduz commented on 2024-03-29T09:49:58Z ----------------------------------------------------------------

This way you can specify the sizes directly

AlexAndorra commented on 2024-03-29T14:59:27Z ----------------------------------------------------------------

I handled the size issue in the version I'm working on @Bill

AlexAndorra commented on 2024-03-29T15:01:02Z ----------------------------------------------------------------

That being said, I don't think this particular plot is useful, I'd remove it. Which point are you trying to convey?

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-03-25T20:22:32Z ----------------------------------------------------------------

maybe we use a discrete color palette, as the intensity of the blues does not carry meaning?


bwengals commented on 2024-03-29T01:12:58Z ----------------------------------------------------------------

I did originally, but changed it to blues so it was comparable to the kronecker example.

Still makes sense to change it though, but in case you hadn't gotten to the kronecker part yet, what do you think in that context?

juanitorduz commented on 2024-03-29T09:51:00Z ----------------------------------------------------------------

Right! After the kronecker part this makes sense. No strong opinion here. Both work together.

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-03-25T20:22:33Z ----------------------------------------------------------------

I think you never introduced what a "spectral density" is, so maybe it's worth adding a comment at the beginning?


AlexAndorra commented on 2024-03-29T17:17:11Z ----------------------------------------------------------------

+1

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-03-25T20:22:34Z ----------------------------------------------------------------

why are the pymc spectral densities not vectorized?


juanitorduz commented on 2024-03-25T20:55:32Z ----------------------------------------------------------------

Can you add a reference to equation (3) in [Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming](https://link.springer.com/article/10.1007/s11222-022-10167-2)

bwengals commented on 2024-03-29T01:15:35Z ----------------------------------------------------------------

In PyMC they are vectorized over input dimension for a single GP with different lengthscales for each dimension. This one is vectorized over lengthscale, but assumes input_dim = 1 and each GP has a different lengthscale.

I tried and couldn't figure out how to do both in the PyMC version... I bet it's possible though

juanitorduz commented on 2024-03-29T09:51:42Z ----------------------------------------------------------------

ah ok! thanks for the clarification :)

AlexAndorra commented on 2024-03-29T17:18:23Z ----------------------------------------------------------------

Can you copy the PyMC implementation and point out the changes you made, to make them more explicit? I think it'd help understanding, and reproducibility for other kernels
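To illustrate the distinction being discussed (a sketch, not the PyMC implementation): an ExpQuad spectral density for 1-D inputs, vectorized over a batch of lengthscales rather than over input dimensions:

```python
import numpy as np

# ExpQuad spectral density vectorized over lengthscales: assumes
# input_dim = 1, with one lengthscale per GP (illustrative names).
def expquad_psd_batched(omega, ls):
    """omega: (m,) frequencies; ls: (k,) lengthscales -> (k, m) densities."""
    ls = np.atleast_1d(np.asarray(ls, dtype=float))[:, None]   # (k, 1)
    return np.sqrt(2 * np.pi) * ls * np.exp(-0.5 * (ls * omega) ** 2)

S = expquad_psd_batched(np.linspace(0.1, 5.0, 40), ls=[1.0, 3.0, 10.0])
```

Each row of `S` is the density for one GP's lengthscale; longer lengthscales concentrate more power at low frequencies.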

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-03-25T20:22:35Z ----------------------------------------------------------------

This is a fantastic example!


View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-03-25T20:22:36Z ----------------------------------------------------------------

fix overlapping titles


AlexAndorra commented on 2024-03-29T20:25:06Z ----------------------------------------------------------------

Done

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-03-25T20:22:37Z ----------------------------------------------------------------

Here the discrete palette would be easier to read, so that we can identify each of the 10 GPs


AlexAndorra commented on 2024-03-29T20:55:16Z ----------------------------------------------------------------

Done

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-03-25T20:22:38Z ----------------------------------------------------------------

I need more time to digest this example!


AlexAndorra commented on 2024-03-31T22:18:48Z ----------------------------------------------------------------

Do you think we can / need to detail the idea of Kronecker structure a bit more @Bill?

I feel like this version goes a bit too fast

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-03-25T20:22:38Z ----------------------------------------------------------------

overlapping titles


AlexAndorra commented on 2024-04-05T18:33:55Z ----------------------------------------------------------------

Fixed


juanitorduz avatar Mar 25 '24 20:03 juanitorduz