
WIP ZeroSumNormal example notebook

Open drbenvincent opened this issue 2 years ago • 50 comments

Here is the first draft of a notebook which showcases a new PyMC distribution, ZeroSumNormal, created by @aseyboldt.

A few things to note:

  • ~this runs under v4 and it imports ZeroSumNormal from a module ZeroSumNormal.py. The idea is that once this pull request https://github.com/pymc-devs/pymc3/pull/4776 is merged we can get rid of this import and just call pm.ZeroSumNormal.~
  • this runs under v3 and it imports ZeroSumNormal from a module ZeroSumNormal.py. The idea is that once this pull request https://github.com/pymc-devs/pymc3/pull/4776 is merged we can get rid of this import and just call pm.ZeroSumNormal.
  • the end of the notebook is purposefully vague as I believe @aseyboldt has some suggestions to make.
  • Adrian: I've tried to make sure credit and attribution is very clear but do let me know if you'd like anything changed.

To be clear, I don't anticipate this pull request being merged until https://github.com/pymc-devs/pymc3/pull/4776 is done and the notebook can call pm.ZeroSumNormal natively.
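
For context, once that PR lands the import workaround should go away and the notebook would just do something along these lines (a rough sketch only: the group labels are made up and the final argument names of pm.ZeroSumNormal may differ from whatever #4776 settles on):

    import pymc3 as pm  # `import pymc as pm` once the notebook moves to v4

    with pm.Model(coords={"groups": ["a", "b", "c"]}) as model:  # hypothetical group labels
        # group deflections constrained to sum to zero, with no manual centering trick
        β = pm.ZeroSumNormal("β", sigma=1, dims="groups")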

Open to comments and feedback.

TODO

  • [x] Revise the introduction, with greater emphasis on categorical coding and the differences between the frequentist and Bayesian approaches
  • [ ] What is the intuition behind ZeroSumNormal doing better than the manual sum-to-zero constraint?
  • [ ] Add 2x3 ANOVA style example
  • [ ] (maybe) Add multinomial/ordinal regression example
  • [ ] "How does ZeroSumNormal differ from the manually implementation in a way that it solves the problems above? Does this differ mathematically from the previous model?" from @tomicapretto
  • [ ] "I'm curious what this does different than the manual approach above that leads to better convergence?" from @twiecki

drbenvincent avatar Aug 16 '21 17:08 drbenvincent


View / edit / reply to this conversation on ReviewNB

mjhajharia commented on 2021-08-18T00:14:14Z ----------------------------------------------------------------

this is great!! I also really like the lucid writing style in the notebook :))


drbenvincent commented on 2021-08-18T13:51:51Z ----------------------------------------------------------------

Thanks

View / edit / reply to this conversation on ReviewNB

mjhajharia commented on 2021-08-18T00:14:15Z ----------------------------------------------------------------

"theano: not installed" - we wouldn't really want it in v4, right? (might be wrong, just curious)


drbenvincent commented on 2021-08-19T08:32:26Z ----------------------------------------------------------------

Have changed to the latest ZeroSumNormal code now (which is currently only compatible with v3), so I've left Theano in for now.

View / edit / reply to this conversation on ReviewNB

twiecki commented on 2021-08-18T10:38:05Z ----------------------------------------------------------------

deflectsions typo


drbenvincent commented on 2021-08-19T08:28:55Z ----------------------------------------------------------------

fixed

View / edit / reply to this conversation on ReviewNB

twiecki commented on 2021-08-18T10:38:06Z ----------------------------------------------------------------

god -> good


View / edit / reply to this conversation on ReviewNB

twiecki commented on 2021-08-18T10:38:06Z ----------------------------------------------------------------

Why does this look so terrible?


drbenvincent commented on 2021-08-18T10:48:04Z ----------------------------------------------------------------

We're at the limits of floating point precision? I'm not sure this plot is strictly required. I could switch it out for an is_close assertion. Speaking to Adrian later today, so will discuss.

drbenvincent commented on 2021-08-19T08:30:55Z ----------------------------------------------------------------

Have removed this and replaced with np.allclose(trace_centered.posterior.β.stack(sample=("chain", "draw")).sum("groups").values, 0.0)

OriolAbril commented on 2021-08-23T12:24:57Z ----------------------------------------------------------------

I think np.allclose can take DataArrays directly and that it doesn't care whether the input is 1d or multidimensional, so I believe np.allclose(trace_centered.posterior.β.sum("groups"), 0.0) will work

drbenvincent commented on 2021-09-05T10:55:42Z ----------------------------------------------------------------

Yes - have switched to that. It will appear in an upcoming commit.
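
For reference, the check being discussed boils down to something like the following; trace_centered and the "groups" dimension are the names used in the notebook, so treat this as a sketch:

    import numpy as np

    # every posterior draw of β should sum to (numerically) zero across the groups;
    # np.allclose accepts the xarray DataArray directly
    assert np.allclose(trace_centered.posterior["β"].sum("groups"), 0.0)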

View / edit / reply to this conversation on ReviewNB

twiecki commented on 2021-08-18T10:38:07Z ----------------------------------------------------------------

I'm curious what this does different than the manual approach above that leads to better convergence?


drbenvincent commented on 2021-08-19T08:31:40Z ----------------------------------------------------------------

Might have to refer to Adrian for an answer. Have added this question to a list of points to address in updates


Thanks for the feedback so far 🙏🏻 Have addressed typos, switched to using the latest ZeroSumNormal code (which only works under v3 at the moment), and introduced a random seed for the sampling for reproducibility.

No more small-scale comments needed for the moment - am working on some major updates after a discussion with @aseyboldt. Will shout when done 🙂

drbenvincent avatar Aug 18 '21 18:08 drbenvincent

View / edit / reply to this conversation on ReviewNB

chiral-carbon commented on 2021-08-18T19:43:09Z ----------------------------------------------------------------

here would it be possible to use coords and dims instead of shape? Something like

with pm.Model(coords={'obs_id': np.arange(3)}) as naive_model:

and

   β = pm.Normal("β", 0, 10, dims='obs_id')

it's not a major change, but best practice would be to use coords and dims wherever possible.


drbenvincent commented on 2021-08-19T08:29:33Z ----------------------------------------------------------------

Good point. Did it in the first model, but then lapsed from best practice. Now fixed.
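
Putting the two fragments from the suggestion together, the coords/dims version of that model would look roughly like this (the obs_id labels are just the placeholder values used in the comment above):

    import numpy as np
    import pymc3 as pm

    coords = {"obs_id": np.arange(3)}
    with pm.Model(coords=coords) as naive_model:
        # dims ties the shape of β to the obs_id coordinate instead of a bare shape=3
        β = pm.Normal("β", 0, 10, dims="obs_id")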

View / edit / reply to this conversation on ReviewNB

chiral-carbon commented on 2021-08-18T19:43:10Z ----------------------------------------------------------------

Same comment as above about using coords and dims instead of shape.


drbenvincent commented on 2021-08-19T08:30:03Z ----------------------------------------------------------------

done


View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-23T12:25:05Z ----------------------------------------------------------------

I find the notation a bit confusing. We use beta_0 as the intercept in the math, but then the code uses intercept, and the code's beta_0 corresponds to the math's beta_1. We should pick one convention and use the same in both math and code. I personally prefer using intercept even in the formula, instead of having the 0 index be "different" and excluded from the ZeroSumNormal. Between 0- or 1-based indexing for the betas I don't have any preference though; I think "all betas sum to 0" is clear enough on its own, and coordinate values can be anything, so there is no issue with math/code compatibility either.


drbenvincent commented on 2021-09-12T15:14:11Z ----------------------------------------------------------------

Things are a bit different now with the new introduction. For that, I've kept the beta notation because it seems to work. For the intercept + deflections model, I've switched to intercept and delta parameters. Hopefully this works.
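
To make the convention concrete, the intercept-plus-deflections parameterisation described above would look something like the sketch below; δ stands for the deflections, the group labels are invented, and the exact ZeroSumNormal signature is an assumption:

    with pm.Model(coords={"groups": ["a", "b", "c"]}) as model:  # hypothetical labels
        intercept = pm.Normal("intercept", 0, 10)
        # per-group deflections that sum to zero, so the intercept stays identifiable
        δ = pm.ZeroSumNormal("δ", sigma=10, dims="groups")
        # group means are the common intercept plus each group's deflection
        μ = intercept + δ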

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-23T12:25:06Z ----------------------------------------------------------------

All variables are named beta here. I remember this bug being fixed already; can you try the latest ArviZ and see if that's enough? Otherwise I'd use the development version.


drbenvincent commented on 2021-09-12T15:14:34Z ----------------------------------------------------------------

This is labelling properly now.

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-23T12:25:07Z ----------------------------------------------------------------

I love this plot! A very minor nit: I would label the arrow corresponding to the intercept.


drbenvincent commented on 2021-09-12T15:14:48Z ----------------------------------------------------------------

Done

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-23T12:25:07Z ----------------------------------------------------------------

I have mixed feelings about the bullet points, which extend a bit to the pair and posterior plots. First and foremost, the model is a complete failure because it doesn't converge (and never will converge). I don't see the correlations as a failure in themselves, though, but rather as the main reason for this lack of convergence. And given that the model is not converging, we can't trust its results, so it is expected for parameter recovery to fail, but that might not always be the case. We could have a model with no mixing and therefore no convergence that, by chance or because the initial points happened to be close to the true parameters, gets stuck there or very close and does "recover" the true parameters. Both models are equally a failure, though, even if this second hypothetical one "recovers" the parameters.

I think the main factor in deciding how to address this is defining the target audience for this notebook. Do we already expect readers to know about MCMC sampling and convergence?


drbenvincent commented on 2021-09-12T15:15:58Z ----------------------------------------------------------------

I've changed the wording around this model now. I think it makes more sense in the context of the new introduction.

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-23T12:25:08Z ----------------------------------------------------------------

Line #9.        β = pm.Deterministic("β", _β - m)

I would be explicit and set dims="groups" here too, even though it's not needed to set the shape; I think it's valuable info. This will have no effect on any of the later plots as of now, but it would if we use 1-based indexing for the betas when matching math/code notation.

I also don't have a concrete suggestion, so this feels a bit empty, but it would be great to try to find more descriptive names for the _b and _intercept variables, something that gives some insight into the differences between these variables and the constrained ones.


drbenvincent commented on 2021-09-12T15:24:41Z ----------------------------------------------------------------

added the dims="groups"

I've used the _unconstrained suffix. It's more verbose, but maybe clearer.
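
For readers following along, here is a minimal sketch of the manual parameterisation being discussed in this conversation, using the dims and _unconstrained naming from the reply above. The priors and group labels are placeholders, and m is assumed to be the mean of the unconstrained parameters, which is the usual way to impose the constraint by hand:

    with pm.Model(coords={"groups": ["a", "b", "c"]}) as manual_model:  # hypothetical labels
        β_unconstrained = pm.Normal("β_unconstrained", 0, 10, dims="groups")
        # subtracting the mean forces the constrained β to sum to zero across groups
        m = β_unconstrained.mean()
        β = pm.Deterministic("β", β_unconstrained - m, dims="groups")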
