pymc-examples
Bayesian copula estimation example notebook
This pull request adds a new example notebook for Bayesian copula estimation. Work done by me and @ericmjl.
- [X] Notebook follows style guide https://docs.pymc.io/en/latest/contributing/jupyter_style.html
- [ ] PR description contains a link to the relevant issue: a tracker one for existing notebooks or a proposal one for new notebooks
- [ ] remove `pymc.*` and `pymc3.*` tags from first cell
Currently works under v3. Not working under v4, possibly because `LKJCholeskyCov` has not been refactored for v4 yet (according to @ricardoV94).
OriolAbril commented on 2021-12-18T11:42:16Z ----------------------------------------------------------------
I would use the filename as the target for referencing the notebook too, so it's not necessary to open the notebook and look at the source to know what target to use.
I also looked at the raw text and saw you are using an HTML figure to format the image and caption. You should use https://myst-parser.readthedocs.io/en/latest/syntax/optional.html#markdown-figures instead. This minimizes the HTML needed, which lets Sphinx recognize the image, copy it to the _static folder so it is available on the website (or pull it in when generating PDF docs, for example), and also allows using MyST (and thus markdown) in the caption, for example to reference terms in the glossary or other notebooks.
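For reference, a minimal MyST figure directive looks something like this (the path, width, and caption here are placeholders):

````markdown
```{figure} pymc_arviz_logos.png
:alt: PyMC and ArviZ logos
:width: 400px

A caption written in MyST/markdown, which can include references.
```
````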
drbenvincent commented on 2021-12-19T12:00:32Z ----------------------------------------------------------------
Thanks. Have done these things now.
I also simplified the logos. Rather than 2 separate files in a markdown table, I now have a single image with both logos and no markdown table, which should avoid any strange table rendering issues.
OriolAbril commented on 2021-12-19T16:45:55Z ----------------------------------------------------------------
The `:class: myclass` in those docs is just an example to show that extra attributes can be used; we don't need to include it here.
OriolAbril commented on 2021-12-18T11:42:17Z ----------------------------------------------------------------
I need to add a huge warning to the jupyter style guide about this, but never use html links to refer to other notebooks or to other documentation pages. They are much more prone to breaking and, by design, can't respect the version of the documentation being read; they will always point to latest. The binning notebook uses `awkward_binning` as its target (from looking at https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/binning.ipynb), so you can use
{ref}`here <awkward_binning>`
to generate a link with the text "here" pointing to the binning notebook.
drbenvincent commented on 2021-12-19T12:03:43Z ----------------------------------------------------------------
This should hopefully be fixed now.
OriolAbril commented on 2021-12-19T16:48:34Z ----------------------------------------------------------------
The ref won't work like this. reviewnb messed up the formatting, but you need 3 things: the reference type (`ref`), the text to be shown (`here`), and the target where the link should point (`awkward_binning`). See https://docs.readthedocs.io/en/stable/guides/cross-referencing-with-sphinx.html#the-ref-role for more details.
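Put together, the three parts look like this:

```markdown
{ref}`link text <target>`        <!-- general form -->
{ref}`here <awkward_binning>`    <!-- this notebook's case -->
```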
OriolAbril commented on 2021-12-18T11:42:18Z ----------------------------------------------------------------
Line #6. `np.random.seed(seed=42)`
This should use `rng = np.random.default_rng(42)` and then pass the `rng` where needed to `scipy.stats` functions. There are two options for this: the base class takes a `seed` argument, and the `rvs` method takes a `random_state` argument, both of which are compatible with passing the `rng` generator. References: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html#scipy.stats.rv_continuous and https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.rvs.html#scipy.stats.rv_continuous.rvs.
From the numpy docs (https://numpy.org/doc/stable/reference/random/legacy.html#legacy):
> This class should only be used if it is essential to have randoms that are identical to what would have been produced by previous versions of NumPy.
I don't think that is the case here.
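A minimal sketch of the suggested pattern (the distribution, sizes, and names here are arbitrary):

```python
import numpy as np
from scipy import stats

# Use the modern Generator API instead of seeding the legacy global state
rng = np.random.default_rng(42)

# Option 1: pass the generator to rvs via its random_state argument
x = stats.norm.rvs(loc=0.0, scale=1.0, size=1000, random_state=rng)

# Option 2: attach the generator to a frozen distribution up front
dist = stats.norm(loc=0.0, scale=1.0)
dist.random_state = rng
y = dist.rvs(size=1000)
```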
drbenvincent commented on 2021-12-19T12:11:24Z ----------------------------------------------------------------
Thanks, I didn't know SciPy distributions took an `rng`, but now I do :) Fixed.
OriolAbril commented on 2021-12-18T11:42:18Z ----------------------------------------------------------------
Is something in the computation of the `data` variable non-deterministic? Would we want to perform the computation preserving the chain and draw dimensions and average only at the end?
drbenvincent commented on 2021-12-19T12:17:38Z ----------------------------------------------------------------
I'm not sure I follow this question. Is this about lines 14, 15, 16? That computation could be done beforehand, outside of the model, as `a_mu`, `a_sigma`, `b_sigma` are basically observed variables. But I'm not sure that's what you are getting at.
ericmjl commented on 2021-12-19T15:18:32Z ----------------------------------------------------------------
Some notes from talking with Oriol:
- The entire block of deterministics can be moved out of the model context (see the sketch after this list).
- We'll probably want to explain why we're using point estimates.
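A hedged sketch of the first point; the marginal families, variable names, and data below are placeholders, not the notebook's actual code:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Placeholder observed data and point estimates of the marginal parameters
a_obs = rng.normal(1.0, 2.0, size=500)
b_obs = rng.normal(0.0, 0.5, size=500)
a_mu, a_sigma = a_obs.mean(), a_obs.std()
b_sigma = b_obs.std()

# With the point estimates treated as known, probit(exp(logcdf(x | params)))
# is a deterministic function of the observed data, so it can be computed
# once here rather than as pm.Deterministic nodes inside the model context
u_a = np.exp(stats.norm(a_mu, a_sigma).logcdf(a_obs))
u_b = np.exp(stats.norm(0.0, b_sigma).logcdf(b_obs))
data = stats.norm.ppf(np.column_stack([u_a, u_b]))  # feed this to the model
```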
OriolAbril commented on 2021-12-18T11:42:19Z ----------------------------------------------------------------
How does this differ from the `cov` variable in the posterior?
drbenvincent commented on 2021-12-19T12:42:52Z ----------------------------------------------------------------
As far as I can tell, we could use `copula_idata.posterior.cov` rather than calculating ppc samples.
Doing this results in `copula_idata.posterior.cov.stack(sample=("chain", "draw")).data.shape` being `(2, 2, 4000)`. So we then need to iterate over this last sample dimension rather than doing `dists = [multivariate_normal([0, 0], cov) for cov in ppc["cov"]]`.
But it's not immediately obvious to me how to iterate over the sample dimension of an xarray object. Any suggestions?
ericmjl commented on 2021-12-19T15:22:57Z ----------------------------------------------------------------
Possibly a transpose. (Just leaving a note from calling Oriol.)
ericmjl commented on 2021-12-20T00:37:51Z ----------------------------------------------------------------
@drbenvincent I think a transpose is what we can do. For an array of shape `(2, 2, 4000)`, a transpose using `np.transpose(cov, axes=(2, 0, 1))` should do the trick. That syntax moves the dimension at position 2 to position 0.
OriolAbril commented on 2021-12-20T00:48:39Z ----------------------------------------------------------------
I did some xarray+ArviZ black magic and updated all the calculations to use the full posterior, which does seem to give the same result. Feel free to use all or parts of that. If doing the transposition here, however, do it with xarray and take the values after that. You can use `copula_idata.posterior.cov.stack(sample=("chain", "draw")).transpose("sample", ...)` (the `...` being an actual ellipsis) to ensure the sample dimension is first and the others keep the same order.
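As a sketch, the xarray route described above looks like this (assuming `copula_idata` is the InferenceData from sampling the copula model):

```python
from scipy.stats import multivariate_normal

# Stack chain and draw into a single sample dimension, move it to the
# front, then drop down to a plain (4000, 2, 2) NumPy array
cov_samples = (
    copula_idata.posterior["cov"]
    .stack(sample=("chain", "draw"))
    .transpose("sample", ...)
    .values
)

# Each element along the first axis is now a 2x2 covariance matrix
dists = [multivariate_normal([0, 0], cov) for cov in cov_samples]
```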
Thanks @OriolAbril. I've dealt with all of those points except for two questions.
FYI, this also needs some review (when @ericmjl is able) before we merge. Could you tag him as a reviewer? I don't seem to be able to request reviewers.
I don't understand why the graphviz images aren't displaying.
Don't trust the reviewnb preview; always check the readthedocs preview to see if things look good. reviewnb messes up quite often, and even if it did render correctly, what readers will actually see is readthedocs, so that is what we need to double-check.
Thanks @OriolAbril. That last commit cleans everything up and uses your recommended way of getting the posterior data of `cov` out in a convenient format for plotting and comparison to the data. I've also added you to the acknowledgements 🙂
OriolAbril commented on 2021-12-27T21:56:31Z ----------------------------------------------------------------
The `tags` kwarg is not working: ablog errors out when attempting to parse the initial comma. I think removing the comma will fix the issue.
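For illustration (the tag names here are hypothetical), the fix amounts to dropping the leading comma in the post metadata:

```markdown
<!-- before: ablog errors out on the leading comma -->
:tags: , copula, estimation

<!-- after -->
:tags: copula, estimation
```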
drbenvincent commented on 2021-12-28T10:43:31Z ----------------------------------------------------------------
Extra comma now removed.
View / edit / reply to this conversation on ReviewNB
OriolAbril commented on 2021-12-27T21:56:31Z ----------------------------------------------------------------
This is using point estimates only, for computations that do not commute with the expectation. I did try using the whole posterior here and then averaging to get data in the right shape, and it did seem to yield similar results, but as far as I know this is basically a coincidence.
In my opinion there are two options here:
- use the whole posterior then average
- use point estimates with a note on why the whole posterior is not used
Note: I have several ideas that will hopefully crystallize into a small package extending xarray for linear algebra and statistical computation. I have a note to come back here once that is available, to simplify the code while using the whole posterior.
ericmjl commented on 2021-12-28T00:58:25Z ----------------------------------------------------------------
I think the spelling of `transform` is incorrect. Should be `transform_data`.
drbenvincent commented on 2021-12-28T10:43:10Z ----------------------------------------------------------------
Typo fixed.
I will add some text (shortly) to mention why we are using point estimates.
drbenvincent commented on 2021-12-28T12:05:15Z ----------------------------------------------------------------
In fact, we've already got a note about point estimates under the heading "PyMC models for copula and marginal estimation"
ericmjl commented on 2021-12-28T19:38:33Z ----------------------------------------------------------------
Copying Oriol's point from the discussion to keep the discussion a bit easier to follow:
The note is about simultaneous estimation versus fitting the marginal and covariance models independently, but it doesn't address marginalizing before or after the transformation between observed and multivariate spaces. Marginalizing after transforming is one of the things I tried, and it did give the same result, but the two approaches giving the same result isn't straightforward (to me at least), as the operations don't commute with the expectation.
I think I see Oriol's point: if the expectation of a distribution is E(dist), then exp(E(dist)) is not necessarily equal to E(exp(dist)).
However, we're doing exp(logcdf(E(dist))) here, not exp(E(dist)). Oriol, I'll readily admit to being ignorant here, but does that change anything?
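As a quick numerical check of the non-commutativity under discussion (using a standard normal, chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=0.0, scale=1.0, size=200_000)

# exp is convex, so Jensen's inequality gives E[exp(X)] >= exp(E[X])
print(np.exp(x.mean()))  # exp(E[X]) -> approx 1.0
print(np.exp(x).mean())  # E[exp(X)] -> approx 1.65, i.e. exp(1/2)
```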
ericmjl commented on 2022-01-17T16:21:16Z ----------------------------------------------------------------
Copying Oriol's point from the GitHub PR tracker to keep the discussion easier to follow:
The thing is that even if the evaluations of the logcdf are the observations, and thus not random variables, its parameters are. So I think it's something like E(probit(exp(logcdf(x|dist)))) versus probit(exp(logcdf(x|E(dist)))), and it doesn't seem like those must be the same.
Upon thinking about your point, Oriol, I think it is easiest for now to leave a note on why we're not using the whole posterior. I think I get your point, and it makes sense, though I'm also aware that we should close things out before they drag on too long, and it's easier to add a comment than to change the logic of the code.
Ben, please be on the lookout for a few more textual changes to push!
OriolAbril commented on 2021-12-27T21:56:32Z ----------------------------------------------------------------
This title is a level 1 title and should be a level 2 title to preserve the right hierarchy. It should probably be updated too, now that `sample_posterior_predictive` is not being used; something about comparing inference with the truth is probably more fitting.
drbenvincent commented on 2021-12-28T10:42:38Z ----------------------------------------------------------------
Fixed