pymc-examples
stochastic volatility
File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/stochastic_volatility.ipynb Reviewers:
Known changes needed
Changes listed in this section should all be done at some point in order to get this notebook to a "Best Practices" state. However, these are probably not enough! Make sure to thoroughly review the notebook and search for other updates.
ArviZ related
- Use arviz-darkgrid style
- Use return_inferencedata=True
Notes
Exotic dependencies
None
Computing requirements
Model takes roughly 15 mins to sample
Hi @OriolAbril ! Can I work on this issue?
Assigned :)
thanks!
I tried running it first, and I get a KeyError in cell 12:
KeyError Traceback (most recent call last)
<ipython-input-12-3bac4038cc19> in <module>
1 fig, ax = plt.subplots(figsize=(14, 4))
2
----> 3 y_vals = np.exp(trace["volatility"])[::5].T
4 x_vals = np.vstack([returns.index for _ in y_vals.T]).T.astype(np.datetime64)
5
~/.local/lib/python3.8/site-packages/arviz/data/inference_data.py in __getitem__(self, key)
234 """Get item by key."""
235 if key not in self._groups_all:
--> 236 raise KeyError(key)
237 return getattr(self, key)
238
KeyError: 'volatility'
If trace should have a 'volatility' key, then I'm not sure how to verify this. Could you help me out?
I do have 'volatility' under Data Variables, so I'm wondering whether this is a syntax issue and there is a different way to access it than trace["volatility"]?
Update: I figured out how to access the xarray variables.
I think using trace.posterior.data_vars['volatility'] works, but then in cell 12, if I run
y_vals = np.exp(trace.posterior.data_vars['volatility'])[::5].T
instead of y_vals = np.exp(trace["volatility"])[::5].T
I encounter an error again, so I am still a little unclear on how to work with xarray. I could use some help here.
Hi, sorry about that, the documentation on InferenceData is still in very active development and quite scattered.
> I think using trace.posterior.data_vars['volatility'] works
I'd recommend using trace.posterior["volatility"] directly, which should return the same result. I think going over https://docs.pymc.io/notebooks/multilevel_modeling.html will help, as you'll see InferenceData in action. From there, the key fact to use is that InferenceData groups (i.e. idata.posterior, idata.sample_stats, ...) are xarray Datasets.
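As a rough illustration of the point that the groups are xarray Datasets, here is a minimal sketch that uses a toy Dataset as a stand-in for idata.posterior. The sizes and random values are made up, and no PyMC model is involved; only the dimension names (chain, draw, volatility_dim_0) follow the thread.

```python
import numpy as np
import xarray as xr

# Toy stand-in for idata.posterior (assumption: made-up sizes, random values).
rng = np.random.default_rng(0)
posterior = xr.Dataset(
    {
        "volatility": (
            ("chain", "draw", "volatility_dim_0"),
            rng.normal(size=(2, 10, 6)),
        )
    },
    coords={
        "chain": np.arange(2),
        "draw": np.arange(10),
        "volatility_dim_0": np.arange(6),
    },
)

# Both spellings return the same DataArray; indexing the Dataset directly is shorter.
via_data_vars = posterior.data_vars["volatility"]
via_getitem = posterior["volatility"]
same = via_data_vars.identical(via_getitem)

# chain and draw are kept as separate dimensions, not flattened.
dims = via_getitem.dims

# NumPy ufuncs such as np.exp keep the labels and return a DataArray.
exp_vol = np.exp(via_getitem)
print(same, dims, type(exp_vol).__name__, exp_vol.shape)
```

Because the result is still a labelled three-dimensional array, positional tricks such as [::5] followed by .T no longer act on a flat list of samples, which is the difference the rest of this reply goes into.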
> y_vals = np.exp(trace.posterior.data_vars['volatility'])[::5].T
This touches deeper and more important changes. InferenceData uses label-based indexing, not positional indexing, and in addition it doesn't flatten the chain and draw dimensions but keeps them separate. You should give a name to volatility_dim_0 (as per the first point in the ArviZ section of https://github.com/pymc-devs/pymc-examples/wiki/Notebook-updates-overview), and then you'll be able to do this ::5 subsetting to get one out of five points as idata.posterior["volatility"].sel(dim_name=slice(None, None, 5)) if using integer coordinate values (otherwise use isel). I have a blog post on ArviZ-PyMC interaction and about cool things one can do with named dims and coords: https://oriolabril.github.io/oriol_unraveled/python/arviz/pymc3/xarray/2020/09/22/pymc3-arviz.html. You may also want to combine the chain and draw dims as shown in https://arviz-devs.github.io/arviz/getting_started/WorkingWithInferenceData.html#combine-chains-and-draws
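The steps described above (naming the dimension, taking one out of five points by label, and combining chain and draw) can be sketched on a toy DataArray. This is only a sketch under assumptions: the dimension name "time", the stacked name "sample", and the sizes are hypothetical, and rename is used here purely for brevity, whereas the wiki suggests naming the dims when the model is defined.

```python
import numpy as np
import xarray as xr

# Toy stand-in for idata.posterior["volatility"] (assumption: made-up sizes).
raw = xr.DataArray(
    np.zeros((2, 20, 15)),
    dims=("chain", "draw", "volatility_dim_0"),
    coords={
        "chain": np.arange(2),
        "draw": np.arange(20),
        "volatility_dim_0": np.arange(15),
    },
    name="volatility",
)

# Give the automatically named dimension a meaningful (hypothetical) name.
vol = raw.rename({"volatility_dim_0": "time"})

# Label-based selection of every fifth point; works here because the
# coordinate values are integers in increasing order.
by_label = vol.sel(time=slice(None, None, 5))

# Positional equivalent, which does not depend on the coordinate values.
by_position = vol.isel(time=slice(None, None, 5))

# Combine chain and draw into one dimension; the stacked dim goes last.
stacked = vol.stack(sample=("chain", "draw"))
print(by_label.shape, by_position.shape, stacked.dims, stacked.shape)
```

With these toy sizes the 15 time points are thinned to 3, and stacking yields 2 x 20 = 40 pooled samples along the new last dimension.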
this really helps, thanks a lot!
I'm not sure how to choose one in five points using sel. What I have now is
y = trace.posterior.rename_dims({'volatility_dim_0':'vol'}).stack(pooled_chain=("chain","draw"))['volatility']
which has a dimension of (2905, 8000). If I want one in 5 points, the resulting dimensions for y_vals should be (2905, 1600); however, I'm not able to get that if I apply sel(vol=slice(5)) on y here, and applying sel(pooled_chain=slice(5)) on y throws an error.
I think you'll want to do .isel(pooled_chain=slice(None, None, 5)). After stacking, the labels are like tuples, so we'll want to use positional indexing. Note also that slice(None, None, 5), which sets only step=5, is completely different from slice(5), which sets only stop=5 and is equivalent to slice(None, 5) (see https://docs.python.org/3.9/library/functions.html#slice).
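To make the stop-versus-step distinction concrete, here is a small sketch in plain Python. The list of 8000 integers is a made-up stand-in for the pooled draws mentioned above. It also checks that the built-in slice takes positional arguments only, so "step=5" has to be spelled slice(None, None, 5).

```python
# Stand-in for 8000 pooled draws (assumption: plain integers, not real samples).
draws = list(range(8000))

# slice(5) sets only stop: the first five elements.
first_five = draws[slice(5)]

# slice(None, None, 5) sets only step: every fifth element, same as draws[::5].
every_fifth = draws[slice(None, None, 5)]

# The built-in slice rejects keyword arguments in CPython.
try:
    slice(step=5)
    keyword_ok = True
except TypeError:
    keyword_ok = False

print(len(first_five), len(every_fifth), keyword_ok)
```

Thinning 8000 draws with a step of 5 leaves 1600, which matches the (2905, 1600) shape expected above.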
Now that you are already discussing specific changes, can you open a PR? Even if the code doesn't run, it will be easier to discuss the changes and go over the feedback, we'll all see the cell and have the comments attached there, so there will be no need for sharing cell numbers or screenshots, we'll also have a better context of what exactly are we slicing for without needing to open the notebook in a different window/tab.
Right, thanks! Very silly of me to mistake how to use slice in Python; I assumed it had some different functionality when used in xarray, don't know why!
Yup, will open a PR. It seems more logical than discussing like this.
> very silly of me to mistake how to use slice in python, I assumed it was some different functionality when used in xarray, don't know why!
No need to apologize! We all make mistakes :), and I think all of us were somewhere between very confused and completely lost when starting with xarray. We even created https://github.com/arviz-devs/xarray_examples to help ourselves with specific xarray questions.
looks like a very useful tracker/resource to me :smile: