arviz
Write "Working with InferenceData" page
Tell us about it
As part of the documentation revamp described in #1331. We should add a page explaining how to work with InferenceData
and show several common tasks. The guide should cover at least the following topics:
#1506
- [x] Stacking dimensions (maybe even reshaping? reshaping is a bit crazy though), fancy chain/draw extraction methods from #1469
- [x] Some grouping/aggregating of data: mean
- [x] Get values and coordinates as arrays
- [x] Slice inferenceData objects
Pending
- [ ] modifying, transforming and/or creating a variable within an InferenceData group or groups. See #641 for code examples that can be used for this
- [ ] how to combine/merge/extend multiple InferenceData objects
- [ ] Coordinate modification. Maybe something like #1461 (includes example code and explanation)
- [ ] ...
Thoughts on implementation
Feel free to tackle parts of the issue instead of generating the whole section at once.
It is also probably a good idea to add a See also section at the bottom linking to other docs (i.e. pymc3 or pystan docs), blogs and other resources that use inferencedata. For example: https://docs.pymc.io/notebooks/multilevel_modeling.html
Can I work on this issue?
Great, thanks!
Is there any Discord server, community chat, or meetings to attend? I am done with the setup; I just wanted to know more about the project.
We have a gitter chat available: https://gitter.im/arviz-devs/community, and there are also questions and discussions about the project at the PyMC discourse: https://discourse.pymc.io/ and, to a lesser extent, at the Stan discourse: https://discourse.mc-stan.org/
- [ ] how to combine/merge/extend multiple InferenceData objects
This sounds good. I recently went through the InferenceData docs and they do seem ambiguous and all over the place. The entire issue seems like something that would probably be done over multiple PRs, but to begin with I can take care of this part. I'd like to take it up if no one is working on it at the moment.
Great, note that you have to extend the currently existing notebook at https://github.com/arviz-devs/arviz/blob/main/doc/source/getting_started/WorkingWithInferenceData.ipynb
There are some examples on how to combine inferencedata objects at https://arviz-devs.github.io/arviz/api/generated/arviz.concat.html#arviz.concat.
Using the documentation system outlined in https://documentation.divio.com/, this working with inferencedata page should be a tutorial page (or a bundle of mini tutorials if you prefer). The explanation is in https://github.com/arviz-devs/arviz/blob/main/doc/source/getting_started/XarrayforArviZ.ipynb, and the reference is https://arviz-devs.github.io/arviz/api/inference_data.html. The reference pages may be the easier ones to find right now, so we should probably add some links from the reference pages to the working with inferencedata notebook and vice versa: we can add links from the notebook to the reference pages of the functions used.
explanation is in https://github.com/arviz-devs/arviz/blob/main/doc/source/getting_started/XarrayforArviZ.ipynb,
What is this the explanation for? (Sorry for getting back to this issue so late.)
Following the documentation system in https://diataxis.fr/ (they just changed the URL, but it's the same content I linked to above), docs are not supposed to be all over the place but should be separated into at least 4 different pages. The InferenceData
docs are actually close to getting there, but we also need to make sure each page is clear about what it explains and has links to the other pages in case the info is not on that page but on another one (this last part is a bit of a disaster right now).
A very quick summary of diataxis (from its own introduction page):
| | Tutorials | How-to guides | Reference | Explanation |
|---|---|---|---|---|
| oriented to | learning | a goal | information | understanding |
| must | allow the newcomer to get started | show how to solve a specific problem | describe the machinery | explain |
| its form | a lesson | a series of steps | dry description | discursive explanation |
| analogy | teaching a small child how to cook | a recipe in a cookery book | a reference encyclopaedia article | an article on culinary social history |
| for InferenceData | Working with InferenceData page (in getting started section) | unclear, maybe some page in the user guide section, maybe links to case studies in pymc/stan/... docs (i.e. radon or rugby examples) | InferenceData schema | XarrayforArviZ |
The InferenceData schema is dry and is not a good resource to understand InferenceData (why it's useful, why it's needed, what its main (and super cool) features are), but it isn't meant to be. Its goal is to describe InferenceData in an encyclopedia-like manner.
@OriolAbril got this. Um, what about az.concat? That works on two inference datasets and we have one, so should I import another one, or create another one from the original one?
Depending on what you want to do you can create a fake/synthetic idata or simply use a copy/subset. To show how to extend an inferencedata with more draws, a copy or a subset (i.e. chains 0 and 1 only) is already fine. There is no difference between operating on two real, different idatas and doing that.
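For instance, a minimal sketch of the subset approach (the idata here is synthetic, built with `az.from_dict`, and the variable names are made up):

```python
import numpy as np
import arviz as az

# build a small synthetic InferenceData: 4 chains, 100 draws
rng = np.random.default_rng(0)
idata = az.from_dict(posterior={"mu": rng.normal(size=(4, 100))})

# a subset with only chains 0 and 1 plays the role of a "second" idata
idata_subset = idata.sel(chain=[0, 1])
```

`idata_subset` can then be combined with the original exactly as two genuinely different idatas would be.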
@OriolAbril Hi! I'm sorry for replying so late, I'm alright mentally and physically now so I'm free to get back to contributing!!! Um I think I'll make a draft PR with very basic changes and whatever I feel might work, and from thereon you can suggest redirections or changes, does that sound good?
Sounds great! :smile:
@OriolAbril I was looking for sample arviz data to use. I realised we could really use a page that describes the qualitative meaning of the sample arviz datasets that we load; otherwise, understanding the context of WHY we perform inference data operations is tricky. We want to add real-time usage on top of xarrays. Let me know if that makes any sense.
I am not sure I understand, but it sounds similar to https://arviz-devs.github.io/arviz/api/generated/arviz.list_datasets.html#arviz.list_datasets, maybe we could generate a page from all the info listed there?
As a general comment, there shouldn't be many context explanations on the "Working with InferenceData" page. The goal is to guide users with common idata operations. We use the example data not because of their particular meaning or value, but to skip idata creation and therefore be able to focus only on idata operations.
Got it!! Also, thanks for pointing to az.list_datasets(); I was looking for this function and couldn't find it. Yeah, a page generated from it would make sense. Just adding print(az.list_datasets()) in the source code would be good enough, I guess. And I'll keep the usage examples non-contextual, thanks!
Can I work on this one: "Coordinate modification. Maybe something like Posterior predictive check (plot_ppc) with dataframe encoding. #1461 (includes example code and explanation)"?
Coordinate modification will basically be done with set coordinates, right? The function defined here: https://arviz-devs.github.io/arviz/api/generated/arviz.InferenceData.set_coords.html
Thanks @disha4u, that would be great.
`set_coords` is to convert existing variables to coordinates. The example should probably use `assign_coords` instead, like in the example code snippet in the linked issue, to add new coordinates from something that isn't already a variable in the dataset. Also note that ArviZ has wrappers for methods of `xarray.Dataset` which often lack documentation.
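A quick sketch of the difference (pure xarray with toy data; the variable and coordinate names are made up):

```python
import numpy as np
import xarray as xr

# toy posterior-like dataset: 2 chains, 5 draws, 3 schools
ds = xr.Dataset(
    {"theta": (("chain", "draw", "school"), np.zeros((2, 5, 3)))}
)

# assign_coords attaches brand-new coordinate values to a dimension
ds = ds.assign_coords(school=["A", "B", "C"])

# set_coords, in contrast, promotes an *existing* data variable to a coordinate
ds["school_size"] = ("school", np.array([120, 80, 200]))
ds = ds.set_coords("school_size")
```

So for the issue at hand, where the labels don't exist in the dataset yet, `assign_coords` is the right tool.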
I would recommend going over the linked issue and reproducing the example in plot_ppc locally, then adapting the example and explanation to the notebook. Do not hesitate to ask questions in this issue, on gitter, or by opening a draft PR and asking questions there.
ok, will do thanks
Hello, I'm new here and a beginner. Can I work on this? Or is there anything else I can work on?
The pending elements are still a work in progress and nobody is assigned to them, so feel free to work on those. Or, if you have used inferencedata and faced some issues, you can also suggest something else to add to the docs.
Working with InferenceData
InferenceData is a versatile container used in probabilistic programming libraries like PyMC3 and PyStan to store and manage the results of Bayesian inference. This page serves as a guide to help you work effectively with InferenceData objects, covering common tasks and techniques.
Stacking Dimensions
InferenceData often contains multidimensional data structures, including chains and draws from Bayesian models. You might need to stack dimensions to facilitate analysis or visualization. You can achieve this using the following methods:
Stacking Chains and Draws
To stack chains and draws, you can use the stack
method that the posterior group inherits from xarray.Dataset. This reshapes your data into a more manageable format for analysis. For instance, with a PyMC3 trace:
inference_data = pm.sample(...)
stacked_data = inference_data.posterior.stack(chain_draw=("chain", "draw"))
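If you don't have a PyMC3 trace at hand, the same stacking works on any InferenceData. Here is a self-contained sketch with synthetic data (the variable name is made up):

```python
import numpy as np
import arviz as az

# synthetic InferenceData: 4 chains of 500 draws each
rng = np.random.default_rng(1)
idata = az.from_dict(posterior={"mu": rng.normal(size=(4, 500))})

# collapse chain and draw into a single dimension
stacked = idata.posterior.stack(chain_draw=("chain", "draw"))
# mu is now 1-D with 4 * 500 = 2000 samples
```

Recent ArviZ versions also provide `az.extract`, which performs this chain/draw stacking (plus optional variable selection and subsampling) in a single call.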
Reshaping Data
Reshaping is a powerful technique to transform InferenceData objects. While it can be complex, it allows you to structure data to suit your needs. However, it might not be necessary for most common use cases.
Grouping and Aggregating Data
You may want to perform operations like calculating the mean of your posterior samples. InferenceData makes it easy to aggregate data across different dimensions:
mean_across_chains = inference_data.posterior.mean(dim="chain")
Accessing Values and Coordinates
You can extract values and coordinates from InferenceData objects, which is useful for further analysis or visualization:
values = inference_data.posterior["parameter_name"].values
coordinates = inference_data.posterior["parameter_name"].coords
Slicing InferenceData Objects
Slicing allows you to focus on specific subsets of your InferenceData object:
subset = inference_data.sel(chain=0, draw=slice(0, 10))
Modifying and Transforming Variables
InferenceData objects are not set in stone. You can modify, transform, or create new variables within them. Refer to code examples in issue #641 for practical guidance on this.
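As a minimal sketch (synthetic data; the variable names are made up), creating a derived variable in the posterior group is a plain xarray assignment:

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(2)
idata = az.from_dict(posterior={"mu": rng.normal(size=(4, 100))})

# create a pushforward/derived quantity and store it alongside mu
idata.posterior["mu_squared"] = idata.posterior["mu"] ** 2
```

The new variable lives in the posterior group like any other and is picked up by plotting and stats functions.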
Combining and Merging InferenceData Objects
If you have multiple InferenceData objects from different sources or runs, you can combine, merge, or extend them to consolidate your results, for example with arviz.concat. The specific method may vary depending on the library you are using.
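A sketch with two synthetic idatas (names are made up), concatenated with the public `arviz.concat` function:

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(3)
idata1 = az.from_dict(posterior={"mu": rng.normal(size=(2, 100))})
idata2 = az.from_dict(posterior={"mu": rng.normal(size=(2, 100))})

# extend along the chain dimension; use dim="draw" to append more draws instead
combined = az.concat(idata1, idata2, dim="chain")
```

By default `az.concat` resets the concatenated dimension so chain numbering stays unique in the result.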
Coordinate Modification
Coordinate modification allows you to manipulate the structure of your InferenceData objects to fit specific analysis needs. For example, you can perform posterior predictive checks and encode the results into dataframes. Refer to issue #1461 for detailed examples and explanations.
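As a small illustrative sketch (synthetic data; the coordinate names are made up), new coordinate labels can be attached with `assign_coords`; note that xarray methods return new objects rather than modifying in place:

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(4)
# synthetic idata with a named extra dimension "school"
idata = az.from_dict(
    posterior={"theta": rng.normal(size=(2, 50, 3))},
    dims={"theta": ["school"]},
)

# attach meaningful labels to the school dimension
post = idata.posterior.assign_coords(school=["A", "B", "C"])
```

After this, `post.sel(school="A")` selects by label instead of by positional index.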
See Also
For further information on working with InferenceData objects, you can explore other resources such as the PyMC3 and PyStan documentation, blogs, and case studies that use InferenceData. These provide additional insights and examples to help you make the most of InferenceData in your Bayesian inference workflows.
Hi @OriolAbril, to work on this issue I need to extend the WorkingWithInferenceData IPython notebook, right? I'm planning to work on the "modifying, transforming and/or creating a variable within an InferenceData group or groups" part.
Sorry for the slow reply, just now checking the current state. It looks like most of the basic things we aimed to cover are already there. For example, creating a variable from other variables is there: https://python.arviz.org/en/stable/getting_started/WorkingWithInferenceData.html#compute-and-store-posterior-pushforward-quantities. It might still be useful to show how to apply the same transformation to multiple groups, for example using map.
Are you currently using ArviZ for your work? Is there anything you found missing too?
@OriolAbril Thanks for the update. Showing how to apply the same transformation to multiple groups for example using map sounds like a good idea, I'll try doing that.
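A minimal sketch of that idea (synthetic data; the choice of transformation is just for illustration), using the public `InferenceData.map` method:

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(5)
# synthetic idata with two groups holding a positive-valued variable
idata = az.from_dict(
    posterior={"y": rng.lognormal(size=(2, 100))},
    prior={"y": rng.lognormal(size=(2, 100))},
)

# apply the same transformation (log) to both groups at once
idata_log = idata.map(np.log, groups=["posterior", "prior"])
```

`map` returns a new InferenceData by default, leaving the original untouched.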
I came across arviz while working on a university project. No, fortunately nothing was missing :-)