Tell us about it

As part of the documentation revamp described in #1331, we should add a page explaining how to work with InferenceData and showing several common tasks. The guide should cover at least the following topics:

#1506

[x] Stacking dimensions (maybe even reshaping? reshaping is a bit crazy though), fancy chain/draw extraction methods from #1469
[x] Some grouping/aggregating of data: mean
[x] Get values and coordinates as arrays
[x] Slice InferenceData objects

Pending

[ ] modifying, transforming and/or creating a variable within an InferenceData group or groups. See #641 for code examples that can be used for this
[ ] how to combine/merge/extend multiple InferenceData objects
[ ] Coordinate modification. Maybe something like #1461 (includes example code and explanation)
[ ] ...

Thoughts on implementation

Feel free to tackle parts of the issue instead of generating the whole section at once.

It is also probably a good idea to add a See also section at the bottom linking to other docs (i.e. pymc3 or pystan docs), blogs and other resources that use inferencedata. For example: https://docs.pymc.io/notebooks/multilevel_modeling.html

Jan 08 '21 01:01 OriolAbril

Can I work on this issue?

Feb 04 '21 22:02 kenkirito

Great, thanks!

Feb 04 '21 23:02 OriolAbril

Is there a Discord server, community chat, or any meetings to attend? I am done with the setup and just wanted to know more about the project.

Feb 12 '21 20:02 kenkirito

We have a gitter chat available: https://gitter.im/arviz-devs/community, and there are also questions and discussions about the project at PyMC discourse: https://discourse.pymc.io/ and to a lesser extent at Stan discourse: https://discourse.mc-stan.org/

Feb 12 '21 21:02 OriolAbril

[ ] how to combine/merge/extend multiple InferenceData objects

This sounds good. I recently went through the InferenceData docs and they do seem ambiguous and all over the place. The entire issue seems like something that would probably be done over multiple PRs, but to begin with I can take care of this part. I'd like to take it up if no one is working on it at the moment.

Apr 03 '21 19:04 mjhajharia

Great, note that you have to extend the currently existing notebook at https://github.com/arviz-devs/arviz/blob/main/doc/source/getting_started/WorkingWithInferenceData.ipynb

There are some examples on how to combine inferencedata objects at https://arviz-devs.github.io/arviz/api/generated/arviz.concat.html#arviz.concat.

Using the documentation system outlined in https://documentation.divio.com/, this working with inferencedata page should be a tutorial page (or a bundle of mini tutorials if you prefer), the explanation is in https://github.com/arviz-devs/arviz/blob/main/doc/source/getting_started/XarrayforArviZ.ipynb, and the reference is https://arviz-devs.github.io/arviz/api/inference_data.html. The reference pages may be the easier ones to find right now, so we should probably add some links from the reference pages to the working with inferencedata notebook and vice versa: we can add links from the notebook to the reference page of each function used.

Apr 04 '21 07:04 OriolAbril

explanation is in https://github.com/arviz-devs/arviz/blob/main/doc/source/getting_started/XarrayforArviZ.ipynb,

What is this the explanation for? (Sorry for getting back so late on this issue.)

Apr 14 '21 10:04 mjhajharia

Following the documentation system in https://diataxis.fr/ (they just changed the url, but it's the same content I linked to above), docs are not supposed to be all over the place but should be separate at least into 4 different pages. InferenceData docs are actually close to getting there, but we need to also make sure each page is clear on what is explained in them and has links to the other pages in case the info is not in that page but in another one (this last part is a bit of a disaster right now).

A very quick summary of diataxis (from its own introduction page):

	Tutorials	How-to guides	Reference	Explanation
oriented to	learning	a goal	information	understanding
must	allow the newcomer to get started	show how to solve a specific problem	describe the machinery	explain
its form	a lesson	a series of steps	dry description	discursive explanation
analogy	teaching a small child how to cook	a recipe in a cookery book	a reference encyclopaedia article	an article on culinary social history
for `InferenceData`	Working with InferenceData page (in getting started section)	unclear, maybe some page in the user guide section, maybe links to case studies in pymc/stan/... docs (i.e. radon or rugby examples)	InferenceData schema	XarrayforArviZ

The InferenceData schema is dry and is not a good resource for understanding InferenceData (why is it useful? why is it needed? what are its main, and super cool, features?), but it should not be. Its goal is to describe InferenceData in an encyclopedic manner.

Apr 14 '21 11:04 OriolAbril

@OriolAbril got this. Um, what about az.concat? That works on two InferenceData objects and we have one, so should I import another one, or create another one from the original one?

Apr 16 '21 11:04 mjhajharia

Depending on what you want to do you can create a fake/synthetic idata or simply use a copy/subset. To show how to extend an inferencedata with more draws, a copy or a subset (i.e. chains 0 and 1 only) is already fine. There is no difference in how to operate with two real and different idata and that.

Apr 16 '21 13:04 OriolAbril

@OriolAbril Hi! I'm sorry for replying so late, I'm alright mentally and physically now so I'm free to get back to contributing!!! Um I think I'll make a draft PR with very basic changes and whatever I feel might work, and from thereon you can suggest redirections or changes, does that sound good?

Apr 28 '21 15:04 mjhajharia

Sounds great! :smile:

Apr 28 '21 16:04 OriolAbril

@OriolAbril I was looking for sample arviz data to use. I realised we could really use a page that describes the qualitative meaning of the sample arviz datasets that we load; otherwise, understanding the context of WHY we perform inference data operations is tricky. We want to add real-time usage on top of xarrays. Let me know if that makes any sense.

May 10 '21 07:05 mjhajharia

I am not sure I understand, but it sounds similar to https://arviz-devs.github.io/arviz/api/generated/arviz.list_datasets.html#arviz.list_datasets, maybe we could generate a page from all the info listed there?

As a general comment, there shouldn't be many context explanations in "Working with InferenceData" page. The goal is to guide users with common idata operations, we use the example data not because of their particular meaning or value, but to skip idata creation and therefore be able to focus only on idata operations.

May 10 '21 07:05 OriolAbril

As a general comment, there shouldn't be many context explanations on the "Working with InferenceData" page. The goal is to guide users with common idata operations. We use the example data, not because of their particular meaning or value, but to skip idata creation and therefore be able to focus only on idata operations.

Got it!! also, thanks for pointing to az.list_datasets() I was looking for this and couldn't find the function; yeah, a page generated from it would make sense. Just adding print(az.list_datasets()) in the source code would be good enough, I guess. And, I'll keep the non-contextual usage function thingy, thanks!

May 10 '21 07:05 mjhajharia

Can I work on this: Coordinate modification? Maybe something like Posterior predictive check (plot_ppc) with dataframe encoding. #1461 (includes example code and explanation)

Oct 19 '22 03:10 disha4u

Coordinate modification will basically be done with set_coords, right? The function is defined here: https://arviz-devs.github.io/arviz/api/generated/arviz.InferenceData.set_coords.html

Oct 19 '22 03:10 disha4u

Thanks @disha4u, that would be great.

set_coords converts existing variables to coordinates. The example should probably use assign_coords instead, as in the example code snippet in the linked issue, to add new coordinates from something that isn't already a variable in the dataset. Also note that ArviZ has wrappers for methods of xarray.Dataset which often lack documentation.

I would recommend going over the linked issue and reproducing the plot_ppc example locally, then adapting the example and explanation to the notebook. Do not hesitate to ask questions in this issue, on gitter, or by opening a draft PR and asking there.

Oct 20 '22 10:10 OriolAbril

ok, will do thanks

Oct 24 '22 00:10 disha4u

Hello, I'm new here and a beginner. Can I work on this? or is there any other I can work on?

Sep 27 '23 08:09 Oluwajuwon-O

The pending elements are still work in progress, and nobody is assigned to do it, so feel free to work on that. Or if you have used inferencedata and faced some issues you can also suggest something else to add to the doc.

Sep 29 '23 10:09 OriolAbril

Working with InferenceData

InferenceData is the container ArviZ uses to store and manage the results of Bayesian inference produced by probabilistic programming libraries such as PyMC3 and PyStan. This page is a guide to working with InferenceData objects, covering common tasks and techniques.

Stacking Dimensions

Each InferenceData group is an xarray Dataset, and sampled variables typically carry chain and draw dimensions in addition to their own. You might need to stack dimensions to facilitate analysis or visualization. You can achieve this using the following methods:

Stacking Chains and Draws

To stack chains and draws, use the stack method of the xarray Dataset that holds the group. This collapses the two sampling dimensions into a single one, which is often easier to analyze. For instance, using one of the example datasets bundled with ArviZ:

import arviz as az
# Load an example InferenceData so we can skip model fitting
inference_data = az.load_arviz_data("centered_eight")
stacked_data = inference_data.posterior.stack(chain_draw=("chain", "draw"))
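
As a self-contained sketch of the same idea, the block below builds a small synthetic InferenceData with az.from_dict and stacks it two ways. It assumes the ArviZ 0.x API (az.from_dict with keyword groups, and az.extract, which may not exist in older releases); the variable names mu, theta, and school are made up for illustration.

```python
import arviz as az
import numpy as np

# Synthetic posterior: 4 chains, 100 draws, 8 "schools" (made-up data).
rng = np.random.default_rng(0)
idata = az.from_dict(
    posterior={
        "mu": rng.normal(size=(4, 100)),
        "theta": rng.normal(size=(4, 100, 8)),
    },
    dims={"theta": ["school"]},
    coords={"school": [f"school_{i}" for i in range(8)]},
)

# xarray's stack merges chain and draw into a single "sample" dimension.
stacked = idata.posterior.stack(sample=("chain", "draw"))

# az.extract performs the same stacking and can also select one variable.
theta = az.extract(idata, var_names="theta")
```

Both results have one sample dimension of length chains × draws; which route is preferable probably depends on whether the whole Dataset or a single variable is needed.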

Reshaping Data

Reshaping is a powerful technique to transform InferenceData objects. While it can be complex, it allows you to structure data to suit your needs. However, it might not be necessary for most common use cases.

Grouping and Aggregating Data

You may want to perform operations like calculating the mean of your posterior samples. InferenceData makes it easy to aggregate data across different dimensions:

mean_across_chains = inference_data.posterior.mean(dim="chain")
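
A minimal sketch of both kinds of reduction, on synthetic data so it runs without any model. It assumes the ArviZ 0.x az.from_dict signature; the variable names are hypothetical.

```python
import arviz as az
import numpy as np

rng = np.random.default_rng(1)
mu = rng.normal(size=(4, 100))  # (chain, draw)
idata = az.from_dict(
    posterior={"mu": mu, "theta": rng.normal(size=(4, 100, 8))},
    dims={"theta": ["school"]},
)
post = idata.posterior

# Averaging over chains only keeps the draw dimension.
mean_over_chains = post.mean(dim="chain")

# Averaging over both sampling dimensions gives one value per parameter entry.
post_mean = post.mean(dim=["chain", "draw"])
```

Here post_mean["theta"] has one entry per school, while mean_over_chains["theta"] still has a draw axis.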

Accessing Values and Coordinates

You can extract values and coordinates from InferenceData objects, which is useful for further analysis or visualization:

values = inference_data.posterior["parameter_name"].values
coordinates = inference_data.posterior["parameter_name"].coords
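
The following sketch shows what these two attributes return on a small synthetic example (ArviZ 0.x az.from_dict assumed; theta and school are illustrative names, not part of any real model).

```python
import arviz as az
import numpy as np

schools = [f"school_{i}" for i in range(8)]
rng = np.random.default_rng(2)
idata = az.from_dict(
    posterior={"theta": rng.normal(size=(4, 100, 8))},
    dims={"theta": ["school"]},
    coords={"school": schools},
)
theta = idata.posterior["theta"]

# .values is a plain NumPy array ordered as (chain, draw, school).
values = theta.values

# Each coordinate is itself array-like; .values gives its labels.
school_labels = theta.coords["school"].values
```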

Slicing InferenceData Objects

Slicing allows you to focus on specific subsets of your InferenceData object:

# sel is label-based: chain=0 drops the chain dimension, and slice(0, 10) includes draw 10
subset = inference_data.sel(chain=0, draw=slice(0, 10))
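
One detail worth illustrating is the difference between label-based and position-based slicing. The sketch below assumes the ArviZ 0.x InferenceData.sel and InferenceData.isel methods and uses synthetic data with hypothetical variable names.

```python
import arviz as az
import numpy as np

rng = np.random.default_rng(3)
idata = az.from_dict(posterior={"mu": rng.normal(size=(4, 100))})

# Label-based: slice endpoints are inclusive, so draws 0..10 are kept.
# Passing a list for chain keeps the chain dimension in the result.
by_label = idata.sel(chain=[0, 1], draw=slice(0, 10))

# Position-based: the usual Python half-open slice, so draws 0..9 are kept.
by_position = idata.isel(chain=[0, 1], draw=slice(0, 10))
```

Keeping the chain dimension (chain=[0] rather than chain=0) is usually the safer choice when the result will be passed on to functions that expect both chain and draw.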

Modifying and Transforming Variables

InferenceData objects are not set in stone. You can modify, transform, or create new variables within them. Refer to code examples in issue #641 for practical guidance on this.
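
A minimal sketch of adding derived variables to the posterior group, using synthetic data. This is not the code from #641; it assumes the ArviZ 0.x behaviour that idata.posterior is an ordinary xarray Dataset that can be modified in place, and the names tau, log_tau, and theta_centered are hypothetical.

```python
import arviz as az
import numpy as np

rng = np.random.default_rng(4)
idata = az.from_dict(
    posterior={
        "mu": rng.normal(size=(4, 100)),
        "tau": rng.gamma(2.0, size=(4, 100)),  # strictly positive
        "theta": rng.normal(size=(4, 100, 8)),
    },
    dims={"theta": ["school"]},
)
post = idata.posterior

# Transform an existing variable and store it under a new name.
post["log_tau"] = np.log(post["tau"])

# Combine variables; xarray broadcasts mu over the school dimension.
post["theta_centered"] = post["theta"] - post["mu"]
```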

Combining and Merging InferenceData Objects

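If you have multiple InferenceData objects from different sources or runs, you can combine, merge, or extend them to consolidate your results. ArviZ provides az.concat for this (see the examples at https://arviz-devs.github.io/arviz/api/generated/arviz.concat.html#arviz.concat). To try it out, a copy or subset of an existing object, for example chains 0 and 1 only, works the same way as two genuinely different objects.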
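
The sketch below shows two common cases on synthetic data: joining subsets along the chain dimension with az.concat, and adding a missing group with InferenceData.extend. Both calls are assumed to behave as in the ArviZ 0.x series; the data and variable names are made up.

```python
import arviz as az
import numpy as np

rng = np.random.default_rng(5)
idata = az.from_dict(posterior={"mu": rng.normal(size=(4, 100))})

# Two subsets of the same object stand in for two separate runs.
first = idata.sel(chain=[0, 1])
second = idata.sel(chain=[2, 3])

# Concatenate matching groups along the chain dimension.
combined = az.concat(first, second, dim="chain")

# Add a group that the first object lacks; extend works in place.
post_only = az.from_dict(posterior={"mu": rng.normal(size=(4, 100))})
prior_only = az.from_dict(prior={"mu": rng.normal(size=(1, 50))})
post_only.extend(prior_only)
```

Passing dim="draw" instead would append draws, which matches the "extend with more draws" case mentioned in the discussion.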

Coordinate Modification

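Coordinate modification changes the labels attached to the dimensions of your InferenceData objects to fit specific analysis needs. Use assign_coords to add new coordinates from something that is not already a variable in the dataset; set_coords instead converts existing variables into coordinates. Refer to issue #1461, which covers a posterior predictive check (plot_ppc) with dataframe encoding, for detailed examples and explanations.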
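
A short sketch of assign_coords on the posterior group, written against xarray's Dataset.assign_coords rather than the ArviZ wrapper. It is not the example from #1461; the school and region labels are hypothetical, and ArviZ 0.x az.from_dict is assumed.

```python
import arviz as az
import numpy as np

rng = np.random.default_rng(6)
idata = az.from_dict(
    posterior={"theta": rng.normal(size=(4, 100, 4))},
    dims={"theta": ["school"]},
)

# Replace the default integer labels of an existing dimension.
names = ["north_a", "north_b", "south_a", "south_b"]
posterior = idata.posterior.assign_coords(school=names)

# Attach an extra, non-index coordinate along the same dimension.
regions = ["north", "north", "south", "south"]
posterior = posterior.assign_coords(region=("school", regions))

# The extra coordinate can then drive selections or groupby operations.
north_mean = posterior["theta"].where(posterior["region"] == "north", drop=True).mean()
```

assign_coords returns a new Dataset, so the original idata.posterior is left unchanged unless the result is assigned back.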

Write "Working with InferenceData" page