Write "Working with InferenceData" page

Open OriolAbril opened this issue 4 years ago • 27 comments

Tell us about it

As part of the documentation revamp described in #1331, we should add a page explaining how to work with InferenceData and show several common tasks. The guide should cover at least the following topics:

#1506

  • [x] Stacking dimensions (maybe even reshaping? reshaping is a bit crazy though), fancy chain/draw extraction methods from #1469
  • [x] Some grouping/aggregating of data: mean
  • [x] Get values and coordinates as arrays
  • [x] Slice inferenceData objects

Pending

  • [ ] modifying, transforming and/or creating a variable within an InferenceData group or groups. See #641 for code examples that can be used for this
  • [ ] how to combine/merge/extend multiple InferenceData objects
  • [ ] Coordinate modification. Maybe something like #1461 (includes example code and explanation)
  • [ ] ...

Thoughts on implementation

Feel free to tackle parts of the issue instead of generating the whole section at once.

It is also probably a good idea to add a See also section at the bottom linking to other docs (i.e. pymc3 or pystan docs), blogs and other resources that use inferencedata. For example: https://docs.pymc.io/notebooks/multilevel_modeling.html

OriolAbril avatar Jan 08 '21 01:01 OriolAbril

can i work on this issue

kenkirito avatar Feb 04 '21 22:02 kenkirito

Great, thanks!

OriolAbril avatar Feb 04 '21 23:02 OriolAbril

Is there any discord server, community chat or meetings to attend? I am done with the setup and just wanted to know more about the project.

kenkirito avatar Feb 12 '21 20:02 kenkirito

We have a gitter chat available: https://gitter.im/arviz-devs/community, and there are also questions and discussions about the project at PyMC discourse: https://discourse.pymc.io/ and to a lesser extent at Stan discourse: https://discourse.mc-stan.org/

OriolAbril avatar Feb 12 '21 21:02 OriolAbril

  • [ ] how to combine/merge/extend multiple InferenceData objects

This sounds good. I recently went through the InferenceData docs and they do seem ambiguous and all over the place. The entire issue seems like something that would probably be done over multiple PRs, but to begin with I can take care of this part. I'd like to take it up if no one is working on it at the moment.

mjhajharia avatar Apr 03 '21 19:04 mjhajharia

Great, note that you have to extend the currently existing notebook at https://github.com/arviz-devs/arviz/blob/main/doc/source/getting_started/WorkingWithInferenceData.ipynb

There are some examples on how to combine inferencedata objects at https://arviz-devs.github.io/arviz/api/generated/arviz.concat.html#arviz.concat.

Using the documentation system outlined in https://documentation.divio.com/, this working with inferencedata page should be a tutorial page (or a bundle of mini tutorials if you prefer), the explanation is in https://github.com/arviz-devs/arviz/blob/main/doc/source/getting_started/XarrayforArviZ.ipynb, and the reference is https://arviz-devs.github.io/arviz/api/inference_data.html. The reference pages may be the easier ones to find right now, so we should probably add some links from the reference pages to the working with inferencedata notebook and vice versa: from the notebook, we can link to the reference page of each function used.

OriolAbril avatar Apr 04 '21 07:04 OriolAbril

explanation is in https://github.com/arviz-devs/arviz/blob/main/doc/source/getting_started/XarrayforArviZ.ipynb,

this is the explanation for? (sorry for getting back so late in this issue)

mjhajharia avatar Apr 14 '21 10:04 mjhajharia

Following the documentation system in https://diataxis.fr/ (they just changed the url, but it's the same content I linked to above), docs are not supposed to be all over the place but should be separate at least into 4 different pages. InferenceData docs are actually close to getting there, but we need to also make sure each page is clear on what is explained in them and has links to the other pages in case the info is not in that page but in another one (this last part is a bit of a disaster right now).

A very quick summary of diataxis (from its own introduction page):

| | Tutorials | How-to guides | Reference | Explanation |
|---|---|---|---|---|
| oriented to | learning | a goal | information | understanding |
| must | allow the newcomer to get started | show how to solve a specific problem | describe the machinery | explain |
| its form | a lesson | a series of steps | dry description | discursive explanation |
| analogy | teaching a small child how to cook | a recipe in a cookery book | a reference encyclopaedia article | an article on culinary social history |
| for InferenceData | Working with InferenceData page (in getting started section) | unclear, maybe some page in the user guide section, maybe links to case studies in pymc/stan/... docs (i.e. radon or rugby examples) | InferenceData schema | XarrayforArviZ |

The InferenceData schema is dry and is not a good resource to understand InferenceData (why is it useful? why is it needed? what are its main, and super cool, features?), but it should not be. Its goal is to describe InferenceData in an encyclopedic manner.

OriolAbril avatar Apr 14 '21 11:04 OriolAbril

@OriolAbril got this, um what about az.concat. that works on two inference datasets and we have one, so should i import another one? or create another one from the original one

mjhajharia avatar Apr 16 '21 11:04 mjhajharia

Depending on what you want to do you can create a fake/synthetic idata or simply use a copy/subset. To show how to extend an inferencedata with more draws, a copy or a subset (i.e. chains 0 and 1 only) is already fine. There is no difference in how to operate with two real and different idata and that.
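As a rough sketch of the suggestion above, a subset of one InferenceData can stand in for a second object when demonstrating az.concat. The synthetic data, the variable name mu and the use of az.from_dict with the ArviZ 0.x signature are assumptions made for illustration; they are not taken from the notebook.

```python
import numpy as np
import arviz as az

# Synthetic posterior: 4 chains x 100 draws (illustrative values only).
rng = np.random.default_rng(0)
idata = az.from_dict(posterior={"mu": rng.normal(size=(4, 100))})

# A subset with chains 0 and 1 only stands in for a "second" InferenceData.
two_chains = idata.sel(chain=[0, 1])
more_chains = az.concat(idata, two_chains, dim="chain")

# Same idea along the draw dimension; label-based slices include both ends,
# so slice(0, 49) keeps 50 draws.
first_draws = idata.sel(draw=slice(0, 49))
more_draws = az.concat(idata, first_draws, dim="draw")

print(more_chains.posterior.sizes["chain"])
print(more_draws.posterior.sizes["draw"])
```

Operating on two real and different InferenceData objects would look the same; only the way the inputs are obtained changes.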

OriolAbril avatar Apr 16 '21 13:04 OriolAbril

@OriolAbril Hi! I'm sorry for replying so late, I'm alright mentally and physically now so I'm free to get back to contributing!!! Um I think I'll make a draft PR with very basic changes and whatever I feel might work, and from thereon you can suggest redirections or changes, does that sound good?

mjhajharia avatar Apr 28 '21 15:04 mjhajharia

Sounds great! :smile:

OriolAbril avatar Apr 28 '21 16:04 OriolAbril

@OriolAbril I was looking for sample arviz data to use. I realised we could really use a page that describes the qualitative meaning of the sample arviz datasets that we load; otherwise, understanding the context of WHY we perform InferenceData operations is tricky, since we want to add perspective on real-time usage on top of xarrays. Let me know if that makes any sense.

mjhajharia avatar May 10 '21 07:05 mjhajharia

I am not sure I understand, but it sounds similar to https://arviz-devs.github.io/arviz/api/generated/arviz.list_datasets.html#arviz.list_datasets, maybe we could generate a page from all the info listed there?

As a general comment, there shouldn't be many context explanations in "Working with InferenceData" page. The goal is to guide users with common idata operations, we use the example data not because of their particular meaning or value, but to skip idata creation and therefore be able to focus only on idata operations.

OriolAbril avatar May 10 '21 07:05 OriolAbril

As a general comment, there shouldn't be many context explanations on the "Working with InferenceData" page. The goal is to guide users with common idata operations. We use the example data, not because of their particular meaning or value, but to skip idata creation and therefore be able to focus only on idata operations.

Got it!! also, thanks for pointing to az.list_datasets() I was looking for this and couldn't find the function; yeah, a page generated from it would make sense. Just adding print(az.list_datasets()) in the source code would be good enough, I guess. And, I'll keep the non-contextual usage function thingy, thanks!

mjhajharia avatar May 10 '21 07:05 mjhajharia

can I work on this=> Coordinate modification. Maybe something like Posterior predictive check (plot_ppc) with dataframe encoding. #1461 (includes example code and explanation)

disha4u avatar Oct 19 '22 03:10 disha4u

coordinate modification will be basically from set coordinates right? the function defined here: https://arviz-devs.github.io/arviz/api/generated/arviz.InferenceData.set_coords.html

disha4u avatar Oct 19 '22 03:10 disha4u

Thanks @disha4u, that would be great.

set_coords converts existing variables to coordinates. To add new coordinates from something that isn't already a variable in the dataset, the example should probably use assign_coords instead, like the code snippet in the linked issue does. Also note that ArviZ has wrappers for methods of xarray.Dataset, and these wrappers often lack documentation.

I would recommend going over the linked issue and reproducing the plot_ppc example locally, then adapting the example and explanation to the notebook. Do not hesitate to ask questions in this issue, on gitter, or by opening a draft PR.
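To make the distinction concrete, here is a minimal sketch of assign_coords applied to the posterior group through plain xarray. The synthetic data, the school dimension and the region labels are hypothetical and only serve the illustration; this is not the example from #1461.

```python
import numpy as np
import arviz as az

# Synthetic posterior with one extra dimension (assumed ArviZ 0.x from_dict API).
rng = np.random.default_rng(0)
idata = az.from_dict(
    posterior={"theta": rng.normal(size=(4, 100, 6))},
    coords={"school": ["a", "b", "c", "d", "e", "f"]},
    dims={"theta": ["school"]},
)

# Hypothetical grouping label for each school; it is not a variable in the dataset.
region = ["north", "north", "south", "south", "south", "east"]

# assign_coords returns a new Dataset with an extra coordinate along "school".
post = idata.posterior.assign_coords(region=("school", region))

# The new coordinate can now drive selections without changing the stored values.
theta_south = post["theta"].where(post["region"] == "south", drop=True)
print(theta_south.sizes["school"])
```

Because assign_coords returns a new object, the original idata.posterior is left untouched unless the result is stored back.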

OriolAbril avatar Oct 20 '22 10:10 OriolAbril

ok, will do thanks

disha4u avatar Oct 24 '22 00:10 disha4u

Hello, I'm new here and a beginner. Can I work on this? or is there any other I can work on?

Oluwajuwon-O avatar Sep 27 '23 08:09 Oluwajuwon-O

The pending elements are still work in progress, and nobody is assigned to do it, so feel free to work on that. Or if you have used inferencedata and faced some issues you can also suggest something else to add to the doc.

OriolAbril avatar Sep 29 '23 10:09 OriolAbril

Working with InferenceData

InferenceData is the ArviZ container used to store and manage the results of Bayesian inference produced by probabilistic programming libraries like PyMC3 and PyStan. This page serves as a guide to help you work effectively with InferenceData objects, covering common tasks and techniques.

Stacking Dimensions

InferenceData often contains multidimensional data structures, including chains and draws from Bayesian models. You might need to stack dimensions to facilitate analysis or visualization. You can achieve this using the following methods:

Stacking Chains and Draws

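To stack chains and draws, you can use the stack method that xarray provides on each InferenceData group (a group such as posterior is an xarray.Dataset). This collapses the two sampling dimensions into a single one, which is often more convenient for analysis. For instance, with PyMC3, you can use:

# In PyMC3 3.x, pm.sample returns an InferenceData only with return_inferencedata=True
inference_data = pm.sample(...)
stacked_data = inference_data.posterior.stack(chain_draw=("chain", "draw"))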
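The snippet above depends on a fitted model. A self-contained sketch of the same stacking step, using synthetic data built with az.from_dict (an assumption made here so the example runs without a sampler; variable and dimension names are illustrative), could look like this:

```python
import numpy as np
import arviz as az

# Synthetic posterior: 4 chains x 100 draws, plus an 8-element "school" dimension.
rng = np.random.default_rng(0)
idata = az.from_dict(
    posterior={
        "mu": rng.normal(size=(4, 100)),
        "theta": rng.normal(size=(4, 100, 8)),
    },
    coords={"school": list("ABCDEFGH")},
    dims={"theta": ["school"]},
)

# Collapse chain and draw into a single "sample" dimension (xarray Dataset.stack).
stacked = idata.posterior.stack(sample=("chain", "draw"))

print(stacked.sizes["sample"])
print(stacked["theta"].dims)
```

The name of the stacked dimension (sample here, chain_draw above) is a free choice.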

Reshaping Data

Reshaping is a powerful technique to transform InferenceData objects. While it can be complex, it allows you to structure data to suit your needs. However, it might not be necessary for most common use cases.

Grouping and Aggregating Data

You may want to perform operations like calculating the mean of your posterior samples. InferenceData makes it easy to aggregate data across different dimensions:

mean_across_chains = inference_data.posterior.mean(dim="chain")
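As a hedged illustration with synthetic data (the variables mu and theta and the school dimension are made up for this sketch), averaging over both sampling dimensions leaves only the model dimensions, while averaging over draw alone gives one value per chain:

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(0)
idata = az.from_dict(
    posterior={
        "mu": rng.normal(size=(4, 100)),
        "theta": rng.normal(size=(4, 100, 8)),
    },
    coords={"school": list("ABCDEFGH")},
    dims={"theta": ["school"]},
)

# Posterior mean of every variable: reduce over chain and draw together.
post_mean = idata.posterior.mean(dim=("chain", "draw"))

# Per-chain mean: reduce over draw only, keeping the chain dimension.
chain_mean = idata.posterior.mean(dim="draw")

print(post_mean["theta"].dims)
print(chain_mean["mu"].dims)
```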

Accessing Values and Coordinates

You can extract values and coordinates from InferenceData objects, which is useful for further analysis or visualization:

values = inference_data.posterior["parameter_name"].values
coordinates = inference_data.posterior["parameter_name"].coords
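A small runnable sketch of the same idea, assuming synthetic data and a hypothetical theta variable indexed by school: .values gives the underlying NumPy array, and each coordinate can itself be turned into an array.

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(0)
idata = az.from_dict(
    posterior={"theta": rng.normal(size=(4, 100, 8))},
    coords={"school": list("ABCDEFGH")},
    dims={"theta": ["school"]},
)

theta = idata.posterior["theta"]

# NumPy array with one axis per dimension: (chain, draw, school).
values = theta.values

# Coordinate labels of a single dimension, also as a NumPy array.
schools = theta.coords["school"].values

print(values.shape)
print(schools)
```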

Slicing InferenceData Objects

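Slicing allows you to focus on specific subsets of your InferenceData object. Selection with sel is label-based, so both ends of a slice are included, and selecting a single label such as chain=0 drops that dimension from the result:

# with the default integer labels this keeps draws 0 through 10 (11 draws)
subset = inference_data.sel(chain=0, draw=slice(0, 10))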
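The following sketch, built on synthetic data with made-up names, shows two details that are easy to miss: InferenceData.sel applies each selection only to the groups that have the corresponding dimension, and InferenceData.isel offers positional slicing with the usual half-open ranges.

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(0)
idata = az.from_dict(
    posterior={"theta": rng.normal(size=(4, 100, 8))},
    observed_data={"y": rng.normal(size=8)},
    coords={"school": list("ABCDEFGH")},
    dims={"theta": ["school"], "y": ["school"]},
)

# Label-based: lists keep the dimension, slices include both ends.
# "school" is applied to both groups, chain and draw only to the posterior.
subset = idata.sel(chain=[0, 1], draw=slice(0, 9), school=["A", "B"])

# Position-based: the stop index is excluded, as with Python lists.
positional = idata.isel(draw=slice(0, 10))

print(dict(subset.posterior.sizes))
print(dict(subset.observed_data.sizes))
```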

Modifying and Transforming Variables

InferenceData objects are not set in stone. You can modify, transform, or create new variables within them. Refer to code examples in issue #641 for practical guidance on this.
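One possible pattern, sketched here with synthetic data and a hypothetical tau variable (this is not the code from #641), is to compute a new quantity with NumPy and store it back in the group like a dictionary entry:

```python
import numpy as np
import arviz as az

# Synthetic, strictly positive draws for a hypothetical scale parameter "tau".
rng = np.random.default_rng(0)
idata = az.from_dict(posterior={"tau": rng.gamma(2.0, size=(4, 100))})

# The group is an xarray.Dataset; assigning a key adds a variable in place.
post = idata.posterior
post["log_tau"] = np.log(post["tau"])

print(list(idata.posterior.data_vars))
```

The new variable keeps the chain and draw dimensions of its input, so it can be used with the rest of the InferenceData tooling like any sampled variable.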

Combining and Merging InferenceData Objects

If you have multiple InferenceData objects from different sources or runs, you can combine, merge, or extend them to consolidate your results. In ArviZ, az.concat combines several objects into a new one, and the InferenceData.extend method adds the groups of another object to an existing one.
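A minimal sketch of merging objects that hold different groups, assuming synthetic data and the ArviZ 0.x az.from_dict, az.concat and InferenceData.extend APIs (the variable name mu is illustrative):

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(0)

# Two objects with different groups, e.g. from separate sampling calls.
post_idata = az.from_dict(posterior={"mu": rng.normal(size=(4, 100))})
prior_idata = az.from_dict(prior={"mu": rng.normal(size=(1, 500))})

# Without a dim argument, concat merges the groups into a new object.
merged = az.concat(post_idata, prior_idata)

# extend modifies post_idata in place, adding the groups it does not have yet.
post_idata.extend(prior_idata)

print(merged.groups())
print(post_idata.groups())
```

az.concat leaves its inputs unchanged by default, which makes it the safer choice when the original objects are still needed.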

Coordinate Modification

Coordinate modification allows you to manipulate the structure of your InferenceData objects to fit specific analysis needs. For example, you can perform posterior predictive checks and encode the results into dataframes. Refer to issue #1461 for detailed examples and explanations.

See Also

For further information on working with InferenceData objects, you can explore other resources such as:

These resources provide additional insights and examples to help you make the most of InferenceData in your Bayesian inference workflows.

sujitmahapatra avatar Oct 05 '23 09:10 sujitmahapatra

Hi @OriolAbril , to work on this issue I need to extend the WorkingWithInferenceData IPython notebook right? I'm planning to work on adding the modifying, transforming and/or creating a variable within an InferenceData group or groups part.

aadya940 avatar Dec 16 '23 18:12 aadya940

Sorry for the slow reply, just now checking to see the current state, it looks like most of the basic things we aimed to get there are already there. For example, creating a variable from other variables is there: https://python.arviz.org/en/stable/getting_started/WorkingWithInferenceData.html#compute-and-store-posterior-pushforward-quantities. It might still be useful to show how to apply the same transformation to multiple groups for example using map.
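As a sketch of what such a map example might look like (synthetic data, with a hypothetical log_sigma variable and helper function; not the notebook's actual content), the same function is applied to the posterior and prior groups in one call:

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(0)
idata = az.from_dict(
    posterior={"log_sigma": rng.normal(size=(4, 100))},
    prior={"log_sigma": rng.normal(size=(1, 200))},
)


def add_sigma(ds):
    """Return a copy of the group with sigma = exp(log_sigma) added."""
    return ds.assign(sigma=np.exp(ds["log_sigma"]))


# map applies the function to each listed group and returns a new InferenceData.
transformed = idata.map(add_sigma, groups=["posterior", "prior"])

print(list(transformed.posterior.data_vars))
print(list(transformed.prior.data_vars))
```

With the default inplace=False, the original idata keeps only log_sigma in both groups.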

Are you currently using ArviZ for your work? Is there anything you found missing too?

OriolAbril avatar Dec 21 '23 19:12 OriolAbril

@OriolAbril Thanks for the update. Showing how to apply the same transformation to multiple groups for example using map sounds like a good idea, I'll try doing that. I came across arviz while working on a university project. No, fortunately nothing was missing :-)

aadya940 avatar Dec 22 '23 11:12 aadya940