
Extensions to the `to_netcdf` docstring

Open rpgoldman opened this issue 5 years ago • 6 comments

Tell us about it

While saving an ArviZ InferenceData object to a file, I looked at the docstring (on the website) for guidance. Here are some questions I think that docstring should answer:

  1. The docstring says

    Save dataset as a netcdf file. WARNING: Only idempotent in case data is InferenceData

    I don't understand what "idempotent" means in this context. This function should not change the InferenceData argument, so what does it mean for it to be idempotent (or not, for that matter)? If this means that writing the same data object that is not an InferenceData twice gives different results, the docstring should just say so.

    For that matter, it seems pretty complicated to understand what will happen if the data argument is not an InferenceData object, but is "any object accepted by convert_to_inference_data". Wouldn't it be simpler to have the user translate the object to save themselves, instead of trying to guess what they want?

  2. Supporting coords and dims arguments just made me worried. Does this mean that the function won't save the coords and dims that are already in the InferenceData? Or is this just for the case when the argument is not an InferenceData (or is, for some reason, an InferenceData object that doesn't have coords and dims)? Maybe clarify by saying "defaults to the coords (respectively, dims) already in the InferenceData, if any."

  3. The description of the group argument is hard to understand:

    group : str (optional) In case data is not InferenceData, this is the group it will be saved to

    Does this mean that this function will try to splice additional information into a previously-existing netcdf file? Or is this just if, for example, we pass a PyMC3 prior predictive trace as the data argument? Again, maybe it would make things simpler for both users and maintainers/developers if things were simplified by requiring an InferenceData object as data (and requiring the callers to do the translation themselves).

rpgoldman avatar May 29 '19 01:05 rpgoldman

I will try to answer some of these questions here; however, they should also be clarified in the docstring where possible. I believe the key is to realize that this function has two ways of working.

The first and ideal one is to call it on an InferenceData object. In this case, none of the optional arguments are taken into account, and the InferenceData object is stored as is.

The second case is to call it on a numpy array, pymc trace, pystan object... In this case, az.convert_to_inference_data is called, which provides minimal support for all the from_xyz functions. This is why the optional arguments are present. Thus, there are two steps: conversion to InferenceData, then saving the resulting InferenceData.

  1. In the first case, calling from_netcdf should return the original object, whereas in the second case it would not (it would return an InferenceData object with no guarantee that all the data from the original object is present). This is what I believe the idempotence warning is trying to say.
  2. (and 3) The coords and dims in the InferenceData will always be saved; these arguments are only present to help in the second case, because the conversion step may need them. As a side note, if someone decided to use obscure conversion methods to generate an InferenceData without coords or dims, passing the arguments here would include them neither in the netcdf file nor in the InferenceData object.
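The two paths above can be illustrated with a toy, standard-library-only sketch. Everything in it is hypothetical and is not ArviZ API: `ToyInferenceData`, `toy_convert`, `toy_save` and `toy_load` are stand-ins, and JSON replaces NetCDF. It only aims to show why the round trip returns the original object in the first case but not in the second, and why `group` matters only during conversion.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ToyInferenceData:
    """Hypothetical stand-in for InferenceData: named groups of variables."""
    groups: dict = field(default_factory=dict)


def toy_convert(obj, group="posterior"):
    """Mimic the conversion step: wrap a raw mapping in a single group."""
    if isinstance(obj, ToyInferenceData):
        return obj  # already the container, nothing to convert
    # Only the variables survive; anything else about obj is dropped.
    return ToyInferenceData(groups={group: dict(obj)})


def toy_save(data, filename, group="posterior"):
    """Step 1: convert if needed (group is used only here). Step 2: save."""
    idata = toy_convert(data, group=group)
    Path(filename).write_text(json.dumps(idata.groups))
    return filename


def toy_load(filename):
    """Read the file back into the container type."""
    return ToyInferenceData(groups=json.loads(Path(filename).read_text()))


with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "example.json")

    # Case 1: the input is already the container, so group is ignored
    # and the round trip returns an equal object.
    idata = ToyInferenceData(groups={"posterior": {"mu": [0.1, 0.2]}})
    toy_save(idata, path, group="prior")
    case1_roundtrip_equal = toy_load(path) == idata

    # Case 2: the input is a raw mapping, so it is converted first and
    # loading returns the container, not the original object.
    raw = {"mu": [0.1, 0.2]}
    toy_save(raw, path, group="prior")
    loaded = toy_load(path)
    case2_roundtrip_equal = loaded == raw
    case2_groups = list(loaded.groups)
```

In this sketch the `group` argument changes the saved file only in the second case, which mirrors the behaviour described above for the optional arguments.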

I also believe that requiring an inference data object would probably be more clear for both programmers and users.

OriolAbril avatar May 29 '19 03:05 OriolAbril

While studying InferenceData, I also got confused here. I'd like to work on fixing this one.

percygautam avatar Jan 18 '20 17:01 percygautam

Hello. I'm new here and a beginner. Has this been resolved? Or are there other issues that a beginner can work on?

Oluwajuwon-O avatar Sep 27 '23 09:09 Oluwajuwon-O

It still needs to be fixed, and it would be great if you could work on it. Here is what the docstring currently looks like: https://python.arviz.org/en/stable/api/generated/arviz.to_netcdf.html, so you can see many of the issues are still there.

I guess you have already seen https://python.arviz.org/en/stable/contributing/index.html, but it can never be said too many times: don't hesitate to ask here if you have any doubts about the issue itself, the contributing process...

OriolAbril avatar Sep 29 '23 10:09 OriolAbril

def to_netcdf(data, filename, group=None, coords=None, dims=None):
    """
    Save an InferenceData object to a NetCDF file.

    Parameters
    ----------
    data : InferenceData or any object accepted by convert_to_inference_data
        The InferenceData object or any object that can be converted to InferenceData.
    filename : str
        The name of the NetCDF file to save.
    group : str, optional
        For non-InferenceData objects, this is the group to which data will be saved within the NetCDF file.
    coords : dict, optional
        A dictionary of coordinates to be used during conversion to InferenceData. Defaults to the coordinates
        already in the InferenceData, if any.
    dims : dict, optional
        A dictionary of dimensions to be used during conversion to InferenceData. Defaults to the dimensions
        already in the InferenceData, if any.

    Notes
    -----
    - When `data` is an InferenceData object, this function is idempotent, meaning that it will not change
      the InferenceData object or produce different results when called multiple times with the same input.

    - If `data` is not an InferenceData object, this function converts it to InferenceData using
      `az.convert_to_inference_data`, which provides support for various data sources. However, be aware
      that the idempotence property may not hold in this case, and some information from the original
      object may not be preserved in the resulting InferenceData.

    - The `coords` and `dims` arguments are primarily for use when converting non-InferenceData objects.
      They provide additional information that may be required during the conversion process. When saving
      an InferenceData object, the existing coordinates and dimensions will always be saved as they are.

    - The `group` argument allows you to specify the group under which data will be saved within the NetCDF
      file when `data` is not an InferenceData object. It does not splice additional information into an
      existing NetCDF file.

    See Also
    --------
    For more information on using ArviZ and working with InferenceData objects, refer to the ArviZ documentation:
    - [ArviZ Documentation](https://arviz-devs.github.io/arviz/)

    """
    # Implementation of the function...

sujitmahapatra avatar Oct 05 '23 09:10 sujitmahapatra

Hello, I'm a beginner. Has this been resolved or are there issues that a beginner can work on?

lokiville avatar May 03 '24 21:05 lokiville