arviz
Extensions to the `to_netcdf` docstring
Tell us about it
When saving an ArviZ `InferenceData` object to a file, I looked at the docstring (on the website) for guidance. Here are some questions I think that docstring should answer:
- The docstring says "Save dataset as a netcdf file. WARNING: Only idempotent in case data is InferenceData". I don't understand what "idempotent" means in this context. This function should not change the `InferenceData` argument, so what does it mean for it to be idempotent (or not, for that matter)? If this means that writing the same data object that is not an `InferenceData` twice gives different results, the docstring should just say so. For that matter, it seems pretty complicated to understand what will happen if the `data` argument is not an `InferenceData` object, but is "any object accepted by convert_to_inference_data". Wouldn't it be simpler to have the user translate the object to save themselves, instead of trying to guess what they want?
- Supporting `coords` and `dims` arguments just made me worried. Does this mean that the function won't save the `coords` and `dims` that are already in the `InferenceData`? Or is this just for the case when the argument is not an `InferenceData` (or is, for some reason, an `InferenceData` object that doesn't have `coords` and `dims`)? Maybe clarify by saying "defaults to the `coords` (respectively, `dims`) already in the `InferenceData`, if any."
- The description of the `group` argument is hard to understand: "group : str (optional) In case data is not InferenceData, this is the group it will be saved to". Does this mean that this function will try to splice additional information into a previously existing netCDF file? Or is this just for the case where, for example, we pass a PyMC3 prior predictive trace as the `data` argument? Again, maybe it would make things simpler for both users and maintainers/developers to require an `InferenceData` object as `data` (and require the callers to do the translation themselves).
I will try to answer some of these questions; where possible, though, the answers should be made clear in the docstring itself. I believe the key is to realize that this function has two ways of working.
The first and ideal one is to call it on an InferenceData object. In this case, all optional arguments are ignored, and the InferenceData object is stored.
The second is to call it on a numpy array, PyMC trace, PyStan object... In this case, `az.convert_to_inference_data` is called, providing minimal support for all `from_xyz` functions. This is why the optional arguments are present. Thus, there are two steps: conversion to InferenceData, then saving the resulting InferenceData.
- (Question 1) In the first case, calling `from_netcdf` should return the original object, whereas in the second case it would not (it would return an InferenceData object with no guarantee that all the data from the original object is present). This is what I believe the idempotence warning is trying to say.
- (Questions 2 and 3) The coords and dims in the InferenceData will always be saved; these arguments are only present to help in the second case, because the conversion step may need them. As a side note, if someone used obscure conversion methods to generate an InferenceData without coords or dims, passing these arguments here would add them neither to the netCDF file nor to the InferenceData object.
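The two paths described above, and the difference they make on reload, can be sketched with a toy model. This is not ArviZ code: `ToyInferenceData`, `toy_convert`, `toy_to_netcdf`, and `toy_from_netcdf` are hypothetical stand-ins, JSON stands in for netCDF, and the behavior follows the description in this thread rather than the ArviZ source.

```python
import json
import os
import tempfile


class ToyInferenceData:
    """Hypothetical stand-in for InferenceData: group name -> variables."""

    def __init__(self, **groups):
        self.groups = groups


def toy_convert(obj, group="posterior", coords=None, dims=None):
    """Hypothetical stand-in for the conversion step."""
    if isinstance(obj, ToyInferenceData):
        # Already a container: group, coords and dims play no role.
        return obj
    # Anything else is wrapped into one group; coords/dims only matter here.
    payload = {
        "values": list(obj),
        "coords": dict(coords or {}),
        "dims": dict(dims or {}),
    }
    return ToyInferenceData(**{group: payload})


def toy_to_netcdf(data, filename, group="posterior", coords=None, dims=None):
    """Step 1: convert if needed. Step 2: write the container to disk."""
    idata = toy_convert(data, group=group, coords=coords, dims=dims)
    with open(filename, "w") as fh:
        json.dump(idata.groups, fh)  # JSON stands in for netCDF
    return filename


def toy_from_netcdf(filename):
    """Load a container back from disk."""
    with open(filename) as fh:
        return ToyInferenceData(**json.load(fh))


with tempfile.TemporaryDirectory() as tmp:
    # Path 1: a container is stored as-is, so reloading recovers its content.
    original = ToyInferenceData(posterior={"mu": [0.1, 0.2]})
    path_a = toy_to_netcdf(original, os.path.join(tmp, "a.json"), group="prior")
    roundtrip_a = toy_from_netcdf(path_a)

    # Path 2: a raw object is converted first, so reloading returns a
    # container built from it, not the raw object itself.
    raw = (0.1, 0.2)
    path_b = toy_to_netcdf(raw, os.path.join(tmp, "b.json"), group="prior")
    roundtrip_b = toy_from_netcdf(path_b)

print(roundtrip_a.groups == original.groups)
print(sorted(roundtrip_b.groups), type(roundtrip_b).__name__, type(raw).__name__)
```

In this sketch, `group="prior"` has no effect on the first call because the input is already a container, while on the second call it decides where the converted data ends up.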
I also believe that requiring an InferenceData object would probably be clearer for both programmers and users.
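As a rough illustration of that stricter design, here is a minimal sketch in which the save function refuses anything that is not already a container, so the caller has to convert explicitly. `ToyInferenceData` and `strict_save` are hypothetical names, and JSON again stands in for netCDF; nothing here is taken from the ArviZ code base.

```python
import json
import os
import tempfile


class ToyInferenceData:
    """Hypothetical stand-in for InferenceData: group name -> variables."""

    def __init__(self, **groups):
        self.groups = groups


def strict_save(idata, filename):
    """Hypothetical strict saver: no implicit conversion, no extra arguments."""
    if not isinstance(idata, ToyInferenceData):
        raise TypeError(
            f"expected ToyInferenceData, got {type(idata).__name__}; "
            "convert it explicitly before saving"
        )
    with open(filename, "w") as fh:
        json.dump(idata.groups, fh)  # JSON stands in for netCDF
    return filename


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "trace.json")

    # A raw list is rejected instead of being converted behind the scenes.
    try:
        strict_save([0.1, 0.2], target)
        rejected, message = False, ""
    except TypeError as err:
        rejected, message = True, str(err)

    # The caller converts explicitly, so the chosen group is visible here.
    converted = ToyInferenceData(prior={"values": [0.1, 0.2]})
    saved = strict_save(converted, target)
    with open(saved) as fh:
        on_disk = json.load(fh)

print(rejected, message)
print(on_disk)
```

The trade-off is one extra line at the call site in exchange for a save function with no `group`, `coords`, or `dims` arguments to explain.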
While studying `InferenceData`, I also got confused here. I'd like to fix this one.
Hello. I'm new here and a beginner. Has this been resolved? Or are there other issues that a beginner can work on?
It still needs to be fixed; it would be great if you could work on this. Here is how the docstring currently looks: https://python.arviz.org/en/stable/api/generated/arviz.to_netcdf.html. As you can see, many of the issues are still there.
I guess you have already seen https://python.arviz.org/en/stable/contributing/index.html, but it can't be said too many times: don't hesitate to ask here if you have any doubts about the issue itself, the contributing process...
```python
def to_netcdf(data, filename, group=None, coords=None, dims=None):
    """
    Save an InferenceData object to a NetCDF file.

    Parameters
    ----------
    data : InferenceData or any object accepted by convert_to_inference_data
        The InferenceData object or any object that can be converted to InferenceData.
    filename : str
        The name of the NetCDF file to save.
    group : str, optional
        For non-InferenceData objects, this is the group to which data will be saved within the NetCDF file.
    coords : dict, optional
        A dictionary of coordinates to be used during conversion to InferenceData. Defaults to the coordinates
        already in the InferenceData, if any.
    dims : dict, optional
        A dictionary of dimensions to be used during conversion to InferenceData. Defaults to the dimensions
        already in the InferenceData, if any.

    Notes
    -----
    - When `data` is an InferenceData object, this function is idempotent, meaning that it will not change
      the InferenceData object or produce different results when called multiple times with the same input.
    - If `data` is not an InferenceData object, this function converts it to InferenceData using
      `az.convert_to_inference_data`, which provides support for various data sources. However, be aware
      that the idempotence property may not hold in this case, and some information from the original
      object may not be preserved in the resulting InferenceData.
    - The `coords` and `dims` arguments are primarily for use when converting non-InferenceData objects.
      They provide additional information that may be required during the conversion process. When saving
      an InferenceData object, the existing coordinates and dimensions will always be saved as they are.
    - The `group` argument allows you to specify the group under which data will be saved within the NetCDF
      file when `data` is not an InferenceData object. It does not splice additional information into an
      existing NetCDF file.

    See Also
    --------
    For more information on using ArviZ and working with InferenceData objects, refer to the ArviZ documentation:
    - [ArviZ Documentation](https://arviz-devs.github.io/arviz/)
    """
    # Implementation of the function...
```
Hello, I'm a beginner. Has this been resolved or are there issues that a beginner can work on?