
support for units with pint

Open keewis opened this issue 6 years ago • 7 comments

pint's implementation of NEP-18 (see hgrecco/pint#905) is far enough along that we can finally start working on pint support (i.e. making the integration tests pass). These are the tasks to get there:

  • integration tests:
    • [x] implement integration tests for DataArray, Dataset and top-level functions (#3238, #3447, #3493)
    • [x] add tests for Variable as discussed in #3493 (#3654)
    • [x] clean up the current tests (#3600)
    • [x] use the standard assert_identical and assert_allclose functions (#3611, #3643, #3654, #3706, #3975)
    • [x] clean up the TestVariable.test_pad tests
  • actually get xarray to support units:
    • [x] top-level functions (#3611)
    • [x] Variable (#3706)
      • rolling_window and identical need larger modifications
    • [x] DataArray (#3643)
    • [x] Dataset
    • [x] silence all the UnitStrippedWarnings in the testsuite (#4163)
    • [ ] try to get nanprod to work with quantities
    • [x] add support for per variable fill values (#4165)
    • [x] repr with units (#2773)
    • [ ] type hierarchy (e.g. for np.maximum(data_array, quantity) vs np.maximum(quantity, data_array)) (#3950)
  • update the documentation
    • [x] point to pint-xarray (see #4530)
    • [x] mention the requirement for UnitRegistry(force_ndarray=True) or UnitRegistry(force_ndarray_like=True) (see https://pint-xarray.readthedocs.io/en/stable/creation.html#attaching-units)
    • [x] list the known issues (see https://github.com/pydata/xarray/pull/3643#issue-354872657 and https://github.com/pydata/xarray/pull/3643#issuecomment-602225731) (#4530):
      • pandas (indexing)
      • bottleneck (bfill, ffill)
      • scipy (interp)
      • numbagg (rolling_exp)
      • numpy.lib.stride_tricks.as_strided: rolling
      • numpy.vectorize: interpolate_na
    • [x] ~update the install instructions (we can use standard conda / pip now)~ this should be done by pint-xarray
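For readers unfamiliar with NEP-18, the mechanism this work builds on can be sketched with a toy unit-carrying array. Everything below (`ToyQuantity`, `HANDLED`, `implements`) is hypothetical and written only for illustration; it is not pint's or xarray's implementation, and it handles just two numpy functions.

```python
import numpy as np

# Registry mapping numpy functions to unit-aware overrides (hypothetical helper).
HANDLED = {}


def implements(np_func):
    """Register a unit-aware override for a numpy function."""
    def decorator(func):
        HANDLED[np_func] = func
        return func
    return decorator


class ToyQuantity:
    """Toy duck array with units; an illustration, not pint's Quantity."""

    def __init__(self, magnitude, units):
        self.magnitude = np.asarray(magnitude, dtype=float)
        self.units = units

    def __array_function__(self, func, types, args, kwargs):
        # NEP-18 hook: numpy calls this instead of coercing to ndarray.
        if func not in HANDLED:
            return NotImplemented
        return HANDLED[func](*args, **kwargs)


@implements(np.mean)
def _mean(q, *args, **kwargs):
    # Compute on the bare magnitude, then re-attach the units.
    return ToyQuantity(np.mean(q.magnitude, *args, **kwargs), q.units)


@implements(np.concatenate)
def _concatenate(qs, *args, **kwargs):
    # This toy refuses mixed units rather than converting between them.
    units = {q.units for q in qs}
    if len(units) != 1:
        raise ValueError("cannot concatenate quantities with different units")
    magnitudes = [q.magnitude for q in qs]
    return ToyQuantity(np.concatenate(magnitudes, *args, **kwargs), units.pop())


lengths = ToyQuantity([1.0, 2.0, 3.0], "m")
mean_length = np.mean(lengths)
joined = np.concatenate([lengths, ToyQuantity([4.0], "m")])
```

Because numpy dispatches to `__array_function__`, both results are still `ToyQuantity` objects with their units intact. Functions that never reach this hook are where units get lost.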

keewis avatar Dec 04 '19 13:12 keewis

Thanks for leading this effort @keewis.

I would start with the lowest-level operations like the constructors and align, concat, merge. These are called in many of the other functions so fixing these is a prerequisite for getting the rest working. I've looked at align, concat & merge recently so can help if you need to chat about confusing error messages.

> indexes strip units

What does this mean? You can't have units in an IndexVariable?

dcherian avatar Dec 04 '19 15:12 dcherian

> What does this mean? You can't have units in an IndexVariable?

Yes, we have been discussing that from https://github.com/pydata/xarray/issues/525#issuecomment-514452182 onwards. Short version: pd.Index converts its data using np.asarray, and support for units probably requires #1603.
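A minimal sketch of why that coercion loses units, using a hypothetical `ToyQuantity` class rather than pint itself: `np.asarray` asks the object for a plain ndarray through `__array__`, and only the magnitude comes back.

```python
import numpy as np


class ToyQuantity:
    """Toy unit-carrying array; a stand-in for illustration, not pint's Quantity."""

    def __init__(self, magnitude, units):
        self.magnitude = np.asarray(magnitude)
        self.units = units

    def __array__(self, dtype=None, copy=None):
        # Coercion hook used by np.asarray: only the magnitude survives.
        return np.asarray(self.magnitude, dtype=dtype)


distances = ToyQuantity([0.0, 10.0, 20.0], "m")

# Anything that coerces with np.asarray ends up with a bare ndarray.
coerced = np.asarray(distances)
```

After the call, `coerced` is an ordinary `numpy.ndarray` with no `units` attribute, so any container that coerces its input this way cannot keep the units.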

keewis avatar Dec 04 '19 15:12 keewis

> I will try to figure out the reason for each of these test failures, but I'd appreciate help.

Would #3643 be the best place to offer contributions at this point, or somewhere else?

amcnicho avatar Mar 11 '20 20:03 amcnicho

I think issues related to DataArray + pint should be in #3643, for everything else you can use this issue or new issues / pull requests.

If you want to, I'd appreciate someone reviewing the tests in test_units.py, since I don't think anyone other than me has thoroughly looked at all of them. You could also investigate / fix the Dataset issues, investigate the reason for the UnitStrippedWarnings, or start writing documentation on how to use pint in combination with xarray.

keewis avatar Mar 11 '20 20:03 keewis

So, apart from the major issues mentioned above, which we won't be able to fix in the near future (but there will probably be workarounds in pint-xarray), we only have three minor issues: nanprod, support for per variable fill values, and the repr (#2773).

I don't think nanprod and the fill values are particularly urgent, so if we get support for the repr (maybe using some sort of hook that a library like pint-xarray can then use to properly format the duckarray) and put together the documentation page, we could include this in the 0.16 release.

Edit: I guess the release is already big enough, so I don't really mind waiting for the next release, but this is really close.

keewis avatar Jul 02 '20 21:07 keewis

@keewis Shall we close this? It seems the only outstanding item is nanprod with quantities, which sort of indicates that we've made the necessary big changes.

dcherian avatar Jul 20 '22 19:07 dcherian

#6873 might fix the nanprod issue, and we have a separate issue for the last big change left (#3950, which is not really limited to quantities) so I agree that we should be able to close this with #6873.

We might want to open a new issue to get the known issues to work, though.

keewis avatar Aug 03 '22 11:08 keewis