pint-xarray

API?

Open TomNicholas opened this issue 4 years ago • 17 comments

@jthielen proposed a rough accessor API in pint/#849, to which I've added a couple of things:

DataArray:

  • [x] da.pint.to(...): return dataarray with converted units (#11)
  • [ ] da.pint.to_base_units(): return dataarray with base units
  • [x] da.pint.units: units of quantity (as a Unit)
  • [x] da.pint.magnitude: magnitude of quantity
  • [x] da.pint.quantify(unit_registry=None, unit=None): create DataArray wrapping a Quantity based on string unit attribute of DataArray or specified unit
  • [x] da.pint.dequantify(): replace data with the Quantity's magnitude, and add back string unit attribute from Quantity's unit
  • [x] da.pint.sel(): wrap da.sel to handle indexing with Quantities (by casting to magnitude in the coordinate's units similar to how MetPy does it, since true unit-aware indexing is not available yet in xarray)
  • [x] da.pint.loc: wrap da.loc likewise
  • [ ] da.pint.to_system(system): convert all variables to be expressed in the (base?) units of a different system. Might require upstream additions to pint.

Dataset:

  • [x] ds.pint.to(...): return Dataset with converted units (#11)
  • [ ] ds.pint.to_base_units(): return dataset with base units
  • [x] ds.pint.quantify(unit_registry=None): convert all data variables to quantities
  • [x] ds.pint.dequantify(): convert all data variables from quantities to magnitudes with units as an attribute
  • [x] ds.pint.sel(): wrap ds.sel to handle indexing with Quantities
  • [x] ds.pint.loc: wrap ds.loc likewise
  • [ ] ds.pint.to_system(system): convert all variables to be expressed in the (base?) units of a different system.

(this may be modified as things change on xarray's and pint's end, especially involving Dask arrays (xref #883))

Anything else?

At some point when the integration is more solidified (but before official release) we should change the accessor from pint to units, to get an interface more like what's described here. This would be (a) more intuitive, (b) units-library agnostic, and (c) a good fit for potentially using an entrypoint to choose which units library you want to use. There's already an entrypoint for plotting backends in xarray, and plans to add one for storage backends too.
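
To make the entrypoint idea a bit more concrete, here is a rough sketch using only the standard library. The group name "xarray.units_backends" and both function names are made up for illustration; nothing in xarray or pint-xarray defines them.

```python
from importlib.metadata import entry_points

# Hypothetical entry-point group; neither xarray nor pint-xarray defines it.
GROUP = "xarray.units_backends"


def available_units_backends(group=GROUP):
    """Return a mapping of backend name -> entry point registered under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10 and later expose a selectable interface.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


def load_units_backend(name, group=GROUP):
    """Import and return whatever object is registered under `name`."""
    backends = available_units_backends(group)
    if name not in backends:
        raise LookupError(f"no units backend named {name!r}; found {sorted(backends)}")
    return backends[name].load()
```

A units library would then advertise itself under that group in its packaging metadata, and a units accessor could look it up by name instead of importing pint directly.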

TomNicholas avatar Apr 08 '20 07:04 TomNicholas

One additional thing I've realized is needed since writing https://github.com/hgrecco/pint/issues/849#issuecomment-579992247 is a helper for coordinate unit conversion. Not sure what the best name for this would be though. In MetPy, the tentative name is .convert_coordinate_units, but that really only makes sense since the MetPy accessor's version of .to is .convert_units. Maybe .coordinate_to?

See https://github.com/Unidata/MetPy/pull/1325 / https://github.com/Unidata/MetPy/compare/v0.12.0...jthielen:0-12-patch-xarray-0-15-1?expand=1

jthielen avatar Apr 08 '20 14:04 jthielen

we could also provide work-arounds for operations like rolling or ffill that, due to the functions and libraries they use (e.g. bottleneck, numbagg, scipy, numpy.lib, but also numpy.vectorize), won't be able to support duck arrays in the near future (though we should probably wait until there is a final decision on that; right now these methods were simply postponed).

keewis avatar Apr 08 '20 19:04 keewis

a helper for coordinate unit conversion.

So a da.pint.coords_to, that gets called within da.pint.to?

Also what about UnitRegistry().wraps? Do we need a xarray equivalent for that? Would that help with providing workarounds for rolling etc?

TomNicholas avatar Apr 09 '20 04:04 TomNicholas

a helper for coordinate unit conversion.

So a da.pint.coords_to, that gets called within da.pint.to?

Not quite, it would be separate. I'm thinking of something to change the units on a coordinate of da without changing anything else on da, which needs to follow different logic given that Indexes are immutable, and, for now, not supporting units directly...so something like

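from pint import Quantity

def coord_to(self, coord, units):
    # Rebuild the coordinate from its values and string 'units' attribute,
    # then take the magnitude in the target units.
    new_coord_var = self.da[coord].copy(
        data=Quantity(self.da[coord].values, self.da[coord].attrs.get('units')).m_as(units)
    )
    new_coord_var.attrs['units'] = str(units)
    # assign_coords returns a new object; everything else on da is left as-is.
    return self.da.assign_coords(coords={coord: new_coord_var})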

jthielen avatar Apr 09 '20 05:04 jthielen

how about also accepting a dict / kwargs dict (like assign / assign_coords) to DataArray.pint.to and then using different branches for data and coordinates? That way, converting the data and multiple coordinates to (different) units would be possible with one function call.

Obviously, we need something really similar for Dataset.pint.to, so we could implement it by using _to_temp_dataset and to_dataarray.

keewis avatar Apr 09 '20 12:04 keewis

how about also accepting a dict / kwargs dict (like assign / assign_coords) to DataArray.pint.to and then using different branches for data and coordinates? That way, converting the data and multiple coordinates to (different) units would be possible with one function call.

So long as we don't lose the easy ability to just convert data units (e.g., da.pint.to('degC')), that sounds great.

jthielen avatar Apr 09 '20 14:04 jthielen

So as a to-do list, that would mean one positional arg for the data and multiple keyword args for the coords in da.pint.to(), then all keyword args for ds.pint.to():

  • [x] da.pint.to(unit) - convert data
  • [x] da.pint.to(new_unit1, coord1=new_unit2, coord2=new_unit3, ...) - convert data and multiple coords simultaneously
  • [x] ds.pint.to(var1=new_unit1, var2=new_unit2, coord1=new_unit3, ...) - convert multiple data vars and coords simultaneously
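
The calling convention for the DataArray case could be handled with a small argument splitter along these lines. This is a sketch, not the implementation: split_unit_args is a hypothetical helper name and coord_names stands in for whatever the accessor would read from da.coords.

```python
def split_unit_args(coord_names, units=None, **unit_kwargs):
    """Hypothetical helper for a `da.pint.to`-style signature.

    Returns (data_units, coord_units): the positional unit applies to the data,
    every keyword must name a coordinate.
    """
    unknown = sorted(set(unit_kwargs) - set(coord_names))
    if unknown:
        raise KeyError(f"not coordinates of this DataArray: {unknown}")
    return units, dict(unit_kwargs)


# Mirrors da.pint.to("degC", height="km") on an array with coords time and height.
data_units, coord_units = split_unit_args(["time", "height"], "degC", height="km")
```

One caveat with the keyword form: a coordinate that happens to be named like one of the method's own parameters (units in this sketch) cannot be passed as a keyword, which is one reason to also accept a plain dict.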

TomNicholas avatar Apr 09 '20 15:04 TomNicholas

Agreed for points 1 (DataArray only?) and 3 (Dataset only). For 2 there might be a few more options (nothing wrong with this one), but I think we should discuss them in a separate issue / when we implement it to keep this issue as a high level API overview.

keewis avatar Apr 09 '20 15:04 keewis

something else that is not a part of the API but in scope for this package (I think?): maybe we could monkeypatch the repr of DataArray, Variable and Dataset (or register some sort of repr function for pint arrays (via entrypoints? on import?)) to make it more readable (i.e. make sure the unit is always visible). See also pydata/xarray#2773

keewis avatar Apr 15 '20 21:04 keewis

I definitely think that's in scope for this package, at least until some more general solution for reprs of all wrapped arrays is implemented upstream.

Monkey-patching actually seems like a reasonable temporary solution here I think? The worst that can happen is that some other method which uses the repr fails to display the units right
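
As a toy illustration of the monkey-patching approach (on a stand-in class, not on xarray itself), wrapping the existing repr and appending the unit could look like this. FakeDataArray and its units attribute are invented for the sketch.

```python
class FakeDataArray:
    """Stand-in for xarray.DataArray, used only to keep the sketch self-contained."""

    def __init__(self, data, units=None):
        self.data = data
        self.units = units

    def __repr__(self):
        return f"<FakeDataArray {self.data!r}>"


# Keep a reference to the original so the patch can delegate to it (and be undone).
_original_repr = FakeDataArray.__repr__


def _repr_with_units(self):
    text = _original_repr(self)
    if self.units is None:
        # Nothing to add: fall back to the untouched repr.
        return text
    return f"{text}\nUnits: {self.units}"


# The monkeypatch itself: replace the method on the class.
FakeDataArray.__repr__ = _repr_with_units
```

Because the patched function delegates to the saved original, any code that only calls repr() keeps working; it just sees one extra line when a unit is present.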

TomNicholas avatar Apr 16 '20 13:04 TomNicholas

I definitely agree that some kind of repr fix is in order. The current default is...less than ideal...

jthielen avatar Apr 21 '20 17:04 jthielen

from #11:

  • [x] pint_xarray.testing: for testing purposes
  • [ ] conversion / extraction functions (from pint_xarray.conversions)

keewis avatar Jul 08 '20 16:07 keewis

if we can get the reprs to work (pydata/xarray#2773, hgrecco/pint#1133), improve the documentation and maybe also expose the pint_xarray.testing module, we should be close enough to release an initial development version (0.1?)

keewis avatar Jul 17 '20 16:07 keewis

the HTML repr works (pint got released today), and there are #20 (docs), #22 (inline repr) and #24 (testing), so once those are merged and the placeholder functions we have right now are temporarily removed, we should be able to release 0.1

keewis avatar Aug 22 '20 21:08 keewis

the HTML repr works (pint got released today), and there are #20 (docs), #22 (inline repr) and #24 (testing), so once those are merged and the placeholder functions we have right now are temporarily removed, we should be able to release 0.1

Sounds great! Right now though it looks like the only placeholders are sel and loc. If you're good with waiting a few days, I should be able to get around to adding those in time for 0.1...I should be able to implement them the same way I did in MetPy.

jthielen avatar Aug 23 '20 17:08 jthielen

I was thinking of plus_minus, to_base_units and to_system, which don't have docstrings, work only on the data or raise NotImplementedError.

Edit: let's continue this in #25

keewis avatar Aug 23 '20 19:08 keewis

from #61: try to dequantify automatically before plotting (see xarray.plot.plot.label_from_attrs)

keewis avatar Feb 20 '21 13:02 keewis