pint icon indicating copy to clipboard operation
pint copied to clipboard

RFC: wrap any array API standard compatible array

Open lucascolley opened this issue 1 year ago • 27 comments

Preface: I'm not familiar with the history behind https://pint.readthedocs.io/en/stable/user/numpy.html#Array-Type-Support, but I have experience adding support for alternative array types to SciPy. For context, see https://numpy.org/neps/nep-0056-array-api-main-namespace.html#abstract, which superseded NEP 30.


The array API standard is being utilised in libraries like SciPy and scikit-learn to add support for array libraries like CuPy, PyTorch, and JAX. It seems feasible that Pint could wrap any library which is compatible with the standard.

The model for how this can be done is demonstrated in https://github.com/mdhaber/marray. Marray adds masked array support to standard-compatible arrays. Similar to Pint's Quantity, an MArray is some data together with a mask:

MArray(
    Array([1, 2], dtype=array_api_strict.int64),
    Array([False,  True], dtype=array_api_strict.bool)
)

Marray exposes one function, get_namespace, which takes as input a standard-compatible namespace, and returns a new standard-compatible namespace with the additional capabilities of creating and correctly processing masked arrays. Under the hood, this namespace wraps each function to handle anything specific to masks, and otherwise dispatches to the underlying array namespace. So for pint, pint.get_namespace(torch) would return a namespace for pint-wrapped PyTorch tensors.

MArrays also expose this namespace via the __array_namespace__ method, which allows libraries to operate on them like any other standard array:

# get the namespace for input pint(torch) array x
xp = x.__array_namespace__()
y = xp.exp(x) # returns another pint(torch) array
z = xp.asarray([1, 2, 3]) # also a pint(torch) array
...

Marray currently passes almost the entirety of https://github.com/data-apis/array-api-tests. It would be cool to see if Pint could do the same!

This would bring a few benefits:

  • for users, they can use pint with whatever standard-compatible library they like
  • for libraries which depend on pint, they can also then work on supporting any standard-compatible arrays
  • users can pass Pint Quantitys to any library which supports standard-compatible arrays, like SciPy (eventually!)

@nabobalis I know you had expressed interest in this.

lucascolley avatar Dec 29 '24 11:12 lucascolley

yes, there is interest. there have been a few issues relating to this in the past but it requires someone to start it! I suggest we create a seperate repo for this as it allows the array code to be iterated and released faster and independently to pint. maybe pint-array or pint-array-api?

andrewgsavage avatar Dec 29 '24 18:12 andrewgsavage

alright! Extremely rough POC at https://github.com/lucascolley/pint-array:

In [1]: from pint import UnitRegistry

In [2]: from pint_array import get_namespace

In [3]: import array_api_strict as xp

In [4]: pxp = get_namespace(xp)

In [5]: ureg = UnitRegistry()

In [6]: x = pxp.asarray(xp.arange(6), units=ureg.meter)

In [7]: pxp.sum(x)
Out[7]: <Quantity(15, 'meter')>

In [9]: x = pxp.asarray(x, dtype=xp.float64)

In [10]: pxp.var(x)
Out[10]: <Quantity(2.9166666666666665, 'meter ** 2')>

In [11]: x = pxp.asarray(x, units=ureg.degC)

In [12]: pxp.sum(x)
---------------------------------------------------------------------------
OffsetUnitCalculusError                   Traceback (most recent call last)
Cell In[12], line 1
----> 1 pxp.sum(x)

File ~/programming/pint-array/pint_array/__init__.py:42, in get_namespace.<locals>.sum(x, axis, dtype, keepdims)
     40 units = x.units
     41 magnitude = xp.sum(x, axis=axis, dtype=dtype, keepdims=keepdims)
---> 42 units = (1 * units + 1 * units).units
     43 return UnitRegistry.Quantity(magnitude, units)

File ~/programming/pint-array/.pixi/envs/default/lib/python3.13/site-packages/pint/facets/plain/quantity.py:849, in PlainQuantity.__add__(self, other)
    846 if isinstance(other, datetime.datetime):
    847     return self.to_timedelta() + other
--> 849 return self._add_sub(other, operator.add)

File ~/programming/pint-array/.pixi/envs/default/lib/python3.13/site-packages/pint/facets/plain/quantity.py:101, in check_implemented.<locals>.wrapped(self, *args, **kwargs)
     99 elif isinstance(other, list) and other and isinstance(other[0], type(self)):
    100     return NotImplemented
--> 101 return f(self, *args, **kwargs)

File ~/programming/pint-array/.pixi/envs/default/lib/python3.13/site-packages/pint/facets/plain/quantity.py:825, in PlainQuantity._add_sub(self, other, op)
    823     units = other._units
    824 else:
--> 825     raise OffsetUnitCalculusError(self._units, other._units)
    827 return self.__class__(magnitude, units)

OffsetUnitCalculusError: Ambiguous operation with offset unit (degree_Celsius, degree_Celsius). See https://pint.readthedocs.io/en/stable/user/nonmult.html for guidance.

With a bit of work it should be relatively straightforward based on the work already done to wrap NumPy (I hope).

lucascolley avatar Dec 30 '24 01:12 lucascolley

Thanks, that's very helpful to understand how it works.

Assuming the function signatures are the same for numpy asn the array API, it may be possible to change the implements decorator to also create functions for pint_namespace That would be fine to do in pint.

andrewgsavage avatar Dec 30 '24 12:12 andrewgsavage

Assuming the function signatures are the same for numpy asn the array API, it may be possible to change the implements decorator to also create functions for pint_namespace That would be fine to do in pint.

Yes, it seems like we should be able to reuse a lot of that code. Given that NumpyWrapper supports functions (and ufuncs) which aren't in the array API standard, I think it will be easier to keep them separate for now.

I just pushed a change to the repo such that we can define an ArrayUnitQuantity, which is a pint.Quantity with additional methods from ArrayQuantity. Since the return value of the __array_namespace__ method is the pint_namespace which is itself defined using ArrayUnitQuantitys, I'm not sure yet how ArrayQuantity could become a standalone facet to integrate into pint.UnitRegistry. But maybe that isn't needed!

lucascolley avatar Dec 30 '24 12:12 lucascolley

xref https://github.com/astropy/astropy/issues/17588

This is probably as good a moment as any to mention that @nstarman and I have been working on a new Quantity implementation that aims to be array API compatible and agnostic of the unit system - see https://github.com/astropy/quantity-2.0 and the design considerations at https://github.com/astropy/astropy-APEs/pull/91 (especially the PDF linked on top).

As our hope in part is to replace astropy's Quantity, our minimal goal is to provide full coverage of the array API; it would be the full numpy API for ndarray (i.e., including out arguments, specializations of ufuncs like np.add.{at, reduce, reduceat, accumulate}, and gufuncs.

Anyway, the part that is perhaps most relevant here is the unit-agnostic part - this would need some standardization of how units interact with arrays, perhaps new dunder methods of some agreement on operators (e.g., astropy can attach a unit via value << unit - maybe that could become a standard?)

mhvk avatar Dec 30 '24 14:12 mhvk

A shared units & Quantity API would be fantastic! I would vote for defining a quantity-api libraries with some runtime-checkable Protocols. I mocked up something along these lines in https://github.com/nstarman/units/tree/main/src/units/api

nstarman avatar Dec 30 '24 19:12 nstarman

@mhvk , when you say unit system, do you mean the unit module, pint or astropy.units etc?

the linked pdf refers to a Quantity API. If I am understanding it correctly, if pint, astropy.units and any other unit library followed it, then there could be a quantity-array library that works for both pint and astropy.units.

could then make pint-pandas and pint-xarray work for astropy and others too

andrewgsavage avatar Dec 30 '24 19:12 andrewgsavage

Yep, that sounds perfect. The namespaces I'm building in pint-array just need an array library, a quantity class from which to inherit, and some units compatible with that quantity. Currently, that's pint.Quantity and pint.UnitRegistry, but it should be relatively simple to swap those out for a new standard if you folks can figure one out!

That's probably the hard part; in the meantime, I'll work on getting these array-quantity namespaces up to standard-compliance, while attempting to do the right thing with units. I could probably use some help writing tests for correct unit-handling at some point!

lucascolley avatar Dec 30 '24 20:12 lucascolley

@lucascolley - spurred on by this conversation, I've pushed my implementation of ufucs/element-wise operations to https://github.com/astropy/quantity-2.0/pull/22 - tests/test_element_wise_functions.py may be fairly directly useful to you.

mhvk avatar Dec 30 '24 22:12 mhvk

Yes, there aren't all that many classes and functions needed for the array API, based on pint's numpy array_function implementation. I think a Quantity API defining the classes and functions needed to write the pint-array library would be a good first version. Is https://github.com/astropy/quantity-2.0 the best place for such a discussion on a Quantity API?

There are a load of numpy tests that have been added as people implement numpy functions.

andrewgsavage avatar Dec 30 '24 23:12 andrewgsavage

What do we think about creation-like functions such as ones_like? Should the result be dimensionless or have the same units as the input array? Pint currently returns a dimensionless array for NumPy arrays.

lucascolley avatar Dec 30 '24 23:12 lucascolley

that was being changed to have the same units as the input array, but has got stuck https://github.com/hgrecco/pint/issues/882

andrewgsavage avatar Dec 31 '24 00:12 andrewgsavage

Up until this point everything has been working with pint.UnitRegistry. I'm now trying to implement https://data-apis.org/array-api/latest/API_specification/generated/array_api.concat.html#array_api.concat, my implementation of which makes use of https://pint.readthedocs.io/en/stable/api/facets.html#pint.facets.plain.PlainQuantity.m_as. That takes us through to this generic _convert function: https://github.com/hgrecco/pint/blob/74b708661577623c0c288933d8ed6271f32a4b8b/pint/facets/plain/registry.py#L1087-L1097

In the array API standard:

Behavior is not specified when mixing a Python float and an array with an integer data type; this may give float32, float64, or raise an exception.

and so the multiplication here can throw an exception for integer arrays.

Do you think we could try to address this problem in the plain registry @andrewgsavage ? It will be a bit hacky without adding proper array API awareness to Pint, but I think something like

if hasattr(value, __array_namespace__) and value.__array_namespace__().isdtype(value.dtype, "integral"):
    factor = int(factor)

might fix the integer array case.

I suppose the alternative is for me to subclass UnitRegistry and override the _convert method. But it might be nice to avoid that if possible?

lucascolley avatar Dec 31 '24 22:12 lucascolley

The conversion factor should not be converted to an int - that would lead to things like 4 inches converting to 8cm instead of 10cm. Can this be handled within the concat implementation by special casing the integer arrays to check consistent units?

andrewgsavage avatar Dec 31 '24 23:12 andrewgsavage

ah okay, so if consolidating the units would require a non-integral conversion factor and we have integer arrays, we can raise. Could you point me to a helper in Pint which checks for such a factor, if it exists already? Or otherwise, how would be best to go about writing it?

lucascolley avatar Dec 31 '24 23:12 lucascolley

I don't think we have that. I'd either raise or return a float array. Do you want to use your repo for these discussions to keep this cleaner?

andrewgsavage avatar Dec 31 '24 23:12 andrewgsavage

I was just made aware of @simonheybrock's https://pydims.github.io/pydims/index.html, which adds named dimensions and units to array API standard compatible arrays. It supports units from both astropy.units and pint, so it sounds like you should be involved in any discussion of standardising a unit/quantity API @simonheybrock.

Since pydims adds named dimensions, its arrays can't comply with the standard for the same reasons as xarray. So the scopes of pydims and (what is currently called) pint-array are orthogonal in that regard. pint-array should definitely look towards supporting both forms of units like pydims does, but perhaps that will be easier or superseded if we wait for the work that @nstarman and @mhvk have in mind.

lucascolley avatar Jan 01 '25 12:01 lucascolley

alright, https://github.com/lucascolley/pint-array is now passing array-api-tests 🎉 ! The behaviour with units is currently untested and known to be quite buggy, but all of the building blocks are there.

I probably won't have time to work on proper tests, documentation etc. until this summer following my university exams, but contributions are very welcome!

lucascolley avatar Jan 01 '25 21:01 lucascolley

Example of unit-aware integration now working with SciPy 1.15.0 (released yesterday)![^1]

~EDIT: I'm not sure the unit behaviour is correct yet here - I think we would want pint^2 for the first result and pint^3 for the second? EDIT 2: I think this is because I haven't correctly defined the units for __gt__ yet~

EDIT 3: fixed now!

In [1]: import pint_array; import numpy as np; pnp = pint_array.pint_namespace(np); from pint import UnitRegistry; ureg = UnitRegistry()

In [2]: from scipy.integrate import cubature

In [3]: def f(x, n):
   ...:     """f(x) = n * x"""
   ...:     return n*x
   ...: 

In [4]: a = pnp.asarray([0], units=ureg.pint)

In [5]: b = pnp.asarray([1], units=ureg.pint)

In [6]: n = pnp.arange(10)

In [7]: res = cubature(
   ...:     f,
   ...:     a=a,
   ...:     b=b,
   ...:     args=(n,),
   ...: )

In [8]: res.estimate
Out[8]: 
<Quantity(
  array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5]),
  'pint ** 2'
)>

In [9]: n.magnitude * (1**2) / 2 # closed form
Out[9]: array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])

In [10]: res.error
Out[10]: 
<Quantity(
  array([0.00000000e+00, 9.02056208e-17, 1.80411242e-16, 3.46944695e-16,
         3.60822483e-16, 1.11022302e-16, 6.93889390e-16, 2.49800181e-16,
         7.21644966e-16, 1.66533454e-15]),
  'pint ** 2'
)>

In [11]: def f(x, n):
    ...:     """f(x) = n * (x * x)"""
    ...:     return n * (x * x)
    ...: 

In [12]: res = cubature(
    ...:     f,
    ...:     a=a,
    ...:     b=b,
    ...:     args=(n,),
    ...: )

In [13]: res.estimate
Out[13]: 
<Quantity(
  array([0.        , 0.33333333, 0.66666667, 1.        , 1.33333333,
         1.66666667, 2.        , 2.33333333, 2.66666667, 3.        ]),
  'pint ** 3'
)>

In [14]: n.magnitude * (1**3) / 3 # closed form
Out[14]: 
array([0.        , 0.33333333, 0.66666667, 1.        , 1.33333333,
       1.66666667, 2.        , 2.33333333, 2.66666667, 3.        ])

In [15]: res.error
Out[15]: 
<Quantity(
  array([0.00000000e+00, 2.77555756e-17, 5.55111512e-17, 3.33066907e-16,
         1.11022302e-16, 4.44089210e-16, 6.66133815e-16, 2.49800181e-16,
         2.22044605e-16, 3.33066907e-16]),
  'pint ** 3'
)>

Still need to fix the unit behaviour of quite a few functions/methods, but this is promising :)

[^1]: you'll need to set the env var SCIPY_ARRAY_API=1

lucascolley avatar Jan 04 '25 13:01 lucascolley

I was just made aware of @SimonHeybrock's https://pydims.github.io/pydims/index.html, which adds named dimensions and units to array API standard compatible arrays. It supports units from both astropy.units and pint, so it sounds like you should be involved in any discussion of standardising a unit/quantity API @SimonHeybrock.

Since pydims adds named dimensions, its arrays can't comply with the standard for the same reasons as xarray. So the scopes of pydims and (what is currently called) pint-array are orthogonal in that regard. pint-array should definitely look towards supporting both forms of units like pydims does, but perhaps that will be easier or superseded if we wait for the work that @nstarman and @mhvk have in mind.

Thanks for the shout-out @lucascolley! I would like to note that I am not actively working on PyDims. It was (for now) meant to gauge the communities interest in the approach, including in particular a "Python Units API Standard".

As an aside, there is now yet another Python units library, since @phlptp added Python bindings to https://github.com/LLNL/units.

SimonHeybrock avatar Jan 06 '25 04:01 SimonHeybrock

@SimonHeybrock I have a made a topic proposing collabration on a Quantity API https://github.com/astropy/quantity-2.0/issues/23

andrewgsavage avatar Jan 06 '25 23:01 andrewgsavage

Hi, do you have plan to support torch.Tensor as well?

Roy-Kid avatar Mar 10 '25 12:03 Roy-Kid

you may be able to use https://data-apis.org/array-api-compat/ along with https://github.com/quantity-dev/quantity-array

andrewgsavage avatar Mar 10 '25 13:03 andrewgsavage

yes, supporting PyTorch is part of the goal here. The current caveats on support are listed at https://data-apis.org/array-api-compat/supported-array-libraries.html#pytorch

lucascolley avatar Mar 10 '25 13:03 lucascolley

@andrewgsavage @lucascolley Brillant wrok! Thanks for your information, and I can move my code to pint safely.

Roy-Kid avatar Mar 10 '25 14:03 Roy-Kid

@andrewgsavage @lucascolley Brillant wrok! Thanks for your information, and I can move my code to pint safely.

if you do get this working, please post about it here. it would be great to see it in action!

andrewgsavage avatar Mar 20 '25 21:03 andrewgsavage

@andrewgsavage @lucascolley Brillant wrok! Thanks for your information, and I can move my code to pint safely.

if you do get this working, please post about it here. it would be great to see it in action!

I will. But now it seems cannot carry pint unit with torch.tensor(e.g. torch.sqrt( Tensor * ureg.unit )), I have to calculate the conversion factor before passing them to module....

Once I have time I will figure out how to use it seamlessly

Roy-Kid avatar Mar 24 '25 09:03 Roy-Kid

let's close this out as work is now ongoing over at @quantity-dev

lucascolley avatar Sep 22 '25 11:09 lucascolley