RFC: add `diff` for computing the n-th forward difference

Open kgryte opened this issue 1 year ago • 1 comments

This RFC proposes the addition of a new API in the array API specification for computing the n-th forward difference along a specified dimension.

Overview

Based on array comparison data, the API is available across all major array libraries in the PyData ecosystem.

diff was originally discussed in https://github.com/data-apis/array-api/issues/187 as a potential standardization candidate and has been requested by downstream libraries, such as xarray.

Prior art

  • NumPy: https://numpy.org/doc/stable/reference/generated/numpy.diff.html
    • append and prepend default to <no_value>.
  • CuPy: https://docs.cupy.dev/en/stable/reference/generated/cupy.diff.html
  • Dask: https://docs.dask.org/en/latest/generated/dask.array.diff.html
  • PyTorch: https://pytorch.org/docs/stable/generated/torch.diff.html
    • prepend and append must be tensors.
  • JAX: https://jax.readthedocs.io/en/latest/_autosummary/jax.numpy.diff.html
  • TensorFlow: https://www.tensorflow.org/api_docs/python/tf/experimental/numpy/diff
    • Does not support prepend and append kwargs.

Proposal

def diff(x: array, /, *, n: int = 1, axis: int = -1, prepend: Optional[array] = None, append: Optional[array] = None) -> array
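
To make the intended semantics concrete, here is a minimal sketch of the signature above, using NumPy as a stand-in array library. It is an illustration under assumptions, not specification text: the handling of `n == 0` and the `ValueError` for negative `n` mirror `numpy.diff`, and `prepend`/`append` are accepted as arrays only.

```python
from typing import Optional

import numpy as np


def diff(
    x: np.ndarray,
    /,
    *,
    n: int = 1,
    axis: int = -1,
    prepend: Optional[np.ndarray] = None,
    append: Optional[np.ndarray] = None,
) -> np.ndarray:
    """Sketch of the n-th forward difference along `axis` (numeric dtypes only)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        # Assumption: mirror numpy.diff, which returns the input unchanged.
        return x

    # Join prepend, x, and append along `axis`; shapes must match on all other axes.
    parts = [p for p in (prepend, x, append) if p is not None]
    out = np.concatenate(parts, axis=axis) if len(parts) > 1 else x

    # Index tuples selecting elements [1:] and [:-1] along `axis`.
    upper = [slice(None)] * out.ndim
    lower = [slice(None)] * out.ndim
    upper[axis] = slice(1, None)
    lower[axis] = slice(None, -1)

    # Apply the first-order difference n times.
    for _ in range(n):
        out = out[tuple(upper)] - out[tuple(lower)]
    return out
```

For example, `diff(np.array([1, 4, 9, 16]))` gives `[3, 5, 7]`, and passing `n=2` gives `[2, 2]`. Each application shortens the axis by one element.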

Questions

  • NumPy supports prepend and append as scalar values and subsequently wraps using asarray. Are we okay limiting to only arrays within the specification? Libraries, such as NumPy, would be free to accept scalars; this would just not be considered portable behavior.
  • Apart from scalars, NumPy requires that prepend and append arrays match the shape of x except along the specified axis, thus precluding broadcasting. Is there ever a situation in which broadcasting would make sense?
  • The output array must have the same dtype as the input array. Consequently, when x has a boolean dtype, the output array must also have a boolean dtype. Similarly, unsigned integer input arrays result in unsigned integer output arrays. Are we okay requiring that diff support boolean and unsigned integer dtypes? Or should we limit portable behavior to floating-point (real and complex) and signed integers?
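
The dtype and shape questions above can be checked against current NumPy behavior. The snippet below is only a sketch of what `numpy.diff` does today, not a statement of what the specification should require. The variable names are illustrative.

```python
import numpy as np

# Boolean input: NumPy preserves the dtype and marks where neighbors differ.
bool_out = np.diff(np.array([True, False, False, True]))

# Unsigned input: the dtype is preserved, so a negative difference wraps around.
uint_out = np.diff(np.array([1, 0], dtype=np.uint8))

# Signed input: the same values give the expected negative difference.
int_out = np.diff(np.array([1, 0], dtype=np.int8))

# prepend must match x on every axis except `axis`.
x = np.arange(6).reshape(2, 3)
matching = np.diff(x, axis=1, prepend=np.zeros((2, 1), dtype=x.dtype))

# A (1, 1) array is not broadcast against the (2, 3) input.
try:
    np.diff(x, axis=1, prepend=np.zeros((1, 1), dtype=x.dtype))
    was_broadcast = True
except ValueError:
    was_broadcast = False
```

Here `uint_out` holds `255` rather than `-1`, which is the kind of result that motivates limiting portable behavior to floating-point and signed integer dtypes.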

kgryte, Apr 04 '24 09:04

> Are we okay limiting to only arrays within the specification?

That seems fine to me. prepend/append are rarely used, so there doesn't seem to be a need to make this really flexible.

> Is there ever a situation in which broadcasting would make sense?

I don't see it in SciPy, nor can I think of a real need for this.

> Or should we limit portable behavior to floating-point (real and complex) and signed integers?

This sounds good to me.

rgommers, Apr 04 '24 17:04