RFC: add `diff` for computing the n-th forward difference

Open kgryte opened this issue 1 year ago • 1 comments

This RFC proposes the addition of a new API in the array API specification for computing the n-th forward difference along a specified dimension.

Overview

Based on array comparison data, the API is available across all major array libraries in the PyData ecosystem.

diff was originally discussed in https://github.com/data-apis/array-api/issues/187 as a potential standardization candidate and has been requested by downstream libraries, such as xarray.

Prior art

- NumPy: https://numpy.org/doc/stable/reference/generated/numpy.diff.html
  - `append` and `prepend` default to `<no_value>`.
- CuPy: https://docs.cupy.dev/en/stable/reference/generated/cupy.diff.html
- Dask: https://docs.dask.org/en/latest/generated/dask.array.diff.html
- PyTorch: https://pytorch.org/docs/stable/generated/torch.diff.html
  - `prepend` and `append` must be tensors.
- JAX: https://jax.readthedocs.io/en/latest/_autosummary/jax.numpy.diff.html
- TensorFlow: https://www.tensorflow.org/api_docs/python/tf/experimental/numpy/diff
  - Does not support the `prepend` and `append` kwargs.

Proposal

def diff(x: array, /, *, n: int = 1, axis: int = -1, prepend: Optional[array] = None, append: Optional[array] = None) -> array
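
As a rough illustration of the intended semantics, the sketch below implements the proposed signature on top of NumPy. It is not spec text: the name `diff_sketch` is hypothetical, the `None` defaults for `prepend` and `append` are an assumption, and the subtraction-based loop only covers numeric dtypes.

```python
from typing import Optional

import numpy as np


def diff_sketch(
    x: np.ndarray,
    /,
    *,
    n: int = 1,
    axis: int = -1,
    prepend: Optional[np.ndarray] = None,  # assumed default, not from the RFC
    append: Optional[np.ndarray] = None,   # assumed default, not from the RFC
) -> np.ndarray:
    """Hypothetical sketch of the n-th forward difference along `axis`."""
    if n < 0:
        raise ValueError("n must be non-negative")

    # Join the optional boundary arrays onto x along `axis`.
    parts = [p for p in (prepend, x, append) if p is not None]
    out = np.concatenate(parts, axis=axis) if len(parts) > 1 else x

    # Slices selecting elements 1..end and 0..end-1 along `axis`.
    upper = [slice(None)] * out.ndim
    lower = [slice(None)] * out.ndim
    upper[axis] = slice(1, None)
    lower[axis] = slice(None, -1)

    # Apply the first forward difference out[i + 1] - out[i] n times.
    for _ in range(n):
        out = out[tuple(upper)] - out[tuple(lower)]
    return out
```

For example, `diff_sketch(np.asarray([1, 4, 9, 16]))` gives `[3, 5, 7]`, and passing `n=2` gives `[2, 2]`. Each application shortens the array by one element along `axis`.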

Questions

1. NumPy supports prepend and append as scalar values and subsequently wraps using asarray. Are we okay limiting to only arrays within the specification? Libraries, such as NumPy, would be free to accept scalars; this would just not be considered portable behavior.
2. Apart from scalars, NumPy requires that prepend and append arrays match the shape of x except along the specified axis, thus precluding broadcasting. Is there ever a situation in which broadcasting would make sense?
3. The output array must have the same dtype as the input array. Consequently, when x has a boolean dtype, the output array must also have a boolean dtype. Similarly, unsigned integer input arrays result in unsigned integer output arrays. Are we okay requiring that diff support boolean and unsigned integer dtypes? Or should we limit portable behavior to floating-point (real and complex) and signed integers?
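
For reference, the three questions above can be checked against NumPy's current behavior. The snippet below is a small illustration that uses only `numpy.diff`, `numpy.asarray`, `numpy.zeros`, and `numpy.array_equal`; the sample values are made up for this example, and other libraries may behave differently.

```python
import numpy as np

x = np.asarray([[1, 2, 4], [7, 11, 16]])

# Question 1: NumPy accepts a scalar `prepend`; passing an array that matches
# x except along `axis` is the form the RFC would treat as portable.
scalar_form = np.diff(x, axis=-1, prepend=0)
array_form = np.diff(x, axis=-1, prepend=np.zeros((2, 1), dtype=x.dtype))
same_result = bool(np.array_equal(scalar_form, array_form))

# Question 2: an array `prepend` whose shape differs from x outside `axis`
# is rejected rather than broadcast.
try:
    np.diff(x, axis=-1, prepend=np.zeros((1, 1), dtype=x.dtype))
    broadcast_rejected = False
except ValueError:
    broadcast_rejected = True

# Question 3: the output dtype follows the input dtype.
# Boolean input yields a boolean result (True where neighbors differ).
bool_diff = np.diff(np.asarray([True, False, False, True]))
# Unsigned input stays unsigned, so a negative difference wraps around.
uint_diff = np.diff(np.asarray([3, 1, 4], dtype=np.uint8))

print(same_result, broadcast_rejected)
print(bool_diff, bool_diff.dtype)
print(uint_diff, uint_diff.dtype)
```

The unsigned case is the one most likely to surprise users: `1 - 3` becomes `254` for `uint8`, which is part of the motivation for asking whether portable behavior should be limited to floating-point and signed integer dtypes.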

Apr 04 '24 09:04 kgryte

> Are we okay limiting to only arrays within the specification?

That seems fine to me. prepend/append are rarely used, so there doesn't seem to be a need to make this really flexible.

> Is there ever a situation in which broadcasting would make sense?

I don't see it in SciPy, nor can I think of a real need for this.

> Or should we limit portable behavior to floating-point (real and complex) and signed integers?

This sounds good to me.

Apr 04 '24 17:04 rgommers