RFC: add `diff` for computing the n-th forward difference
This RFC proposes the addition of a new API in the array API specification for computing the n-th forward difference along a specified dimension.
Overview
Based on array comparison data, the API is available across all major array libraries in the PyData ecosystem.
`diff` was originally discussed in https://github.com/data-apis/array-api/issues/187 as a potential standardization candidate and has been requested by downstream libraries, such as xarray.
Prior art
- NumPy: https://numpy.org/doc/stable/reference/generated/numpy.diff.html
  - `append` and `prepend` default to `<no_value>`.
- CuPy: https://docs.cupy.dev/en/stable/reference/generated/cupy.diff.html
- Dask: https://docs.dask.org/en/latest/generated/dask.array.diff.html
- PyTorch: https://pytorch.org/docs/stable/generated/torch.diff.html
  - `prepend` and `append` must be tensors.
- JAX: https://jax.readthedocs.io/en/latest/_autosummary/jax.numpy.diff.html
- TensorFlow: https://www.tensorflow.org/api_docs/python/tf/experimental/numpy/diff
  - Does not support `prepend` and `append` kwargs.
Proposal
def diff(x: array, /, *, n: int = 1, axis: int = -1, prepend: Optional[array] = None, append: Optional[array] = None) -> array
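To make the intended semantics concrete, here is a minimal sketch of the proposed signature written against NumPy. It is an illustration only, not a reference implementation: it assumes that an omitted `prepend` or `append` is represented as `None`, that `n` is non-negative, and that `x` has a numeric dtype for which subtraction is defined.

```python
import numpy as np


def diff(x, /, *, n=1, axis=-1, prepend=None, append=None):
    """Illustrative n-th forward difference along `axis` (sketch only)."""
    if n < 0:
        raise ValueError("n must be non-negative")

    # Assumption: None means "not provided". Any provided arrays are joined
    # to x along `axis` before differencing, so their shapes must match x
    # everywhere except along `axis`.
    parts = [a for a in (prepend, x, append) if a is not None]
    if len(parts) > 1:
        x = np.concatenate(parts, axis=axis)

    # Each pass computes x[i + 1] - x[i] along `axis`, shrinking that
    # dimension by one element.
    for _ in range(n):
        size = x.shape[axis]
        upper = np.take(x, np.arange(1, size), axis=axis)
        lower = np.take(x, np.arange(0, size - 1), axis=axis)
        x = upper - lower
    return x


x = np.asarray([1, 4, 9, 16])
first = diff(x)                             # [3, 5, 7]
second = diff(x, n=2)                       # [2, 2]
padded = diff(x, prepend=np.asarray([0]))   # [1, 3, 5, 7]
```

Each application shortens the differenced dimension by one, so an input of length `m` along `axis` yields length `m - n` (before any `prepend`/`append` padding is taken into account).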
Questions
- NumPy supports `prepend` and `append` as scalar values and subsequently wraps using `asarray`. Are we okay limiting to only arrays within the specification? Libraries, such as NumPy, would be free to accept scalars; this would just not be considered portable behavior.
- Apart from scalars, NumPy requires that `prepend` and `append` arrays match the shape of `x` except along the specified axis, thus precluding broadcasting. Is there ever a situation in which broadcasting would make sense?
- The output array must have the same dtype as the input array. Consequently, when `x` has a boolean dtype, the output array must also have a boolean dtype. Similarly, unsigned integer input arrays result in unsigned integer output arrays. Are we okay requiring that `diff` support boolean and unsigned integer dtypes? Or should we limit portable behavior to floating-point (real and complex) and signed integers?
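The NumPy behaviors behind these three questions can be reproduced with a short script. This is only a demonstration of what `numpy.diff` does today, not of what the specification would require; the variable names are made up for illustration.

```python
import numpy as np

x = np.asarray([[1, 2, 4], [7, 11, 16]])

# Scalars: NumPy accepts a Python scalar for `prepend` and expands it to
# fit x along the differenced axis.
scalar_result = np.diff(x, axis=-1, prepend=0)

# Broadcasting: a non-scalar array whose shape differs from x outside the
# differenced axis is rejected rather than broadcast.
try:
    np.diff(x, axis=-1, prepend=np.asarray([[0]]))  # shape (1, 1) vs (2, 3)
    broadcast_rejected = False
except ValueError:
    broadcast_rejected = True

# Boolean input: the output stays boolean; NumPy computes the difference
# as "not equal" between neighbours.
bool_result = np.diff(np.asarray([True, False, False, True]))

# Unsigned input: the output stays unsigned, so a negative difference
# wraps around (1 - 3 becomes 254 for uint8).
uint_result = np.diff(np.asarray([3, 1], dtype=np.uint8))
```

The unsigned case is the one most likely to surprise users, since the wrapped value is returned silently.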
> Are we okay limiting to only arrays within the specification?

That seems fine to me. `prepend`/`append` are rarely used, so there doesn't seem to be a need to make this really flexible.

> Is there ever a situation in which broadcasting would make sense?

I don't see it in SciPy, nor can I think of a real need for this.

> Or should we limit portable behavior to floating-point (real and complex) and signed integers?

This sounds good to me.