RFC: add `count_nonzero` for counting the number of "non-zero" values
This RFC proposes a new addition to the array API specification for counting the number of "non-zero" (i.e., truthy) values in an array.
Overview
Based on array comparison data, the API is available across all major array libraries in the PyData ecosystem.
count_nonzero was originally identified in https://github.com/data-apis/array-api/issues/187 as a potential standardization candidate and has usage within downstream libraries (e.g., sklearn, SciPy).
Prior art
- NumPy: https://numpy.org/doc/stable/reference/generated/numpy.count_nonzero.html
- CuPy: https://docs.cupy.dev/en/stable/reference/generated/cupy.count_nonzero.html
  - does not support `keepdims`.
- Dask: https://docs.dask.org/en/stable/generated/dask.array.count_nonzero.html
  - does not support `keepdims`.
- JAX: https://jax.readthedocs.io/en/latest/_autosummary/jax.numpy.count_nonzero.html
- PyTorch: https://pytorch.org/docs/stable/generated/torch.count_nonzero.html
  - does not support `keepdims`, and this was not discussed upon initial addition (ref).
- TensorFlow: https://www.tensorflow.org/api_docs/python/tf/math/count_nonzero
Proposal
def count_nonzero(x: array, /, *, axis: Optional[Union[int, Tuple[int, ...]]] = None, keepdims: bool = False) -> array
- When `axis` is `None`, the function should count the number of non-zero elements along a flattened array.
- The function should return an array having the default index data type.
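As a rough illustration of the intended semantics, here is a minimal sketch that uses NumPy's existing `count_nonzero` as a stand-in for the proposed function. The input values are made up for the example. Note that NumPy returns a Python `int` rather than a 0-d array when `axis` is `None`, so it does not match the proposal exactly in that case.

```python
import numpy as np

x = np.asarray([[0, 1, 2],
                [0, 0, 3]])

# axis=None: count over the flattened array.
# (NumPy returns a Python int here; the proposal asks for a 0-d array.)
total = np.count_nonzero(x)

# axis=0: count down each column, which removes the reduced axis.
per_column = np.count_nonzero(x, axis=0)

# keepdims=True: the reduced axis is retained with size 1.
per_row_kept = np.count_nonzero(x, axis=1, keepdims=True)

print(total)               # 3
print(per_column)          # [0 1 2]
print(per_row_kept.shape)  # (2, 1)
```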
Questions
- In contrast to `sum` and other reductions, support for `keepdims` is less common among array libraries. Why this is the case is not clear. Are there any reasons why `keepdims` should not be standardized?
@asmeurer can you tell us if it's easy to work around a missing keepdims keyword in array-api-compat?
One other nice thing is that unlike nonzero, this function does not have a data-dependent output shape. So aside from performance, it can be supported by implementations that may not support nonzero.
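To make the output-shape point concrete, here is a small sketch using NumPy as a stand-in (the arrays are made up for the example): two inputs of the same shape give `nonzero` results of different lengths, while the shape of the `count_nonzero` result is the same for both.

```python
import numpy as np

a = np.asarray([0, 1, 0, 2])
b = np.asarray([3, 1, 4, 2])

# nonzero: the length of the result depends on the values in the array.
idx_a = np.nonzero(a)[0]
idx_b = np.nonzero(b)[0]
print(idx_a.shape, idx_b.shape)  # (2,) (4,)

# count_nonzero: the result shape depends only on the input shape and axis.
shape_a = np.shape(np.count_nonzero(a))
shape_b = np.shape(np.count_nonzero(b))
print(shape_a, shape_b)  # () ()
```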
The keepdims argument was added fairly late (2020) in numpy: https://github.com/numpy/numpy/pull/15870. So it may have simply been overlooked by other libraries. Probably just a low-prio feature (also no usages in scipy at all).
I think so. Isn't it just a matter of calling expand_dims? Maybe https://github.com/data-apis/array-api/issues/760 would help.
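For reference, a minimal sketch of what such an `expand_dims`-based workaround might look like. The helper name `count_nonzero_keepdims` is hypothetical and not part of array-api-compat. NumPy is used here only as a stand-in for a library whose `count_nonzero` lacks `keepdims`.

```python
import numpy as np

def count_nonzero_keepdims(x, axis=None):
    """Hypothetical helper: emulate keepdims=True without using the keyword."""
    counts = np.asarray(np.count_nonzero(x, axis=axis))
    if axis is None:
        # Every axis was reduced, so restore them all as size-1 axes.
        return counts.reshape((1,) * x.ndim)
    axes = (axis,) if isinstance(axis, int) else tuple(axis)
    # Normalize negative axes against the rank of the original array,
    # since expand_dims positions refer to the expanded result.
    axes = tuple(sorted(a % x.ndim for a in axes))
    return np.expand_dims(counts, axes)

x = np.asarray([[0, 1, 2],
                [0, 0, 3]])
print(count_nonzero_keepdims(x, axis=1).shape)  # (2, 1)
print(count_nonzero_keepdims(x).shape)          # (1, 1)
```

The only subtle part is axis normalization: negative axes have to be resolved against the original rank before the size-1 axes are reinserted.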
To reiterate what I said at the meeting today, count_nonzero is nice because the standard doesn't support calling sum() on a boolean array, so count_nonzero is the idiomatic way to get the number of True elements in a bool array.
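A small sketch of that idiom, again with NumPy as a stand-in and made-up values: build a boolean mask, then count its `True` elements with `count_nonzero` instead of summing the mask.

```python
import numpy as np

x = np.asarray([-1.5, 0.0, 2.0, 3.5])

# Boolean mask marking the strictly positive elements.
mask = x > 0

# Count the True elements without calling sum() on a boolean array.
n_positive = np.count_nonzero(mask)
print(n_positive)  # 2
```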
Thanks! SGTM then to add count_nonzero. And add keepdims for design consistency with other reductions.
PR is up: https://github.com/data-apis/array-api/pull/803