
RFC: add `count_nonzero` for counting the number of "non-zero" values

Open kgryte opened this issue 1 year ago • 7 comments

This RFC proposes a new addition to the array API specification for counting the number of "non-zero" (i.e., truthy) values in an array.

Overview

Based on array comparison data, the API is available across all major array libraries in the PyData ecosystem.

count_nonzero was originally identified in https://github.com/data-apis/array-api/issues/187 as a potential standardization candidate and has usage within downstream libraries (e.g., sklearn, SciPy).

Prior art

Proposal

def count_nonzero(x: array, /, *, axis: Optional[Union[int, Tuple[int, ...]]] = None, keepdims: bool = False) -> array
  • When axis is None, the function should count the number of non-zero elements over the entire (flattened) array.
  • The function should return an array having the default index data type.
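As a rough illustration of the proposed semantics, here is a minimal sketch that uses NumPy as a stand-in for a conforming array library. The name `count_nonzero_sketch` is hypothetical, and `np.intp` is only assumed to play the role of the "default index data type"; this is not the specification's reference implementation.

```python
import numpy as np

def count_nonzero_sketch(x, /, *, axis=None, keepdims=False):
    # Hypothetical helper illustrating the proposed signature.
    # An element counts as "non-zero" when it compares unequal to zero
    # (for boolean arrays, that means True).
    mask = x != 0
    # Assumption: np.intp stands in for the default index data type.
    return np.sum(mask, axis=axis, keepdims=keepdims, dtype=np.intp)

x = np.asarray([[0, 1, 2], [0, 0, 3]])
total = count_nonzero_sketch(x)                         # axis=None: whole array
per_row = count_nonzero_sketch(x, axis=1)               # one count per row
per_row_kept = count_nonzero_sketch(x, axis=1, keepdims=True)
```

With `keepdims=True` the reduced axis is retained with size one, so the result still broadcasts against `x`.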

Questions

  • In contrast to sum and other reductions, support for keepdims is less common among array libraries. Why this is the case is not clear. Are there any reasons why keepdims should not be standardized?

kgryte • Apr 18 '24 10:04

@asmeurer can you tell us if it's easy to work around a missing keepdims keyword in array-api-compat?

rgommers • Apr 18 '24 16:04

One other nice thing is that unlike nonzero, this function does not have a data-dependent output shape. So aside from performance, it can be supported by implementations that may not support nonzero.
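A small sketch of the shape difference, using NumPy purely for illustration (the arrays `a` and `b` are made-up examples): two inputs of the same shape give `nonzero` results of different lengths, while the `count_nonzero` result shape depends only on the input shape and `axis`.

```python
import numpy as np

a = np.asarray([0, 1, 0, 2])
b = np.asarray([3, 4, 5, 6])

# nonzero: the length of the index array depends on the values.
idx_a = np.nonzero(a)[0]
idx_b = np.nonzero(b)[0]

# count_nonzero: the result is zero-dimensional for both inputs.
n_a = np.count_nonzero(a)
n_b = np.count_nonzero(b)
```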

rgommers • Apr 18 '24 16:04

The keepdims argument was added fairly late (2020) in numpy: https://github.com/numpy/numpy/pull/15870. So it may have simply been overlooked by other libraries. Probably just a low-prio feature (also no usages in scipy at all).

rgommers • Apr 18 '24 16:04

I think so. Isn't it just a matter of calling expand_dims? Maybe https://github.com/data-apis/array-api/issues/760 would help.
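For illustration, the workaround might look roughly like the sketch below, written against NumPy. The helper name `count_nonzero_keepdims` is hypothetical and this is not code from array-api-compat; it only assumes the wrapped library has `count_nonzero` with an `axis` argument plus `expand_dims` and `reshape`.

```python
import numpy as np

def count_nonzero_keepdims(x, axis=None):
    # Hypothetical helper: emulate keepdims=True for a count_nonzero
    # that lacks the keyword.
    counts = np.asarray(np.count_nonzero(x, axis=axis))
    if axis is None:
        # Every axis was reduced; restore each one with size 1.
        return np.reshape(counts, (1,) * x.ndim)
    axes = (axis,) if isinstance(axis, int) else tuple(axis)
    # Normalize negative axes and re-insert them in ascending order so
    # each expand_dims call lands at the original position.
    for ax in sorted(a % x.ndim for a in axes):
        counts = np.expand_dims(counts, ax)
    return counts

x = np.arange(24).reshape(2, 3, 4) % 3
kept = count_nonzero_keepdims(x, axis=(0, 2))
```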

asmeurer • Apr 18 '24 17:04

To reiterate what I said at the meeting today, count_nonzero is nice because the standard doesn't support calling sum() on a boolean array, so count_nonzero is the idiomatic way to get the number of True elements in a bool array.
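A short sketch of the two ways to count True values, using NumPy only for illustration. NumPy itself happens to accept `sum()` on boolean arrays, so the explicit cast below is shown as the kind of workaround one would presumably need when sticking to what the standard guarantees; the array `mask` is a made-up example.

```python
import numpy as np

mask = np.asarray([True, False, True, True])

# Direct count of truthy elements.
n_true = np.count_nonzero(mask)

# Workaround without count_nonzero: cast to an integer dtype first,
# then sum, since sum() on a boolean array is not covered by the standard.
n_true_cast = np.sum(mask.astype(np.int64))
```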

asmeurer • Apr 18 '24 19:04

Thanks! SGTM then to add count_nonzero, and to include keepdims for design consistency with other reductions.

rgommers • Apr 18 '24 19:04

PR is up: https://github.com/data-apis/array-api/pull/803

kgryte • May 02 '24 06:05