Can't take max of arrays at least as large as 2 ** 32
Describe the bug
Calling sparse.COO.max on an array larger than 2 ** 32 - 1 fails with a TypeError, like so:
>>> a.shape
(4294967296,)
>>> a.max()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\<path_redacted>\sparse\_sparse_array.py", line 444, in max
return np.maximum.reduce(self, out=out, axis=axis, keepdims=keepdims)
File "C:\<path_redacted>\sparse\_sparse_array.py", line 307, in __array_ufunc__
result = SparseArray._reduce(ufunc, *inputs, **kwargs)
File "C:\<path_redacted>\sparse\_sparse_array.py", line 278, in _reduce
return self.reduce(method, **kwargs)
File "C:\<path_redacted>\sparse\_sparse_array.py", line 360, in reduce
out = self._reduce_calc(method, axis, keepdims, **kwargs)
File "C:\<path_redacted>\sparse\_coo\core.py", line 692, in _reduce_calc
data, inv_idx, counts = _grouped_reduce(a.data, a.coords[0], method, **kwargs)
File "C:\<path_redacted>\sparse\_coo\core.py", line 1566, in _grouped_reduce
result = method.reduceat(x, inv_idx, **kwargs)
TypeError: Cannot cast array data from dtype('uint64') to dtype('int64') according to the rule 'safe'
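The final error is raised by NumPy rather than by sparse itself: ufunc.reduceat works with signed platform-sized indices (np.intp, which is int64 on 64-bit Windows), and the traceback shows it refusing to convert uint64 indices to int64 under the 'safe' casting rule. The NumPy-only sketch below uses made-up arrays to show that rule and one possible workaround, casting the offsets to np.intp before calling reduceat. It is an illustration, not sparse's actual fix:

```python
import numpy as np

# Values to reduce and the group start offsets, stored as uint64 like the
# indices in the traceback above (arrays are illustrative only).
x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
starts = np.array([0, 2], dtype=np.uint64)

# uint64 -> int64 is not a "safe" cast, because uint64 values above
# 2**63 - 1 do not fit in int64.
print(np.can_cast(np.uint64, np.int64, casting="safe"))  # False

# Casting the offsets to the signed platform index type first avoids the
# cast entirely. Groups are x[0:2] and x[2:], so the maxima are 3 and 5.
result = np.maximum.reduceat(x, starts.astype(np.intp))
print(result)  # [3. 5.]
```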
To Reproduce
Create an array a at least as large as 2 ** 32 with at least one nonzero element, then call a.max(). For example:
>>> import sparse
>>> b = sparse.DOK((2 ** 32,))
>>> b[0] = 1
>>> a = sparse.COO(b)
>>> a.nnz
1
>>> a.max() # TypeError
Expected behavior
Return the maximum value of the array (1 in the example above).
System
- OS and version: Windows 10
- sparse version: 0.12.0+44.g765e297 (bug is also present in 0.12.0, installed from pip)
- NumPy version: 1.18.5
- Numba version: 0.53.1
Additional context
sparse.COO.max works on an array of size 2 ** 32 if it is empty (i.e. a.nnz == 0).
Are you on 32-bit Windows by any chance?
I'm on 64-bit Windows.
I just checked and this bug is not present on Manjaro 21.0.7 with Linux 5.12.9-1-MANJARO (x86_64).
Mentoring instructions: Replace all uses of np.[as]array(list) with np.[as]array(list, dtype=np.int64).
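Pinning the dtype matters because, without it, np.asarray picks NumPy's default integer type. With NumPy 1.x that is the C long, which is 32 bits on Windows and 64 bits on 64-bit Linux. This is consistent with the bug appearing on Windows 10 but not on Manjaro (NumPy 2.0 later switched the Windows default to int64). A minimal sketch of the difference, using a made-up coordinate list:

```python
import numpy as np

coords_list = [[0, 1, 2], [3, 4, 5]]  # illustrative coordinates only

# No dtype: NumPy's default integer type is used, which is int32 on Windows
# with NumPy 1.x and int64 on most other 64-bit platforms.
implicit = np.asarray(coords_list)

# Explicit dtype: the result is int64 on every platform.
explicit = np.asarray(coords_list, dtype=np.int64)

print(implicit.dtype, explicit.dtype)
```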
Hello, I ran into the same problem. Was there any solution to this?
A quick update, since I'm now digging into the library. I see that there is an idx_dtype parameter for the constructor of COO that, I believe, should force COO to use a specific type as the index format. However, if data is None in the constructor's call, the array is converted via as_coo, which in turn relies on DOK's as_format, which here calls COO.from_iter, which doesn't take idx_dtype and doesn't forward it to the final call to COO's constructor here.
The result is that idx_dtype effectively gets ignored.
A proposal for improving this would be:
- as_coo should take idx_dtype (and possibly more parameters of the constructor, maybe directly **kwargs?) and forward them down as appropriate.
- as_format should take **kwargs and forward them to whichever constructor/factory it uses internally.
- from_iter should take **kwargs and forward them to the COO constructor.
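A minimal sketch of the proposed forwarding chain. ToyCOO, toy_as_coo and toy_as_format are hypothetical, heavily simplified stand-ins, not sparse's real classes or signatures; the only point is that each layer passes **kwargs through unchanged, so idx_dtype actually reaches the constructor:

```python
import numpy as np


class ToyCOO:
    """Hypothetical, simplified stand-in for sparse.COO (not the real class)."""

    def __init__(self, coords, data, shape, idx_dtype=None):
        coords = np.asarray(coords)
        if idx_dtype is not None:
            # Honour the requested index dtype instead of NumPy's default.
            coords = coords.astype(idx_dtype)
        self.coords = coords
        self.data = np.asarray(data)
        self.shape = shape

    @classmethod
    def from_iter(cls, entries, shape, **kwargs):
        # entries maps index tuples to values (assumed non-empty here).
        keys = list(entries)
        coords = [list(axis) for axis in zip(*keys)]
        data = [entries[k] for k in keys]
        # Forward idx_dtype (and any other constructor options) unchanged.
        return cls(coords, data, shape, **kwargs)


def toy_as_format(entries, shape, **kwargs):
    """Hypothetical counterpart of DOK.as_format: forwards **kwargs."""
    return ToyCOO.from_iter(entries, shape, **kwargs)


def toy_as_coo(entries, shape, **kwargs):
    """Hypothetical counterpart of as_coo: forwards **kwargs."""
    return toy_as_format(entries, shape, **kwargs)


coo = toy_as_coo({(0,): 1.0}, shape=(2 ** 32,), idx_dtype=np.int64)
print(coo.coords.dtype)  # int64
```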
I don't know which, if any, parameter combinations should be forbidden to ensure there is no infinite recursion in the constructor, but I believe someone with more knowledge of the codebase might know what and where to check so this doesn't happen.
I traced the issue to its source and came up with a hack to make this work, should anyone else run into this problem.
Basically, when this reshape is called, idx_dtype has been ignored (as mentioned in the comment above), so it uses the default int32 index type. Since int32 can't store the new shape, this test triggers and idx_type gets converted to the result of np.min_scalar_type(max(shape)), which is np.uint64, and that's what causes the problem.
My hack to solve this is to hardcode np.int64 instead of letting NumPy choose:
idx_type = np.int64
This solves the problem when calling max().
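Hardcoding np.int64 works because it is signed and wide enough for 2 ** 32. A slightly more general variant of the same idea is sketched below as a hypothetical helper (signed_index_dtype is not part of sparse or NumPy): it keeps the "smallest type that fits" behaviour but only considers signed types, so the result always casts safely to int64:

```python
import numpy as np


def signed_index_dtype(max_value):
    """Hypothetical helper: smallest signed integer dtype holding max_value."""
    for candidate in (np.int8, np.int16, np.int32, np.int64):
        if np.iinfo(candidate).max >= max_value:
            return np.dtype(candidate)
    raise ValueError("dimension too large for int64 indices")


dt = signed_index_dtype(2 ** 32)
print(dt)                                         # int64
print(np.can_cast(dt, np.int64, casting="safe"))  # True, unlike uint64
```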
Thanks @GPhilo for digging into this, I'll try to set some time aside this weekend to fix it and cut a release.
It has been more than 2 years and this issue still seems to exist. Any update on this?
This doesn't happen anymore on sparse 0.15.1, which is the latest release. Closing.