BUG: sparse: toarray() fails when the dtype is np.float16
For example:
In [11]: import numpy as np
In [12]: from scipy.sparse import csr_matrix, coo_matrix
In [13]: csr_matrix([[0, 1]], dtype=np.float16).toarray()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-13-4ee40d9bebcf> in <module>()
----> 1 csr_matrix([[0, 1]], dtype=np.float16).toarray()
/Users/warren/miniconda3scipy/lib/python3.5/site-packages/scipy-1.0.0.dev0+bad9919-py3.5-macosx-10.6-x86_64.egg/scipy/sparse/compressed.py in toarray(self, order, out)
946 y = out.T
947 M, N = x._swap(x.shape)
--> 948 _sparsetools.csr_todense(M, N, x.indptr, x.indices, x.data, y)
949 return out
950
ValueError: Output dtype not compatible with inputs.
In [14]: coo_matrix([[0, 1]], dtype=np.float16).toarray()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-14-16766815b9a5> in <module>()
----> 1 coo_matrix([[0, 1]], dtype=np.float16).toarray()
/Users/warren/miniconda3scipy/lib/python3.5/site-packages/scipy-1.0.0.dev0+bad9919-py3.5-macosx-10.6-x86_64.egg/scipy/sparse/coo.py in toarray(self, order, out)
256 M,N = self.shape
257 coo_todense(M, N, self.nnz, self.row, self.col, self.data,
--> 258 B.ravel('A'), fortran)
259 return B
260
ValueError: Output dtype not compatible with inputs.
Scipy/Numpy/Python version information:
In [15]: import sys, scipy, numpy; print(scipy.__version__, numpy.__version__, sys.version_info)
1.0.0.dev0+bad9919 1.12.1 sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0)
Cf. https://github.com/scipy/scipy/issues/2481.
I think float16 is not a supported sparse matrix dtype. This is, however, not checked at construction time.
float16 is being used "in the wild". I ran into this issue while investigating http://stackoverflow.com/questions/44003497/large-memory-usage-of-scipy-sparse-csr-matrix-toarray. I guess we should check the dtype at construction time to avoid surprises in later operations.
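A construction-time check could be as simple as validating the dtype against the list of value types sparsetools is generated for. The names below are illustrative, not scipy's actual internals; the dtype list mirrors T_TYPES in generate_sparsetools.py, from which float16 is notably absent:

```python
import numpy as np

# Value dtypes that the sparsetools routines are generated for
# (illustrative list mirroring T_TYPES; float16 is missing)
SUPPORTED_DTYPES = [
    np.bool_, np.int8, np.uint8, np.int16, np.uint16, np.int32, np.uint32,
    np.int64, np.uint64, np.float32, np.float64, np.longdouble,
    np.complex64, np.complex128, np.clongdouble,
]

def check_dtype(dtype):
    """Raise at construction time instead of deep inside toarray()."""
    dtype = np.dtype(dtype)
    if not any(dtype == np.dtype(t) for t in SUPPORTED_DTYPES):
        raise ValueError(
            "scipy.sparse does not support dtype %s; "
            "consider float32 instead" % dtype.name)

check_dtype(np.float64)      # passes silently
try:
    check_dtype(np.float16)  # would raise at construction time
except ValueError as e:
    print(e)
```

Raising here would turn the confusing "Output dtype not compatible with inputs" deep in toarray() into an immediate, self-explanatory error.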
The simple fix is to add a float16 type to T_TYPES here: https://github.com/scipy/scipy/blob/master/scipy/sparse/generate_sparsetools.py#L137
Of course, every type in this list expands into several functions (one for each combination of I_TYPES, T_TYPES, and the generic routines), so adding a new one will bloat the sparsetools shared library.
If we weren't using the values at all, we could use void-pointer tricks (like dist_to_squareform_from_vector_generic) to collapse a bunch of T_TYPES together, but the xxx_todense functions actually perform A += X, so no luck there.
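For reference, the accumulate semantics of coo_todense are easy to state in pure Python. This is a sketch, not the C++ code, and as a side effect it happens to work for float16, since NumPy handles the arithmetic at the Python level:

```python
import numpy as np
from scipy.sparse import coo_matrix

def coo_todense_py(M, N, row, col, data, dtype):
    """Pure-Python sketch of coo_todense: B += X, with duplicates summed."""
    B = np.zeros((M, N), dtype=dtype)
    # np.add.at accumulates repeated (row, col) entries, like the C++ loop
    np.add.at(B, (row, col), data)
    return B

# Construction with float16 works; only the native todense kernel is missing
A = coo_matrix([[0, 1], [2, 0]], dtype=np.float16)
B = coo_todense_py(*A.shape, A.row, A.col, A.data, A.dtype)
print(B)  # [[0. 1.], [2. 0.]] with dtype float16
```

This is far slower than the compiled kernel, but it illustrates why a void-pointer collapse of T_TYPES can't work: the function has to do real arithmetic on the values.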
It's not enough: you would also need to implement a C++ type for half-floats, because half precision is not a native C++ type.
Ah, I figured npy_half would work, but I see now that it won't: https://github.com/numpy/numpy/blob/c90d7c94fd2077d0beca48fa89a423da2b0bb663/doc/source/reference/c-api.coremath.rst#half-precision-functions
A temporary workaround: convert to float32, then call toarray or todense, and finally use astype to convert back to float16.
>>> import numpy as np
>>> from scipy.sparse import csr_matrix, coo_matrix
>>> csr_matrix([[0, 1]], dtype=np.float16).astype(np.float32).toarray().astype(np.float16)
array([[0., 1.]], dtype=float16)
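Wrapped up as a small helper (the name is hypothetical, not a scipy API), the workaround looks like this. Note that float16 -> float32 conversion is exact, so the round trip loses no precision:

```python
import numpy as np
from scipy.sparse import csr_matrix

def toarray_f16(A):
    """Densify a float16 sparse matrix by round-tripping through float32.

    float16 -> float32 is lossless, so the round trip is exact; the cost
    is a temporary dense float32 array twice the size of the final result.
    """
    return A.astype(np.float32).toarray().astype(A.dtype)

A = csr_matrix([[0, 1]], dtype=np.float16)
print(toarray_f16(A))  # [[0. 1.]] with dtype float16
```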
Actually, if I'm not supposed to use float16, I run into another problem:
MemoryError: Unable to allocate 32.5 GiB for an array with shape (34804, 250598) and data type float32
Just using float32 causes such an error, let alone float64.
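The 32.5 GiB figure follows directly from the dense shape, which shows why densifying a matrix this size is problematic regardless of dtype support; even float16 would need roughly 16 GiB dense:

```python
# Dense-array memory for the shape from the MemoryError above
shape = (34804, 250598)
n_elements = shape[0] * shape[1]  # ~8.7e9 elements
gib = 1024 ** 3

print(n_elements * 2 / gib)  # float16: ~16.2 GiB
print(n_elements * 4 / gib)  # float32: ~32.5 GiB
print(n_elements * 8 / gib)  # float64: ~65.0 GiB
```

So for arrays of this size the real fix is to avoid toarray() entirely and keep the computation sparse, not just to shrink the value dtype.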
I can also replicate the bug / issue.
As pointed out in the NumPy documentation that @perimosocordiae shared in https://github.com/scipy/scipy/issues/7408#issuecomment-301863708:
The header file <numpy/halffloat.h> provides functions to work with IEEE 754-2008 16-bit floating point values. While this format is not typically used for numerical computations, it is useful for storing values that require floating point but do not need much precision. It can also be used as an educational tool to understand the nature of floating point round-off error.
[…]
For these reasons, NumPy provides an API to work with npy_half values accessible by including <numpy/halffloat.h> and linking to npymath. For functions that are not provided directly, such as the arithmetic operations, the preferred method is to convert to float or double and back again, as in the following example.
SciPy sparse matrices and arrays probably cannot support np.float16, since numerical operations aren't implemented for that type by most compilers.
Support for np.float16 could be added the way NumPy does it via ufuncs (which involves upcasting to another dtype), but that would be significant work.
We really shouldn't do sparse array operations with float16; it just complicates things and adds lots of "if float16 then raise NotImplementedError" guards for no good reason.
Thank you for the fantastic scipy.sparse implementation. Much appreciated.
I would politely argue there are reasons to support float16, while recognizing the code/support complexities that make it not currently implementable. So perhaps consider this an enhancement request.
For example, consider a neuroscience application dealing with a rank-150k matrix (2E10 elements) that is amenable to compression, since most of the neural activations are small, and whose entries are Pearson correlation coefficients, for which E-8 precision is more than adequate. Here, the compressed npz file of the dense matrix is reduced in size by more than an order of magnitude using scipy.sparse and float16. Using float32 doubles the file size, but there is no utility in representing a correlation to E-45 precision. The disk-size argument also applies to memory, which is potentially limited on certain platforms.
Another aspect is that half-precision is not uncommon in machine learning applications where half-precision operations are sometimes hardware accelerated and well suited to arithmetic operations on weights not needing E-45 precision.
-- Just some food for thought and my 2¢.
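On the storage point: construction and astype with float16 don't touch native code, so the stored values really do take half the space of float32 today (only the compiled kernels like todense fail). A quick check on a small stand-in matrix, using scipy.sparse.random to generate it:

```python
import numpy as np
from scipy.sparse import random as sparse_random

# Small stand-in for the 150k x 150k correlation matrix described above
A32 = sparse_random(1000, 1000, density=0.01, format='csr', dtype=np.float32)
A16 = A32.astype(np.float16)  # astype works; only native kernels fail

print(A32.data.nbytes)  # 4 bytes per stored value
print(A16.data.nbytes)  # 2 bytes per stored value, half the size
```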
Question: I ran across an Exception at csc_array(dtype=float16).toarray(). If I do not call .toarray(), there is no Exception. So it seems float16 is actually supported within scipy.sparse?
File "./Sparsify.py", line 164, in main
denseBlockArray = blockArray.toarray( order = 'C' )
File ".local/lib/python3.10/site-packages/scipy/sparse/_compressed.py", line 1062, in toarray
csr_todense(M, N, x.indptr, x.indices, x.data, y)
ValueError: Output dtype not compatible with inputs.
Per your question: only the functionality that touches native code will fail when using an unsupported dtype. This accounts for most of the useful bits of scipy.sparse, however, so I wouldn't say float16 is supported in any real sense.