BUG: sparse: toarray() fails when the dtype is np.float16
For example:
In [11]: import numpy as np
In [12]: from scipy.sparse import csr_matrix, coo_matrix
In [13]: csr_matrix([[0, 1]], dtype=np.float16).toarray()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-13-4ee40d9bebcf> in <module>()
----> 1 csr_matrix([[0, 1]], dtype=np.float16).toarray()
/Users/warren/miniconda3scipy/lib/python3.5/site-packages/scipy-1.0.0.dev0+bad9919-py3.5-macosx-10.6-x86_64.egg/scipy/sparse/compressed.py in toarray(self, order, out)
946 y = out.T
947 M, N = x._swap(x.shape)
--> 948 _sparsetools.csr_todense(M, N, x.indptr, x.indices, x.data, y)
949 return out
950
ValueError: Output dtype not compatible with inputs.
In [14]: coo_matrix([[0, 1]], dtype=np.float16).toarray()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-14-16766815b9a5> in <module>()
----> 1 coo_matrix([[0, 1]], dtype=np.float16).toarray()
/Users/warren/miniconda3scipy/lib/python3.5/site-packages/scipy-1.0.0.dev0+bad9919-py3.5-macosx-10.6-x86_64.egg/scipy/sparse/coo.py in toarray(self, order, out)
256 M,N = self.shape
257 coo_todense(M, N, self.nnz, self.row, self.col, self.data,
--> 258 B.ravel('A'), fortran)
259 return B
260
ValueError: Output dtype not compatible with inputs.
Scipy/Numpy/Python version information:
In [15]: import sys, scipy, numpy; print(scipy.__version__, numpy.__version__, sys.version_info)
1.0.0.dev0+bad9919 1.12.1 sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0)
Cf. https://github.com/scipy/scipy/issues/2481.
I think float16 is not a supported sparse matrix dtype. This is, however, not checked at construction time.
float16 is being used "in the wild". I ran into this issue while investigating http://stackoverflow.com/questions/44003497/large-memory-usage-of-scipy-sparse-csr-matrix-toarray. I guess we should check the dtype at construction time to avoid surprises in later operations.
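A construction-time check could be as simple as validating the dtype against the list of value types sparsetools is generated for. The names below are illustrative, not scipy's actual internals; the dtype list mirrors T_TYPES in generate_sparsetools.py, from which float16 is notably absent:

```python
import numpy as np

# Value dtypes that the sparsetools routines are generated for
# (illustrative list mirroring T_TYPES; float16 is missing)
SUPPORTED_DTYPES = [
    np.bool_, np.int8, np.uint8, np.int16, np.uint16, np.int32, np.uint32,
    np.int64, np.uint64, np.float32, np.float64, np.longdouble,
    np.complex64, np.complex128, np.clongdouble,
]

def check_dtype(dtype):
    """Raise at construction time instead of deep inside toarray()."""
    dtype = np.dtype(dtype)
    if not any(dtype == np.dtype(t) for t in SUPPORTED_DTYPES):
        raise ValueError(
            "scipy.sparse does not support dtype %s; "
            "consider float32 instead" % dtype.name)

check_dtype(np.float64)      # passes silently
try:
    check_dtype(np.float16)  # would raise at construction time
except ValueError as e:
    print(e)
```

Raising here would turn the confusing "Output dtype not compatible with inputs" deep in toarray() into an immediate, self-explanatory error.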
The simple fix is to add a float16 type to T_TYPES here: https://github.com/scipy/scipy/blob/master/scipy/sparse/generate_sparsetools.py#L137
Of course, every type in this list expands into several functions (one for each combination of I_TYPES, T_TYPES, and the generic routines), so adding a new one will bloat the sparsetools shared library.
If we weren't using the values at all, we could use void-pointer tricks (like dist_to_squareform_from_vector_generic) to collapse a bunch of T_TYPES together, but the xxx_todense functions actually perform A += X, so no luck there.
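For reference, the accumulate semantics of coo_todense are easy to state in pure Python. This is a sketch, not the C++ code, and as a side effect it happens to work for float16, since NumPy handles the arithmetic at the Python level:

```python
import numpy as np
from scipy.sparse import coo_matrix

def coo_todense_py(M, N, row, col, data, dtype):
    """Pure-Python sketch of coo_todense: B += X, with duplicates summed."""
    B = np.zeros((M, N), dtype=dtype)
    # np.add.at accumulates repeated (row, col) entries, like the C++ loop
    np.add.at(B, (row, col), data)
    return B

# Construction with float16 works; only the native todense kernel is missing
A = coo_matrix([[0, 1], [2, 0]], dtype=np.float16)
B = coo_todense_py(*A.shape, A.row, A.col, A.data, A.dtype)
print(B)  # [[0. 1.], [2. 0.]] with dtype float16
```

This is far slower than the compiled kernel, but it illustrates why a void-pointer collapse of T_TYPES can't work: the function has to do real arithmetic on the values.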
It's not enough: you would also need to implement a C++ type for half-floats, because half precision is not a native C++ type.
Ah, I figured npy_half would work, but I see now that it won't: https://github.com/numpy/numpy/blob/c90d7c94fd2077d0beca48fa89a423da2b0bb663/doc/source/reference/c-api.coremath.rst#half-precision-functions
A temporary workaround: convert to float32, then call toarray or todense, and finally use astype to convert back to float16.
>>> import numpy as np
>>> from scipy.sparse import csr_matrix, coo_matrix
>>> csr_matrix([[0, 1]], dtype=np.float16).astype(np.float32).toarray().astype(np.float16)
array([[0., 1.]], dtype=float16)
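Wrapped up as a small helper (the name is hypothetical, not a scipy API), the workaround looks like this. Note that float16 -> float32 conversion is exact, so the round trip loses no precision:

```python
import numpy as np
from scipy.sparse import csr_matrix

def toarray_f16(A):
    """Densify a float16 sparse matrix by round-tripping through float32.

    float16 -> float32 is lossless, so the round trip is exact; the cost
    is a temporary dense float32 array twice the size of the final result.
    """
    return A.astype(np.float32).toarray().astype(A.dtype)

A = csr_matrix([[0, 1]], dtype=np.float16)
print(toarray_f16(A))  # [[0. 1.]] with dtype float16
```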
Actually, if I'm not supposed to use float16, I run into another problem:
MemoryError: Unable to allocate 32.5 GiB for an array with shape (34804, 250598) and data type float32
Just using float32 causes such an error, let alone float64.
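The 32.5 GiB figure follows directly from the dense shape, which shows why densifying a matrix this size is problematic regardless of dtype support; even float16 would need roughly 16 GiB dense:

```python
# Dense-array memory for the shape from the MemoryError above
shape = (34804, 250598)
n_elements = shape[0] * shape[1]  # ~8.7e9 elements
gib = 1024 ** 3

print(n_elements * 2 / gib)  # float16: ~16.2 GiB
print(n_elements * 4 / gib)  # float32: ~32.5 GiB
print(n_elements * 8 / gib)  # float64: ~65.0 GiB
```

So for arrays of this size the real fix is to avoid toarray() entirely and keep the computation sparse, not just to shrink the value dtype.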
I can also replicate the bug / issue.
As pointed out in the NumPy documentation that @perimosocordiae shared in https://github.com/scipy/scipy/issues/7408#issuecomment-301863708:
The header file <numpy/halffloat.h> provides functions to work with IEEE 754-2008 16-bit floating point values. While this format is not typically used for numerical computations, it is useful for storing values that require floating point but do not need much precision. It can also be used as an educational tool to understand the nature of floating point round-off error.
[…]
For these reasons, NumPy provides an API to work with npy_half values accessible by including <numpy/halffloat.h> and linking to npymath. For functions that are not provided directly, such as the arithmetic operations, the preferred method is to convert to float or double and back again, as in the following example.
SciPy sparse matrices and arrays probably cannot support np.float16, since numerical operations aren't implemented for that type by most compilers.
Support for np.float16 could be added the way NumPy does it via ufuncs (which involves upcasting to another dtype), but that would be significant work.
We really shouldn't do sparse array operations with float16; it just complicates things and adds lots of "if float16 then raise NotImplementedError" guards for no good reason.
Thank you for the fantastic scipy.sparse implementation. Much appreciated.
I would politely argue there are reasons to support float16, while recognizing the code/support complexities that make it not currently implementable. So perhaps consider this an enhancement request.
For example, consider a neuroscience application dealing with a rank-150k matrix (2E10 elements) that is amenable to compression, since most of the neural activations are small, and whose entries are Pearson correlation coefficients, for which E-8 precision is more than adequate. Here, the compressed npz file of the dense matrix is reduced in size by more than an order of magnitude using scipy.sparse and float16. Using float32 doubles the file size, but there is no utility in representing a correlation to E-45 precision. The disk-size argument also applies to memory, which is potentially limited on certain platforms.
Another aspect is that half-precision is not uncommon in machine learning applications where half-precision operations are sometimes hardware accelerated and well suited to arithmetic operations on weights not needing E-45 precision.
-- Just some food for thought and my 2¢.
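On the storage point: construction and astype with float16 don't touch native code, so the stored values really do take half the space of float32 today (only the compiled kernels like todense fail). A quick check on a small stand-in matrix, using scipy.sparse.random to generate it:

```python
import numpy as np
from scipy.sparse import random as sparse_random

# Small stand-in for the 150k x 150k correlation matrix described above
A32 = sparse_random(1000, 1000, density=0.01, format='csr', dtype=np.float32)
A16 = A32.astype(np.float16)  # astype works; only native kernels fail

print(A32.data.nbytes)  # 4 bytes per stored value
print(A16.data.nbytes)  # 2 bytes per stored value, half the size
```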
Question: I ran across an Exception at csc_array(dtype=float16).toarray(). If I do not call .toarray(), there is no Exception. So it seems float16 is actually supported within scipy.sparse?
File "./Sparsify.py", line 164, in main
denseBlockArray = blockArray.toarray( order = 'C' )
File ".local/lib/python3.10/site-packages/scipy/sparse/_compressed.py", line 1062, in toarray
csr_todense(M, N, x.indptr, x.indices, x.data, y)
ValueError: Output dtype not compatible with inputs.
Per your question: only the functionality that touches native code will fail when using an unsupported dtype. This accounts for most of the useful bits of scipy.sparse, however, so I wouldn't say float16 is supported in any real sense.