flow_stability
flow_stability copied to clipboard
Type declaration for `indices` and `indptr` might lead to dtype mismatch
When passing the arguments of a scipy.sparse.csr_matrix to sparse_stoch_from_full_csr we run into a ValueError:
long long[:] Tf_indices, E ValueError: Buffer dtype mismatch, expected 'long long' but got 'int'
See e.g. here
We can address this by explicitly converting the indices to np.int64 when passing them as arguments, or we might change datat type to int instead of long long in sparse_stochfrom_full_csr (and others).
Note that for some functions in _cython_sparse_stoch.pyx already declare int[:] as data type when indices are passed as arguments.
We have some issues with the type declarations for `indices` and `indptr`:
https://github.com/alexbovet/flow_stability/actions/runs/10513029536/job/29127746857#step:5:116
It also seems not completely consistent in the defs:
E.g. for indptr some functions require int:
https://github.com/alexbovet/flow_stability/blob/8648a8bd715e3b086ed13cb7453d41754c176398/src/flowstab/_cython_sparse_stoch.pyx#L859-L863
some long long:
https://github.com/alexbovet/flow_stability/blob/8648a8bd715e3b086ed13cb7453d41754c176398/src/flowstab/_cython_sparse_stoch.pyx#L613-L618
I'd put everything indices and indptr to long and declare with np.int64 on the python site.
@alexbovet is that OK for you or is there actually a use-case for long long?
Originally posted by @j-i-l in https://github.com/alexbovet/flow_stability/issues/35#issuecomment-2305337823
https://github.com/alexbovet/flow_stability/pull/35#issuecomment-2306467560
We might want to add the possibility to install flowstab in "large" data mode where we consistently use long long for all index arrays.
Yes, that could be a good option.
I remember there was an issue with the connected_components function from scipy.csgraph which cannot use int64 (here. Something to keep in mind.
Also, I think this is a discussion they have at scipy, and other projects.
In C++ we could simply use template functions which would allow us not having to specify the type of indices and indptr.
I'm unsure if/how this can be done in cython...
Actually, scipy as nice examples of this approach in their sparsetools:
https://github.com/scipy/scipy/blob/9e93f673c4021cc6dc38487dbe8fb2db8cadf131/scipy/sparse/sparsetools/csr.h#L103-L116
That might be a good approach for us: We would only need to set the type in the numpy representation, i.e. np.int32 or np.int64 and the rest would follow... well that would be the idea.