flow_stability Type declaration for `indices` and `indptr` might lead to dtype mismatch

When passing the arguments of a scipy.sparse.csr_matrix to sparse_stoch_from_full_csr we run into a ValueError:

long long[:] Tf_indices, E ValueError: Buffer dtype mismatch, expected 'long long' but got 'int'

See e.g. here

We can address this by explicitly converting the indices to np.int64 when passing them as arguments, or we might change datat type to int instead of long long in sparse_stochfrom_full_csr (and others).

Note that for some functions in _cython_sparse_stoch.pyx already declare int[:] as data type when indices are passed as arguments.

          We have some issues with the type declarations for `indices` and `indptr`:

https://github.com/alexbovet/flow_stability/actions/runs/10513029536/job/29127746857#step:5:116

It also seems not completely consistent in the defs:

E.g. for indptr some functions require int:

https://github.com/alexbovet/flow_stability/blob/8648a8bd715e3b086ed13cb7453d41754c176398/src/flowstab/_cython_sparse_stoch.pyx#L859-L863

some long long: https://github.com/alexbovet/flow_stability/blob/8648a8bd715e3b086ed13cb7453d41754c176398/src/flowstab/_cython_sparse_stoch.pyx#L613-L618

I'd put everything indices and indptr to long and declare with np.int64 on the python site. @alexbovet is that OK for you or is there actually a use-case for long long?

Originally posted by @j-i-l in https://github.com/alexbovet/flow_stability/issues/35#issuecomment-2305337823

Sep 02 '24 11:09 j-i-l

https://github.com/alexbovet/flow_stability/pull/35#issuecomment-2306467560

We might want to add the possibility to install flowstab in "large" data mode where we consistently use long long for all index arrays.

Sep 02 '24 11:09 j-i-l

Yes, that could be a good option.

I remember there was an issue with the connected_components function from scipy.csgraph which cannot use int64 (here. Something to keep in mind.

Also, I think this is a discussion they have at scipy, and other projects.

Sep 02 '24 19:09 alexbovet

In C++ we could simply use template functions which would allow us not having to specify the type of indices and indptr.

I'm unsure if/how this can be done in cython...

Actually, scipy as nice examples of this approach in their sparsetools:

https://github.com/scipy/scipy/blob/9e93f673c4021cc6dc38487dbe8fb2db8cadf131/scipy/sparse/sparsetools/csr.h#L103-L116

That might be a good approach for us: We would only need to set the type in the numpy representation, i.e. np.int32 or np.int64 and the rest would follow... well that would be the idea.

Sep 16 '24 18:09 j-i-l

flow_stability flow_stability copied to clipboard

Type declaration for `indices` and `indptr` might lead to dtype mismatch

flow_stability
flow_stability copied to clipboard