Syntax alignment with numpy

Open fabian-sp opened this issue 3 years ago • 6 comments

Hi, I just found this repository as I need to use a sparse data format inside numba functions. In my existing code, which uses numpy arrays, I mainly need row indexing and dot products.

Is it possible to use csr with the same syntax for these operations? That is, something like A[i,:] that works when A is a standard numpy array but also when A is a csr.CSR object? This would help me avoid a lot of duplicate code.

Thank you for your help!

fabian-sp avatar Jul 12 '22 07:07 fabian-sp

Thanks for your interest in the package, and sorry for missing this issue in my notifications! I don't see any reason why that wouldn't be possible - just a matter of implementing it. If I have time I might look at it; also happy to review a pull request.

mdekstrand avatar Jul 27 '22 15:07 mdekstrand

I should clarify a bit further. It is definitely possible to have this available in Python functions. I do not know if it will be possible to make that syntax available in Numba-compiled functions - will need to review Numba's support for overloading magic methods. I see that Numba 0.56 added that support for jitclasses, but CSR does not use a jitclass (because jitclasses weren't stable enough and didn't support everything I needed).

mdekstrand avatar Jul 27 '22 15:07 mdekstrand

Hi, thank you for your reply. I have worked on this a bit in the meantime, and what I basically need is functionality that gives me a subset of rows, something like A[S,:] where S is a list of integers. I currently do this by looping over S with your .row function, but I guess that this is not very efficient. Later on I only need to do matrix-vector products with A[S,:].
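
To make the use case concrete, here is a minimal sketch of the product A[S,:] @ x computed directly from the three raw CSR arrays, without building the row subset first. The function name matvec_rows and the array names indptr, indices and data are assumptions chosen for illustration; they are not necessarily the names csr.CSR uses for its attributes.

```python
import numpy as np


def matvec_rows(indptr, indices, data, rows, x):
    """Compute A[rows, :] @ x for a CSR matrix given by (indptr, indices, data).

    Hypothetical helper: it works on raw CSR arrays, not on a csr.CSR object.
    """
    out = np.zeros(len(rows), dtype=np.float64)
    for k in range(len(rows)):
        r = rows[k]
        acc = 0.0
        # The stored entries of row r live in positions indptr[r] .. indptr[r+1]-1.
        for j in range(indptr[r], indptr[r + 1]):
            acc += data[j] * x[indices[j]]
        out[k] = acc
    return out


# Example matrix: [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
indptr = np.array([0, 2, 3, 5])
indices = np.array([0, 2, 1, 0, 2])
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x = np.array([1.0, 1.0, 1.0])

result = matvec_rows(indptr, indices, data, np.array([2, 0]), x)
```

The function uses only plain loops over arrays, so it should be a reasonable candidate for numba.njit, although that is not tested here.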

fabian-sp avatar Jul 28 '22 15:07 fabian-sp

Thanks! Looks like we need support for a few cases:

  • Single integer (easy, but it is unclear what the return type should be: a dense vector or a sparse one?)
  • Slice (logic already exists, but isn't exposed through the index API)
  • Sequence of indices (needs a loop as you say; this can be efficiently implemented in Numba)
  • Sequence of booleans (will also need a loop)

I think that covers all the common indexers.
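
As a rough sketch of how those four cases could be reduced to one code path, the helper below turns any of them into an array of row numbers. The name normalize_rows is hypothetical and not part of csr; the sketch only resolves which rows are selected and leaves open the question of what a single-integer lookup should return.

```python
import numpy as np


def normalize_rows(key, nrows):
    """Map a row indexer to an int64 array of row numbers (hypothetical helper)."""
    if isinstance(key, (int, np.integer)):
        # Single integer, with numpy-style negative indexing.
        row = key + nrows if key < 0 else key
        if not 0 <= row < nrows:
            raise IndexError("row index out of range")
        return np.array([row], dtype=np.int64)
    if isinstance(key, slice):
        # slice.indices clips start/stop/step to the valid range.
        return np.arange(*key.indices(nrows), dtype=np.int64)
    arr = np.asarray(key)
    if arr.dtype == np.bool_:
        # Boolean mask: keep the positions that are True.
        if arr.shape != (nrows,):
            raise IndexError("boolean mask must have one entry per row")
        return np.flatnonzero(arr).astype(np.int64)
    # Sequence of integer indices.
    return arr.astype(np.int64)
```

For example, normalize_rows(slice(1, 3), 4) and normalize_rows([False, True, True, False], 4) both select rows 1 and 2, so a single loop over the resulting array can serve every indexer type.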

mdekstrand avatar Jul 28 '22 17:07 mdekstrand

@fabian-sp There are now row functions that support arrays of row indices, in the new 0.5 release that will come out shortly.

mdekstrand avatar May 23 '23 20:05 mdekstrand

Specifically, .row() now takes an ndarray.

mdekstrand avatar May 23 '23 20:05 mdekstrand