
Sparse array reductions

Open ian-r-rose opened this issue 3 years ago • 3 comments

This avoids creating dense auxiliary sparse.COO arrays in reductions. Such arrays generally do not have the same fill_value as genuinely sparse arrays, which prevents basic arithmetic between the two.

This works, but I'm a bit worried that it will cause problems for other array implementations like cupy, and it may be better to go through the dispatch mechanism.
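The idea in the first paragraph, counting the elements a reduction covers from the chunk's shape instead of building a ones_like auxiliary array, can be illustrated with a minimal pure-Python sketch. numel_from_shape is a hypothetical helper, not the code in this PR and not a Dask API; it assumes axis follows NumPy conventions (None, an int, or a tuple of ints, negative values allowed) and returns only the count and the resulting shape.

```python
import math


def numel_from_shape(shape, axis=None, keepdims=False):
    """Count the elements reduced over `axis`, using only the shape.

    Hypothetical helper: no ones_like(x) array is allocated, so there is
    no auxiliary array whose fill_value could disagree with the input's.
    Returns (count, out_shape).
    """
    ndim = len(shape)
    # Normalize axis to a tuple of non-negative dimension indices.
    if axis is None:
        axes = tuple(range(ndim))
    elif isinstance(axis, int):
        axes = (axis % ndim,)
    else:
        axes = tuple(a % ndim for a in axis)

    # Number of elements collapsed into each output position.
    count = math.prod(shape[a] for a in axes)

    # Shape of the reduced result, following NumPy's keepdims semantics.
    if keepdims:
        out_shape = tuple(1 if d in axes else n for d, n in enumerate(shape))
    else:
        out_shape = tuple(n for d, n in enumerate(shape) if d not in axes)
    return count, out_shape
```

For a chunk of shape (4, 5, 6), reducing over axis=1 gives a count of 5 with result shape (4, 6). The caller would still need to wrap the count in an array of out_shape, ideally of the same array type as the input, and that is where per-library concerns such as CuPy come back in.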

  • [x] Closes #7169
  • [x] Tests added / passed
  • [x] Passes pre-commit run --all-files

ian-r-rose · Aug 02 '22 20:08

Okay, as I feared, this does indeed make CuPy unhappy. I'll work on pushing some of this into the dispatch system.
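Pushing logic into a dispatch system means the reduction calls one generic function and each array type registers its own implementation. A minimal sketch of that idea, assuming Python's functools.singledispatch as a stand-in for Dask's own dispatch registry; DenseChunk, SparseChunk and chunk_sum are hypothetical names for illustration only, not Dask or sparse APIs.

```python
import math
from functools import singledispatch


class DenseChunk:
    """Hypothetical dense chunk: a flat list of values plus a shape."""

    def __init__(self, values, shape):
        self.values = list(values)
        self.shape = tuple(shape)


class SparseChunk:
    """Hypothetical sparse chunk: only entries differing from fill_value are stored."""

    def __init__(self, data, shape, fill_value=0.0):
        self.data = dict(data)  # flat index -> stored value
        self.shape = tuple(shape)
        self.fill_value = fill_value


@singledispatch
def chunk_sum(x):
    """Sum every element of a chunk; each chunk type registers its own version."""
    raise TypeError(f"chunk_sum is not registered for {type(x).__name__}")


@chunk_sum.register
def _(x: DenseChunk):
    # Dense path: every element is stored explicitly.
    return sum(x.values)


@chunk_sum.register
def _(x: SparseChunk):
    # Sparse path: stored entries plus fill_value for every implicit slot,
    # computed without materializing a dense array.
    size = math.prod(x.shape)
    return sum(x.data.values()) + x.fill_value * (size - len(x.data))
```

With this layout the sparse implementation accounts for fill_value explicitly instead of relying on a generic code path, and a chunk type with no registration fails loudly rather than being silently densified.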

ian-r-rose · Aug 02 '22 22:08

Thanks for the ping, @ian-r-rose. As you noted yourself, CuPy sparse unfortunately has little test coverage at the moment. I may be mistaken, but I think its usage is fairly limited today, which is also why it hasn't received much attention lately.

While briefly trying to increase coverage, I found that Dask arrays backed by cupyx.scipy.sparse.coo_matrix don't respect the chunktype, even if we pass meta=cupyx.scipy.sparse.coo_matrix((0,0)), and that CSR matrices don't seem to respect chunktypes for arrays with more than 2 dimensions. I don't have enough bandwidth to look further right now. @jakirkham, is this something you would be interested in or have bandwidth to look at? If not, I'd suggest this PR go in even without further CuPy testing.

pentschev · Aug 03 '22 17:08

It looks like this also closes https://github.com/dask/dask/issues/8280?

I haven't really tackled that issue here: it seems more involved and may require actual changes to the algorithm (based on the WIP commit linked by the user).

ian-r-rose · Aug 08 '22 21:08

I just added one more commit with some additional test coverage, in case you have a few minutes, @jrbourbeau.

ian-r-rose · Aug 10 '22 22:08

Sorry for the lack of reply here. Responded in one thread above where I was pinged.

IIUC, the other question is whether we want to support sparse CuPy matrices with numel. As CuPy sparse matrices are fairly similar to their SciPy sparse matrix counterparts, I would ask whether SciPy sparse matrices are supported. If not, then I wouldn't worry about CuPy. If someone asks about numel for sparse matrices, we can worry about it then :)

jakirkham · Aug 12 '22 06:08