
Interest in a zarr.sparse module?

Open daletovar opened this issue 5 years ago • 5 comments

Hey there, For a project I've been working on I wanted a zarr-based sparse matrix class so I recently made one: https://github.com/daletovar/zsparse

I've added a notebook with a few examples. Once it's more stable and faster, I was planning on making it a standalone package. However, I've been thinking it might make sense to just add it to zarr. I won't be offended if you guys aren't interested. At the very least, I thought you guys might like to know about it, especially because it solves #152 (https://github.com/zarr-developers/zarr/issues/152).

Right now there's support for CSR and CSC matrices and for saving and loading pydata/sparse arrays. A potential problem with making a COO class for pydata/sparse is that doing a large number of binary searches on zarr arrays takes much longer than it does on NumPy arrays. The code would also need to be written in Cython instead of Numba because Numba doesn't support zarr. I'd like to see how Cython does on the CSR and CSC classes, as it's all currently written in pure Python. All of this is to say, if you were wondering why there isn't a COO class, these are some of the concerns I've had.
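To make the CSR-versus-COO trade-off concrete, here is a minimal sketch in plain Python. It is not the zsparse implementation; the function names and the tiny example matrix are hypothetical, and plain lists stand in for the stored arrays. The point is that a CSR row read needs two lookups in `indptr` plus two contiguous slices, while a COO lookup is a binary search over every stored coordinate, and on a chunked, compressed array each probe of that search would likely mean reading a whole chunk.

```python
from bisect import bisect_left

# Hypothetical 3x4 matrix, used only for illustration:
# [[0, 5, 0, 0],
#  [0, 0, 0, 0],
#  [7, 0, 0, 9]]
data = [5, 7, 9]        # non-zero values, stored row by row
indices = [1, 0, 3]     # column index of each non-zero value
indptr = [0, 1, 1, 3]   # row i occupies data[indptr[i]:indptr[i + 1]]


def csr_row(data, indices, indptr, i):
    """Return row i as (columns, values) using two contiguous slices."""
    start, stop = indptr[i], indptr[i + 1]
    return list(indices[start:stop]), list(data[start:stop])


def csr_get(data, indices, indptr, i, j):
    """Return element (i, j); the search is confined to row i."""
    cols, vals = csr_row(data, indices, indptr, i)
    k = bisect_left(cols, j)  # assumes column indices are sorted within a row
    return vals[k] if k < len(cols) and cols[k] == j else 0


# COO keeps one (row, col) pair per stored value. With sorted coordinates,
# every lookup is a binary search over all stored entries.
coo_coords = [(0, 1), (2, 0), (2, 3)]
coo_data = [5, 7, 9]


def coo_get(coords, values, i, j):
    """Return element (i, j) by binary search over all coordinates."""
    k = bisect_left(coords, (i, j))
    return values[k] if k < len(coords) and coords[k] == (i, j) else 0
```

Any objects that support integer indexing and slicing could be passed to `csr_row` in place of the lists, which is the assumption behind treating them as stand-ins for stored arrays.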

Thanks for listening. I'm curious what you guys think about all of this.

daletovar avatar Apr 01 '19 06:04 daletovar

Hi Dale, demo notebook is very cool, thanks a lot for posting. I'm on leave for a couple of weeks but look forward to digging a bit deeper.

alimanfoo avatar Apr 01 '19 13:04 alimanfoo

Thanks, I appreciate that.

daletovar avatar Apr 03 '19 02:04 daletovar

Since this is the only open issue I could find about storing sparse arrays in Zarr, I thought I'd comment here that the AnnData project's .h5ad file documentation claims that "Sparse arrays don’t have a native representations in HDF5 or Zarr, so we've defined our own". It may be worth extending the Zarr spec to formalize how sparse arrays are stored in Zarr. My apologies if that's already been done.

hammer avatar Oct 24 '20 12:10 hammer

Now that we have the meta_array option (see https://github.com/zarr-developers/zarr-python/pull/934), which allows loading data into different array types, it should be more straightforward to implement some sort of sparse support (e.g. meta_array=sparse.SparseArray). We would just need to pick an on-disk storage format, which could potentially be implemented as a numcodecs codec.
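As a rough illustration of what "an on-disk storage format implemented as a codec" might look like, here is a minimal sketch. Everything in it is an assumption: `SparseChunkSketch` is a hypothetical name, the byte layout is invented for this example, and the class only mirrors the `encode`/`decode` pair of a numcodecs codec without subclassing it or handling real array buffers.

```python
import struct


class SparseChunkSketch:
    """Hypothetical codec-like class; not part of zarr or numcodecs."""

    def encode(self, values):
        """Pack a flat list of floats as: length, count, then (index, value) pairs."""
        pairs = [(i, v) for i, v in enumerate(values) if v != 0.0]
        # Header: total chunk length and number of stored non-zero entries.
        out = struct.pack("<QQ", len(values), len(pairs))
        for i, v in pairs:
            # Each entry: 8-byte unsigned index followed by an 8-byte float.
            out += struct.pack("<Qd", i, v)
        return out

    def decode(self, buf):
        """Rebuild the dense list of floats from the packed bytes."""
        length, count = struct.unpack_from("<QQ", buf, 0)
        values = [0.0] * length
        offset = 16  # size of the two-field header
        for _ in range(count):
            i, v = struct.unpack_from("<Qd", buf, offset)
            values[i] = v
            offset += 16  # size of one (index, value) entry
        return values


codec = SparseChunkSketch()
chunk = [0.0, 0.0, 3.5, 0.0, -1.0, 0.0]
encoded = codec.encode(chunk)
decoded = codec.decode(encoded)
```

Note that this sketch decodes back to a dense list; returning a sparse object from `decode` instead is the part that the meta_array option would presumably help with.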

Perhaps now is the time to revisit this feature.

cc @alxmrs

rabernat avatar Oct 07 '22 13:10 rabernat

cc @ivirshup (who has also expressed interest in some form of sparse support in Zarr)

jakirkham avatar Oct 07 '22 15:10 jakirkham