Moving the ZarrArray class and related stuff to their own package?
Hi @grimbough @Artur-man,
How do you guys feel about moving the ZarrArray class and related stuff to their own package? IMO, implementing a DelayedArray extension for a given backend is kind of a separate business from implementing the lower-level IO functionality. In my experience, it can be quite beneficial to draw a clear line between the two and to have each of them in its own package. An immediate benefit of this approach is that it tends to alleviate the dependency burden. For example, this would reduce the number of Rarr's deps (direct and indirect) from 46 to 33. Another benefit is that this kind of separation tends to facilitate maintenance in the long run.
Mike, if you're OK with this, I would start the ZarrArray package (or should we call it ZArray?) and move the ZarrArray stuff that you have in Rarr to this new package. (BTW, should we also rename this class ZArray?) You'd remain first author of the new package, and I could be its maintainer if you want, whatever you prefer. I would also take care of submitting the new package to Bioconductor and, once it has been accepted, removing the ZarrArray stuff from Rarr (via a PR).
How does that sound?
I'm also including Artür in this conversation. @Artur-man: Based on the issues and PRs you've submitted to this repo, my impression is that you have a particular interest in Rarr and ZarrArray objects 😉
Best, H.
Thanks @hpages, I'm fine with whatever you and @grimbough prefer. By the way, I once started a ZarrArray package here, but after Rarr came out I didn't touch it much.
One minor point: please keep ZarrArray, not ZArray or the like. We already have too many classes with cryptic names that create a steep learning curve for anyone besides the developers...
On 28 May 2025, at 01:17, Hervé Pagès wrote (grimbough/Rarr#22, https://github.com/grimbough/Rarr/issues/22): [quoted original message, identical to the first post above]
OK, let's keep the pirate-sounding name 😃 Thanks @wolfganghuber for your feedback!
Here is an attempt: https://github.com/BIMSBbioinfo/ZarrArray. As an example, I used a column-based (CSC) sparse representation stored in an AnnData Zarr store, which is compatible with the corresponding HDF5 format. It would be nice to bring the HDF5Array utilities over to Zarr.
```r
> zarr_dir <- system.file("extdata", "example2.zarr.zip", package = "ZarrArray")
> td <- tempdir(check = TRUE)
> unzip(zarr_dir, exdir = td)
> store <- file.path(td, "example2.zarr")
> name <- "layers/csc_counts"
> list.files(file.path(store, name))
[1] "data"    "indices" "indptr" 
> ZarrSparseMatrix(store, name)
<100 x 50> sparse ZarrSparseMatrix object of type "double":
                [,1]          [,2]          [,3] ... [,49] [,50]
  [1,]  3.000000e+00  0.000000e+00  0.000000e+00   .     0     0
  [2,] 2.805883e-314 6.388312e-314  0.000000e+00   .     0     0
  [3,] 5.928788e-323 2.805879e-314  0.000000e+00   .     0     0
  [4,] 6.389565e-314 2.133263e-314  0.000000e+00   .     0     0
  [5,] 9.881313e-324 2.805879e-314  0.000000e+00   .     0     0
   ...             .             .             .   .     .     .
 [96,] 2.133263e-314 2.133263e-314  0.000000e+00   .     0     0
 [97,] 6.389565e-314  0.000000e+00  0.000000e+00   .     0     0
 [98,]  0.000000e+00  0.000000e+00  0.000000e+00   .     0     0
 [99,] 3.589157e-314 2.133263e-314  0.000000e+00   .     0     0
[100,]  0.000000e+00  0.000000e+00  0.000000e+00   .     0     0
```
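For anyone less familiar with the `data` / `indices` / `indptr` layout listed by `list.files()` in the session, here is a minimal sketch of how the three arrays of a compressed sparse column (CSC) matrix fit together. It assumes the scipy/AnnData convention (0-based row indices, `indptr` of length ncol + 1), and the vectors are hypothetical toy values made up for illustration, not read from `example2.zarr`. It uses `Matrix::sparseMatrix()` and is not meant to reflect how `ZarrSparseMatrix()` is implemented.

```r
library(Matrix)

# Hypothetical toy CSC arrays (made up for illustration, not read from
# example2.zarr), following the scipy/AnnData convention:
data    <- c(3, 1, 2, 5)      # non-zero values, stored column by column
indices <- c(0L, 2L, 1L, 2L)  # 0-based row index of each value in 'data'
indptr  <- c(0L, 2L, 2L, 4L)  # column j uses positions indptr[j] + 1 .. indptr[j + 1]
                              # of 'data' (no values when the two are equal)
n_rows  <- 3L                 # number of rows: not encoded in the three arrays

# 'p' takes the column pointers as-is; index1 = FALSE marks 'i' as 0-based.
m <- sparseMatrix(i = indices, p = indptr, x = data,
                  dims = c(n_rows, length(indptr) - 1L), index1 = FALSE)
m
```

Because `indptr` is passed as `p`, `sparseMatrix()` infers the column of each value, so only the row indices need to be supplied explicitly; the number of rows has to come from elsewhere (e.g. the store's metadata).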