anndataR
Zarr support
Fixes #91
These changes are from both me and @Artur-man
The main public-facing changes here are:
- The `ZarrAnnData` class
- `read_zarr` and `write_zarr` top-level functions
- Support for `from_Seurat(output_class="ZarrAnnData")`
- Support for `from_SingleCellExperiment(output_class="ZarrAnnData")`
Internally:
- `read_zarr_helpers.R` is the zarr analog of `read_h5ad_helpers.R`
- `write_zarr_helpers.R` is the zarr analog of `write_h5ad_helpers.R`
- Test fixtures within `inst/extdata/example.zarr` (this makes the diff noisy, apologies)
- Lots of tests:
  - `test-Zarr-read.R` (35 new tests)
  - `test-Zarr-write.R` (70)
  - `test-ZarrAnnData.R` (26)
  - `test-h5ad-zarr.R` (17)
A number of these functions generate warnings in the R console. These are meant to be followed up on to improve the code (and should probably be resolved, as end users may not appreciate them), but the tests still pass despite the warnings.
Known things that are not implemented here:
- support for recarrays
- usage of the `mode = c("r", "r+", "a", "w", "w-", "x")` parameter value
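To make the unimplemented `mode` parameter concrete, here is a minimal sketch of what each value could mean when opening a Zarr store. It assumes the h5py file-mode conventions, which use the same set of values (`"x"` being an alias for `"w-"`), and zarr-python gives `r`, `r+`, `a`, `w` and `w-` the same meanings. The helper `resolve_zarr_mode()` and its return fields are hypothetical and not part of anndataR or pizzarr.

```r
# Hypothetical helper (not part of anndataR): describe how a Zarr store
# would be opened for each `mode` value, assuming h5py's file-mode semantics.
resolve_zarr_mode <- function(mode = c("r", "r+", "a", "w", "w-", "x"), exists) {
  mode <- match.arg(mode)

  # "r" (read-only) and "r+" (read/write) require the store to exist
  if (mode %in% c("r", "r+") && !exists) {
    stop("mode '", mode, "' requires an existing store", call. = FALSE)
  }
  # "w-" and its alias "x" create a new store and fail if one already exists
  if (mode %in% c("w-", "x") && exists) {
    stop("mode '", mode, "' refuses to overwrite an existing store", call. = FALSE)
  }

  list(
    # every mode except "r" allows writing
    writable = mode != "r",
    # "w" discards the contents of an existing store
    truncate = mode == "w" && exists,
    # a new store is created whenever none exists yet
    create = !exists
  )
}
```

Keeping this decision in a single pure function would let each mode be unit-tested without touching the filesystem.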
@rcannood @keller-mark, should we add pizzarr only under Suggests in DESCRIPTION, with no need to add a Remotes list here?
```
Suggests:
    ...
    ...
    pizzarr
Remotes:
    keller-mark/pizzarr
```
Good point. At minimum, we should add a note to the GitHub README here: https://github.com/scverse/anndataR/tree/main?tab=readme-ov-file#installation
Ah, this is already added here, never mind; then only Suggests should work:
https://github.com/keller-mark/anndataR/blob/d192e68ed312f212476a5490c973619f08e7c8de/R/ZarrAnnData.R#L240-L246
But I will add this to the README as well!
Not a proper review of the code, but I wanted to raise that we plan to submit {anndataR} to Bioconductor (or maybe CRAN) at some point (hopefully soonish). This means we won't be able to depend on GitHub-only packages. Do you have plans to submit {pizzarr} somewhere, or could you rewrite this to use one of the public Zarr packages?
@lazappi thanks for the notification; pizzarr could be submitted to CRAN soon: https://github.com/keller-mark/pizzarr/pull/93
@keller-mark, any opinions on this? Perhaps we can make Rarr/pizzarr optional with `options()`, depending on their speed.
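One way to read the `options()` idea is a package option that selects the Zarr backend at run time. This is a hypothetical sketch: the option name `anndataR.zarr_backend` and the function `select_zarr_backend()` are assumptions for illustration, not existing anndataR API; only `getOption()`, `match.arg()` and `requireNamespace()` are base R.

```r
# Hypothetical sketch: choose the Zarr backend from a package option.
# The option name "anndataR.zarr_backend" is an assumption, not an existing option.
select_zarr_backend <- function(
  backend = getOption("anndataR.zarr_backend", default = "pizzarr"),
  # injectable so the check can be tested without installing either package
  is_installed = function(pkg) requireNamespace(pkg, quietly = TRUE)
) {
  # only the two candidate backends discussed in this thread are accepted
  backend <- match.arg(backend, choices = c("pizzarr", "Rarr"))
  if (!is_installed(backend)) {
    stop("Package '", backend, "' is needed for Zarr support. Please install it.",
         call. = FALSE)
  }
  backend
}
```

Because both packages would sit under Suggests, the installation check happens only when Zarr functionality is actually used.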
Regardless, it would be good to have Zarr support in anndataR. We are currently using a reticulate/basilisk environment in SpatialData, and it already causes problems (too many dependencies, multiple environments, etc.). https://github.com/HelenaLC/SpatialData
That being said, any dependency should be on either CRAN or BioC. If utilities are missing from Rarr, we can also contact the maintainer and/or send PRs there (I have already sent some and can continue to do so).
I have quickly checked whether pizzarr utilities could be replaced with https://github.com/grimbough/Rarr. Unfortunately, the BioC-native package has a set of limitations:
- [ ] (in progress) opening zarr stores and creating groups, see https://github.com/Huber-group-EMBL/Rarr/pull/18
- [x] read/write boolean arrays, see https://github.com/Huber-group-EMBL/Rarr/pull/59
- [x] fixing errors when reading from some character arrays, see https://github.com/grimbough/Rarr/issues/20
I will keep in touch to see whether these get resolved; otherwise, no Zarr R package is currently both on CRAN/BioC and functionally complete.
There has been some progress in the Rarr package. Would you like a clean PR (since there have been so many updates in the meantime), or is continuing here fine?
Probably whatever is easiest for you and @keller-mark, or whatever makes the PR easiest to understand. There have been a lot of changes to the package since this was opened, so we would need to make sure those are included here.
I saw that {Rarr} is planning to add Zarr v3 support in its next release, so I think it makes sense as the backend package to use.