Zarr support

Open keller-mark opened this issue 1 year ago • 14 comments

Fixes #91

These changes are from both me and @Artur-man

The main public-facing changes here are:

  • The ZarrAnnData class
  • read_zarr and write_zarr top-level functions
  • Support for from_Seurat(output_class="ZarrAnnData")
  • Support for from_SingleCellExperiment(output_class="ZarrAnnData")

Internally:

  • read_zarr_helpers.R is the zarr analog of read_h5ad_helpers.R
  • write_zarr_helpers.R is the zarr analog of write_h5ad_helpers.R
  • Test fixtures within inst/extdata/example.zarr (this makes the diff noisy, apologies)
  • Lots of tests:
    • test-Zarr-read.R (35 new tests)
    • test-Zarr-write.R (70)
    • test-ZarrAnnData.R (26)
    • test-h5ad-zarr.R (17)

A number of these functions generate warnings in the R console. The warnings are meant as reminders to follow up and improve the code, and they should probably be resolved because end users may not appreciate them. The tests still pass despite these warnings.

Known things that are not implemented here:

  • support for recarrays
  • usage of mode = c("r", "r+", "a", "w", "w-", "x") parameter value
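
For reference, here is a minimal sketch of how the `mode` argument could eventually be validated. It is not part of this PR. The helper name `check_store_mode` is hypothetical, and the semantics of each mode are assumed to follow the zarr-python and h5py conventions ("r" and "r+" require an existing store, "w-" and "x" refuse to overwrite one).

```r
# Hypothetical helper, not part of anndataR or this PR.
# Assumed mode semantics (zarr-python / h5py conventions):
#   "r", "r+" : the store must already exist
#   "w-", "x" : the store must not exist yet
#   "a", "w"  : no existence requirement
check_store_mode <- function(path, mode = c("r", "r+", "a", "w", "w-", "x")) {
  # match.arg() picks the first choice ("r") when mode is left at its default
  mode <- match.arg(mode)
  # file.exists() is TRUE for directories as well as files
  store_exists <- file.exists(path)

  if (mode %in% c("r", "r+") && !store_exists) {
    stop("Store does not exist: ", path, call. = FALSE)
  }
  if (mode %in% c("w-", "x") && store_exists) {
    stop("Store already exists: ", path, call. = FALSE)
  }
  invisible(mode)
}
```

A reader or writer could call this helper first and only then open the store with whichever backend is in use.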

keller-mark avatar Nov 05 '24 16:11 keller-mark

@rcannood @keller-mark should we add pizzarr only under Suggests in DESCRIPTION, with no need for a Remotes entry here?

Suggests:
    ...
    ...
    pizzarr
Remotes:
    keller-mark/pizzarr

Artur-man avatar Nov 06 '24 15:11 Artur-man

Good point. At minimum, we should add a note to the GitHub README here: https://github.com/scverse/anndataR/tree/main?tab=readme-ov-file#installation

keller-mark avatar Nov 06 '24 20:11 keller-mark

Ah, never mind, this is already handled here, so listing pizzarr only under Suggests should work: https://github.com/keller-mark/anndataR/blob/d192e68ed312f212476a5490c973619f08e7c8de/R/ZarrAnnData.R#L240-L246

But I will add this to the README as well!
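
For anyone following along, the usual pattern for a package that is only listed under Suggests is a runtime guard like the sketch below. This is an illustration rather than the code at the link above. The helper name `check_zarr_backend` and the error message are hypothetical, and only `requireNamespace()` is a real base R call.

```r
# Hypothetical helper; the actual check in ZarrAnnData.R may look different.
check_zarr_backend <- function(pkg = "pizzarr") {
  # requireNamespace() returns FALSE instead of erroring when pkg is missing
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      "Package '", pkg, "' is required for Zarr support but is not installed. ",
      "See the installation section of the README.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}
```

Calling the guard at the top of each Zarr entry point keeps the dependency optional, so users who only work with H5AD files never need to install it.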

Artur-man avatar Nov 06 '24 22:11 Artur-man

Not a proper review of the code, but I wanted to raise that we plan to submit {anndataR} to Bioconductor (or maybe CRAN) at some point (hopefully soonish). This means we won't be able to depend on GitHub-only packages. Do you have plans to submit {pizzarr} somewhere, or could you rewrite this to use one of the public Zarr packages?

lazappi avatar Nov 07 '24 06:11 lazappi

@lazappi thanks for the notification, pizzarr could be submitted to CRAN soon. https://github.com/keller-mark/pizzarr/pull/93

Artur-man avatar Nov 07 '24 08:11 Artur-man

@keller-mark any opinions on this? Perhaps we can make the choice between Rarr and pizzarr configurable through options(), depending on speed.

Regardless, it would be good to have Zarr support in anndataR. We currently use a reticulate/basilisk environment in SpatialData, and it already causes problems (too many dependencies, multiple environments, etc.). https://github.com/HelenaLC/SpatialData

That being said, any dependency should be on either CRAN or BioC. If utilities are missing in Rarr, we can also contact the maintainer and/or send PRs there (I have already sent some and can continue to do so).
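
A rough sketch of what the options() idea could look like is below. The option name `anndataR.zarr_backend` and the function `select_zarr_backend` are hypothetical, and the default of "Rarr" is only an assumption for illustration.

```r
# Hypothetical option and function names; nothing here exists in anndataR yet.
select_zarr_backend <- function() {
  # Fall back to "Rarr" when the user has not set the option
  backend <- getOption("anndataR.zarr_backend", default = "Rarr")
  # match.arg() errors on anything other than the supported backends
  match.arg(backend, choices = c("Rarr", "pizzarr"))
}

# Users would switch backends once per session
options(anndataR.zarr_backend = "pizzarr")
select_zarr_backend()
```

The read and write helpers could then dispatch on the returned string, so only the selected backend package would need to be installed.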

Artur-man avatar Mar 10 '25 13:03 Artur-man

I quickly checked whether the pizzarr utilities could be replaced with https://github.com/grimbough/Rarr. Unfortunately, the BioC-native package has a set of limitations:

  • [ ] (in progress) opening zarr stores and creating groups, see https://github.com/Huber-group-EMBL/Rarr/pull/18
  • [x] read/write boolean arrays, see https://github.com/Huber-group-EMBL/Rarr/pull/59
  • [x] fixing errors when reading from some character arrays, see https://github.com/grimbough/Rarr/issues/20

I will check back to see whether these get resolved. Until then, no Zarr R package is both available on CRAN/BioC and functionally complete.

Artur-man avatar Apr 12 '25 20:04 Artur-man

There is some progress in the Rarr package. Would you like a clean PR (since there have been so many updates in the meantime), or is continuing here fine?

Artur-man avatar Nov 10 '25 12:11 Artur-man

There is some progress in the Rarr package. Would you like a clean PR (since there have been so many updates in the meantime), or is continuing here fine?

Probably whatever is easiest for you and @keller-mark, or whatever makes the PR easiest to understand. There have been a lot of changes to the package since this was opened, so we would need to make sure those are included here.

I saw that {Rarr} is planning to have Zarr v3 support for the next release so I think that makes sense in terms of which backend package to use.

lazappi avatar Nov 11 '25 06:11 lazappi