Support for private remote object storage
Right now, there is support for local .zarr stores and for remote stores that are publicly accessible via HTTP or S3. Private remote stores are more difficult, because they need options or credentials that cannot be represented by a plain string or Path. One option is to use a zarr.storage.FSStore, which accepts storage_options or any fsspec.spec.AbstractFileSystem instance.
Two pull requests enable this:
- Support init of ome_zarr_py.io.ZarrLocation with zarr.storage.FSStore (#349)
- Support remote private storage by consistent use of substore (#442)
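For reference, here is a minimal sketch of the kind of store these PRs are meant to accept. It assumes zarr-python 2.x (where zarr.storage.FSStore is available) with s3fs installed. The helper names `private_s3_options` and `open_private_store` and the example endpoint are hypothetical and not part of spatialdata or ome-zarr. With `anon=False`, s3fs falls back to the default credential chain (for example ~/.aws/credentials).

```python
from typing import Any, Dict


def private_s3_options(endpoint_url: str) -> Dict[str, Any]:
    """Build s3fs storage options for a private S3-compatible endpoint.

    Hypothetical helper: it only bundles the options that a plain
    string or Path cannot carry.
    """
    return {
        # Use credentials instead of anonymous access.
        "anon": False,
        # Forwarded by s3fs to the underlying botocore client.
        "client_kwargs": {"endpoint_url": endpoint_url},
    }


def open_private_store(url: str, endpoint_url: str):
    """Open a read-only zarr group on a private S3-compatible store.

    Assumes zarr-python 2.x; the import is local so the options helper
    above stays usable without zarr installed.
    """
    import zarr

    # FSStore passes extra keyword arguments to fsspec as storage options.
    store = zarr.storage.FSStore(url, **private_s3_options(endpoint_url))
    return zarr.open_group(store=store, mode="r")
```

The resulting group could then be handed to sd.read_zarr in the same way as the `root` object in the testing snippet.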
Testing is difficult, but this is what I used:
import spatialdata as sd
import zarr
# works now; requires credentials in ~/.aws/credentials
# BUCKET and MINIO_URL are placeholders for the bucket name and the endpoint URL
root = zarr.open('s3://BUCKET/spatial-sandbox/visium_associated_xenium_io.zarr', storage_options={'client_kwargs': {'endpoint_url': MINIO_URL}})
sd.read_zarr(root)
# still works, I think depends on zmetadata?
sd.read_zarr('https://s3.embl.de/spatialdata/spatialdata-sandbox/visium_associated_xenium_io.zarr/')
# still works
sd.read_zarr('~/visium_associated_xenium_io.zarr')
I refactored to use UPath, which solves many issues I had with remote support. So I would recommend UPath over Path, str, ZarrLocation...
It works with my own object storage:
from upath import UPath
from spatialdata import SpatialData
p = UPath(
"s3://BUCKET/spatial-sandbox/visium_associated_xenium_io_tables.zarr",
endpoint_url="https://objectstor.vib.be",
)
# full_sdata is an existing in-memory SpatialData object
full_sdata.write(p)
sdata = SpatialData.read(p)
I also added tests for the remote datasets and mocked remote tests. There are still some remaining issues:
- [x] reading from private remote storage over S3 works
- [x] writing to private remote storage over S3 works
- [ ] test_remote_mock.py: the mock reading test using ome-zarr fails, so images and labels fail. I need to test this some more, as I'm also using a patched ome_zarr.
- [ ] test_remote.py: reading the SpatialData remote datasets over HTTP fails for the points parquet files. I also can't reproduce the working implementation (maybe because of a package update?).
It will likely be a while until I can work on this some more.
@LucaMarconato @ArneDefauw