spatialdata
Local cloud
I was trying to initialize a spatialdata object directly from S3 as done in the tests here:
from upath import UPath
import spatialdata as sd
test = UPath( "s3://spatialdata/spatialdata-sandbox/merfish.zarr", endpoint_url="https://s3.embl.de", anon=True )
sd.read_zarr(test)
This was failing with:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[1], line 4
2 import spatialdata as sd
3 test = UPath( "s3://spatialdata/spatialdata-sandbox/merfish.zarr", endpoint_url="https://s3.embl.de/", anon=True )
----> 4 sd.read_zarr(test)
File ~/src/spatialdata/_io/io_zarr.py:282, in read_zarr(store, selection, on_bad_files)
272 attrs = None
274 sdata = SpatialData(
275 images=images,
276 labels=labels,
(...)
281 )
--> 282 sdata.path = _create_upath(_store)
283 return sdata
File ~/src/spatialdata/_core/spatialdata.py:590, in SpatialData.path(self, value)
588 self._path = value
589 else:
--> 590 raise TypeError("Path must be `None`, a `str` or a `Path` object.")
592 if not self.is_self_contained():
593 logger.info(
594 "The SpatialData object is not self-contained "
595 "(i.e. it contains some elements that are Dask-backed "
596 "from locations outside {self.path})."
597 )
TypeError: Path must be `None`, a `str` or a `Path` object.
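The traceback shows the `path` setter rejecting anything that is not `None`, a `str` or a `Path`. As a minimal sketch of how such a check could be relaxed to also accept a `UPath`, assuming a hypothetical `PathHolder` class (this is not the actual `SpatialData` code, nor necessarily the change made in this PR):

```python
from pathlib import Path

# universal_pathlib is treated as optional so the sketch runs without it.
try:
    from upath import UPath

    _PATH_TYPES = (str, Path, UPath)
except ImportError:
    _PATH_TYPES = (str, Path)


class PathHolder:
    """Hypothetical stand-in for an object with a validated `path` attribute."""

    def __init__(self) -> None:
        self._path = None

    @property
    def path(self):
        return self._path

    @path.setter
    def path(self, value) -> None:
        # Accept None, local paths, and (when available) remote UPath objects.
        if value is None or isinstance(value, _PATH_TYPES):
            self._path = value
        else:
            raise TypeError("Path must be `None`, a `str`, a `Path` or a `UPath` object.")
```

With this kind of check, assigning a `UPath` pointing at S3 would be stored as-is instead of raising the `TypeError` shown above.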
The changes in this PR fix the issue, and the sdata object is now read successfully from S3.
The code now returns:
SpatialData object, with associated Zarr store: s3://spatialdata/spatialdata-sandbox/merfish.zarr
├── Images
│ └── 'rasterized': DataArray[cyx] (1, 522, 575)
├── Points
│ └── 'single_molecule': DataFrame with shape: (<Delayed>, 3) (2D points)
├── Shapes
│ ├── 'anatomical': GeoDataFrame shape: (6, 1) (2D shapes)
│ └── 'cells': GeoDataFrame shape: (2389, 2) (2D shapes)
└── Tables
└── 'table': AnnData (2389, 268)
with coordinate systems:
▸ 'global', with elements:
rasterized (Images), single_molecule (Points), anatomical (Shapes), cells (Shapes)
with the following Dask-backed elements not being self-contained:
▸ rasterized: [path/spatialdata/spatialdata-sandbox/merfish.zarr/images/rasterized]
▸ single_molecule: [path/spatialdata/spatialdata-sandbox/merfish.zarr/points/single_molecule/points.parquet/part.0.parquet]
The actions are failing due to changes introduced in a previous commit: c514a0b. I can take a look to see if I can figure out the problem.
PR #971 fixes the remote storage completely; however, the zarr v3 changes were merged before it, so #971 would require an update.
With the current dask unpinning, this PR no longer works, and neither does the other PR. I am implementing some fixes at the moment. The main problem is that FsspecStore now requires an async file system.
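To illustrate the async file system requirement, here is a rough sketch of a helper that passes an async-capable fsspec filesystem through unchanged and wraps a synchronous one. The name `ensure_async_fs` is hypothetical, and the sketch assumes an fsspec release that ships `AsyncFileSystemWrapper`. It is not the fix being implemented here.

```python
from typing import Any


def ensure_async_fs(fs: Any) -> Any:
    """Return `fs` unchanged if it is async-capable, otherwise wrap it.

    Hypothetical helper; not part of spatialdata, zarr or fsspec.
    """
    # fsspec filesystems advertise async support via the `async_impl` attribute.
    if getattr(fs, "async_impl", False):
        return fs

    # Imported lazily so the check above works even without fsspec installed.
    # Assumption: the installed fsspec provides AsyncFileSystemWrapper.
    from fsspec.implementations.asyn_wrapper import AsyncFileSystemWrapper

    return AsyncFileSystemWrapper(fs)
```

The returned filesystem is the kind of object one would then hand to the fsspec-backed zarr store; whether wrapping is appropriate for a given backend (versus constructing it in asynchronous mode directly) is a design choice this sketch leaves open.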