spatialdata

Local cloud

Open sophiamaedler opened this issue 1 month ago • 2 comments

I was trying to initialize a spatialdata object directly from S3 as done in the tests here:

from upath import UPath
import spatialdata as sd
test = UPath("s3://spatialdata/spatialdata-sandbox/merfish.zarr", endpoint_url="https://s3.embl.de", anon=True)
sd.read_zarr(test)

This was failing with:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[1], line 4
      2 import spatialdata as sd
      3 test = UPath( "s3://spatialdata/spatialdata-sandbox/merfish.zarr", endpoint_url="https://s3.embl.de/", anon=True )
----> 4 sd.read_zarr(test)

File ~/src/spatialdata/_io/io_zarr.py:282, in read_zarr(store, selection, on_bad_files)
    272     attrs = None
    274 sdata = SpatialData(
    275     images=images,
    276     labels=labels,
   (...)    
    281 )
--> 282 sdata.path = _create_upath(_store)
    283 return sdata

File ~/src/spatialdata/_core/spatialdata.py:590, in SpatialData.path(self, value)
    588     self._path = value
    589 else:
--> 590     raise TypeError("Path must be `None`, a `str` or a `Path` object.")
    592 if not self.is_self_contained():
    593     logger.info(
    594         "The SpatialData object is not self-contained "
    595         "(i.e. it contains some elements that are Dask-backed "
    596         "from locations outside {self.path})."
    597     )

TypeError: Path must be `None`, a `str` or a `Path` object.
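To illustrate why the setter rejects the remote path, here is a minimal, self-contained sketch of the type check shown in the traceback. It is not the actual spatialdata code or the change made here: `Container` and `RemotePath` are hypothetical stand-ins (for `SpatialData` and a remote path type such as `UPath`), and extending the accepted types is only one assumed way such a check could be relaxed.

```python
from __future__ import annotations

from pathlib import Path


class RemotePath:
    """Hypothetical stand-in for a remote path type such as upath.UPath."""

    def __init__(self, url: str) -> None:
        self.url = url


class Container:
    """Hypothetical stand-in for SpatialData; only the path attribute is modelled."""

    # Types the setter accepts. Including RemotePath is the assumption
    # illustrated here; a check limited to (str, Path) would reject it.
    _ACCEPTED = (str, Path, RemotePath)

    def __init__(self) -> None:
        self._path: str | Path | RemotePath | None = None

    @property
    def path(self) -> str | Path | RemotePath | None:
        return self._path

    @path.setter
    def path(self, value: str | Path | RemotePath | None) -> None:
        if value is None or isinstance(value, self._ACCEPTED):
            self._path = value
        else:
            raise TypeError(
                "Path must be `None`, a `str`, a `Path` or a remote path object, "
                f"got {type(value).__name__}."
            )


container = Container()
container.path = RemotePath("s3://spatialdata/spatialdata-sandbox/merfish.zarr")
print(type(container.path).__name__)
```

With only `(str, Path)` in the accepted tuple, the assignment above would raise the same kind of `TypeError` as in the traceback.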

The implemented changes fix the issue, so the sdata object is now read successfully from S3.

The code now returns:

SpatialData object, with associated Zarr store: s3://spatialdata/spatialdata-sandbox/merfish.zarr
├── Images
│     └── 'rasterized': DataArray[cyx] (1, 522, 575)
├── Points
│     └── 'single_molecule': DataFrame with shape: (<Delayed>, 3) (2D points)
├── Shapes
│     ├── 'anatomical': GeoDataFrame shape: (6, 1) (2D shapes)
│     └── 'cells': GeoDataFrame shape: (2389, 2) (2D shapes)
└── Tables
      └── 'table': AnnData (2389, 268)
with coordinate systems:
    ▸ 'global', with elements:
        rasterized (Images), single_molecule (Points), anatomical (Shapes), cells (Shapes)
with the following Dask-backed elements not being self-contained:
    ▸ rasterized: [path/spatialdata/spatialdata-sandbox/merfish.zarr/images/rasterized]
    ▸ single_molecule: [path/spatialdata/spatialdata-sandbox/merfish.zarr/points/single_molecule/points.parquet/part.0.parquet]

sophiamaedler, Oct 22 '25 16:10

The actions are failing due to changes introduced in the previous commit, c514a0b. I can take a look to see if I can figure out the problem.

sophiamaedler, Oct 23 '25 10:10

PR #971 fixes the remote storage completely; however, zarrv3 got merged before that PR, so #971 would require an update.

melonora, Oct 27 '25 15:10

With the current dask unpinning, this PR would not work anymore, and neither would the other PR. I am implementing some fixes at the moment. The main problem is that FsspecStore now requires an async file system.

melonora, Nov 05 '25 09:11