The SpatialData object is not self-contained
Hello,
I am contacting you about the “not self-contained” message when saving sdata to a new location. Here is the example:
import spatialdata as sd
from spatialdata.datasets import blobs
sdata = blobs()
sdata.write("/Volumes/One Touch/MICS/data_HE2CellType/CT_DS/test_blobs.zarr")
sdata = sd.read_zarr("/Volumes/One Touch/MICS/data_HE2CellType/CT_DS/test_blobs.zarr")
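# Add a column to the table; sdata["table"] is the current accessor (sdata.table is deprecated)
sdata["table"].obs['test'] = 'test'
sdata["table"].obs.head()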
sdata.write("/Volumes/One Touch/MICS/data_HE2CellType/CT_DS/test_2_blobs.zarr")
And it outputs:
INFO The SpatialData object is not self-contained (i.e. it contains some elements that are Dask-backed from
locations outside /Volumes/One Touch/MICS/data_HE2CellType/CT_DS/test_2_blobs.zarr). Please see the
documentation of `is_self_contained()` to understand the implications of working with SpatialData objects
that are not self-contained.
INFO The Zarr backing store has been changed from /Volumes/One
Touch/MICS/data_HE2CellType/CT_DS/test_blobs.zarr the new file path: /Volumes/One
Touch/MICS/data_HE2CellType/CT_DS/test_2_blobs.zarr
In this case, if I completely delete test_blobs.zarr from my disk, could I lose information in test_2_blobs.zarr or run into problems afterwards? I am having trouble understanding the implications of being “not self-contained”.
Thanks in advance for your help!
Hi, the problem can be investigated by printing the sdata object after writing it. Here is an example (see the bottom part):
SpatialData object, with associated Zarr store: /Users/macbook/temp/test_blobs.zarr2
├── Images
│ ├── 'blobs_image': DataArray[cyx] (3, 512, 512)
│ └── 'blobs_multiscale_image': DataTree[cyx] (3, 512, 512), (3, 256, 256), (3, 128, 128)
├── Labels
│ ├── 'blobs_labels': DataArray[yx] (512, 512)
│ └── 'blobs_multiscale_labels': DataTree[yx] (512, 512), (256, 256), (128, 128)
├── Points
│ └── 'blobs_points': DataFrame with shape: (<Delayed>, 4) (2D points)
├── Shapes
│ ├── 'blobs_circles': GeoDataFrame shape: (5, 2) (2D shapes)
│ ├── 'blobs_multipolygons': GeoDataFrame shape: (2, 1) (2D shapes)
│ └── 'blobs_polygons': GeoDataFrame shape: (5, 1) (2D shapes)
└── Tables
  └── 'table': AnnData (26, 3)
with coordinate systems:
▸ 'global', with elements:
blobs_image (Images), blobs_multiscale_image (Images), blobs_labels (Labels), blobs_multiscale_labels (Labels), blobs_points (Points), blobs_circles (Shapes), blobs_multipolygons (Shapes), blobs_polygons (Shapes)
with the following Dask-backed elements not being self-contained:
▸ blobs_image: /Users/macbook/temp/test_blobs.zarr/images/blobs_image
▸ blobs_multiscale_image: /Users/macbook/temp/test_blobs.zarr/images/blobs_multiscale_image
▸ blobs_labels: /Users/macbook/temp/test_blobs.zarr/labels/blobs_labels
▸ blobs_multiscale_labels: /Users/macbook/temp/test_blobs.zarr/labels/blobs_multiscale_labels
▸ blobs_points: /Users/macbook/temp/test_blobs.zarr/points/blobs_points/points.parquet/part.0.parquet
Basically, the images, labels and points of the in-memory object were read lazily (they are Dask-backed), so after the write they still refer to the old Zarr location. To fix this, simply read the object again from the new disk location.
- [ ] I will try to think of a way to make this info message less obscure, maybe by asking the user to read the object again if they want a self-contained object.
Please let me know if you have additional questions on this!
Hi, that's very clear, thank you very much for your reply!