MultiScaleImage datatree.DataTree attrs in sdata object lost when writing to disk
Hi
I populated a SpatialData object with a MultiScaleImage datatree.DataTree.
img_scaled = to_multiscale(img, scale_factors=scale_factors)
img_to_save = {stitched_marker_name: img_scaled}
sdata = SpatialData(images=img_to_save)
The .attrs stored in each image level of the MultiScaleImage are preserved in the sdata object:
DataTree('None', parent=None)
├── DataTree('scale0')
│ Dimensions: (z: 4, y: 46767, x: 46768, c: 1)
│ Coordinates:
│ * z (z) float64 0.0 1.0 2.0 3.0
│ * y (y) float64 0.0 1.0 2.0 3.0 ... 4.676e+04 4.676e+04 4.677e+04
│ * x (x) float64 0.0 1.0 2.0 3.0 ... 4.676e+04 4.677e+04 4.677e+04
│ * c (c) int64 0
│ Data variables:
│ image (c, z, y, x) uint16 dask.array<chunksize=(1, 4, 64, 64), meta=np.ndarray>
│ Attributes:
│ transform: {'global': Scale (c, z, y, x)\n [1. 1. 1. 1.]}
│ downsample_level: 1
When I write it to disk with sdata.write() and read the file back with sdata = SpatialData.read(fpath), all the .attrs are missing.
I have been looking at the API documentation for a possible solution but I couldn't find anything. Can you point me to a potential solution? Thanks for the help, and sorry if I missed something.
Thanks a lot! Simone
Thanks for reporting @simone-codeluppi. I am not immediately certain what goes wrong here, but I think you have some attributes that we do not support in the object. We comply with OME-NGFF and thus parse the image prior to putting it in the SpatialData object. For your data this would require the Image3DModel: Image3DModel.parse
image_model = Image3DModel.parse(img, scale_factors=[2,2,2,2], dims=("z", "c", "y", "x"), ....)
This ensures a valid object is passed to SpatialData. Could you please report back if this does not fix your issue?
see also https://spatialdata.scverse.org/en/latest/tutorials/notebooks/notebooks/examples/models2.html
Note that right now we do not support parsing a DataTree, you can specify scale_factors though to create it. I am working on supporting this.
Hi
thanks for the reply. You are correct. I have some .attrs that are not standard (I removed them :)).
When I create the sdata object with
img_scaled = Image2DModel.parse(img_da, dims=("c", "y", "x"), scale_factors=scale_factors, chunks=CHUNK_SIZE_2D)
the attrs are present (like in the notebook you linked) but are not saved. So, as you suggested, it may have to do with datatree.MultiscaleSpatialImage.
Thanks for the help. I will keep an eye on the next versions of spatialdata.
Thanks for the info, if you want I can still have a look at this. In that case, please attach a short snippet that I can use to reproduce it. Thanks 😊
Thanks a lot! Here is a snippet:
# Generate image
import numpy as np
import dask.array as da
from pathlib import Path
from spatialdata.models import Image2DModel
from spatialdata import SpatialData
from spatialdata.transformations import Identity, Scale, Sequence
img = da.random.random((5000, 5000), chunks=(1000, 1000))
img = da.expand_dims(img, axis=0)
If I apply a parsing step similar to the one in the example notebook, without creating a MultiScaleImage, the output has the expected attributes:
img_scaled = Image2DModel.parse(
img,
transformations={"global": Scale([1, 4,4], axes=("c","y", "x"))},
)
However, when I create an image pyramid, the attrs are lost:
img_scaled = Image2DModel.parse(img, dims=("c", "y", "x"), chunks=(1, 1000, 1000), scale_factors=[4, 4])
Sorry if I missed something and done something wrong! Thanks for the help!
hi @simone-codeluppi, thanks for reporting this. The transformation is stored in the .attrs of each scale, e.g. in the example above:
img_scaled2= Image2DModel.parse(
img,
dims=("c","y","x"),
chunks=(1, 1000, 1000),
scale_factors=[4,4],
transformations={"global": Scale([1, 4,4], axes=("c","y", "x"))},
)
img_scaled2["scale0"]["image"].attrs
>>> {'transform': {'global': Scale (c, y, x)
>>> [1. 4. 4.]}}
so the top-level attrs are not used (but in a SpatialImage, i.e. a single scale, they are). I understand this is confusing, and while we may have touched on it during spec discussions, we might reconsider now. So the question is: should the DataTree top-level attrs contain transformations? And if so, those of which level?
remember that, currently, the downscaling is composed on top of the user-defined transformation, so for the same example as above:
img_scaled2["scale1"]["image"].attrs
>>> {'transform': {'global': Sequence
>>> Scale (y, x)
>>> [4. 4.]
>>> Scale (c, y, x)
>>> [1. 4. 4.]}}
Thanks for the additional details. As @giovp mentioned, we don't use the .attrs at the MultiscaleSpatialImage (now simply DataTree) level; instead, the transformations are added to each scale. This allows selecting arbitrary scales from the DataTree object and treating them as valid single-scale DataArray objects without breaking the data alignment.
From a user perspective, when calling set_transformation() on a DataTree object, all the scales are automatically adjusted (here is a link to some internal code called by set_transformation()): https://github.com/scverse/spatialdata/blob/a7dfc3cb4ed2287fcb91b01a34d29109915272de/src/spatialdata/transformations/_utils.py#L106
On the other hand, when calling get_transformation(), the framework checks that no transformation is present in the .attrs of the DataTree object itself, and proceeds by returning the transformation of the outer scale: https://github.com/scverse/spatialdata/blob/a7dfc3cb4ed2287fcb91b01a34d29109915272de/src/spatialdata/transformations/_utils.py#L83
hi @simone-codeluppi, I will close this, but feel free to reopen if necessary.