MultiScaleImage datatree.DataTree attrs in sdata object lost when writing to disk
Hi
I populated a SpatialData object with a MultiScaleImage datatree.DataTree.
img_scaled = to_multiscale(img, scale_factors=scale_factors)
img_to_save = {stitched_marker_name: img_scaled}
sdata = SpatialData(images=img_to_save)
The .attrs stored in each image level of the MultiScaleImage are preserved in the sdata object:
DataTree('None', parent=None)
├── DataTree('scale0')
│ Dimensions: (z: 4, y: 46767, x: 46768, c: 1)
│ Coordinates:
│ * z (z) float64 0.0 1.0 2.0 3.0
│ * y (y) float64 0.0 1.0 2.0 3.0 ... 4.676e+04 4.676e+04 4.677e+04
│ * x (x) float64 0.0 1.0 2.0 3.0 ... 4.676e+04 4.677e+04 4.677e+04
│ * c (c) int64 0
│ Data variables:
│ image (c, z, y, x) uint16 dask.array<chunksize=(1, 4, 64, 64), meta=np.ndarray>
│ Attributes:
│ transform: {'global': Scale (c, z, y, x)\n [1. 1. 1. 1.]}
│ downsample_level: 1
When I write it to disk with sdata.write() and read the file back with sdata = SpatialData.read(fpath), all the .attrs are missing.
I have been looking at the API documentation for a possible solution but I couldn't find anything. Can you point me to a potential solution? Thanks for the help, and sorry if I missed something.
Thanks a lot! Simone
Thanks for reporting @simone-codeluppi. I am not immediately certain what goes wrong here, but I think you have some attributes that we do not support in the object. We comply with OME-NGFF and thus parse the image prior to putting it in the SpatialData object. For your data this would require the Image3DModel: Image3DModel.parse
image_model = Image3DModel.parse(img, scale_factors=[2,2,2,2], dims=("z", "c", "y", "x"), ....)
This ensures a valid object is passed to SpatialData. Could you please report back if this does not fix your issue?
see also https://spatialdata.scverse.org/en/latest/tutorials/notebooks/notebooks/examples/models2.html
Note that right now we do not support parsing a DataTree, you can specify scale_factors though to create it. I am working on supporting this.
Hi
thanks for the reply. You are correct. I have some .attrs that are not standard (I removed them :)).
When I create the sdata object with
img_scaled = Image2DModel.parse(img_da, dims=("c", "y", "x"), scale_factors=scale_factors, chunks=CHUNK_SIZE_2D)
the attrs are present (like in the notebook you linked) but are not saved. So, as you suggested, it may have to do with datatree.MultiscaleSpatialImage.
Thanks for the help. I will keep an eye on the next versions of spatialdata.
Thanks for the info, if you want I can still have a look at this. In that case, please attach a short snippet that I can use to reproduce it. Thanks 😊
Thanks a lot! Here is a snippet:
# Generate image
import numpy as np
import dask.array as da
from pathlib import Path
from spatialdata.models import Image2DModel
from spatialdata import SpatialData
from spatialdata.transformations import Identity, Scale, Sequence
img = da.random.random((5000, 5000), chunks=(1000, 1000))
img = da.expand_dims(img, axis=0)
If I apply a parsing step similar to the one in the example notebook, without creating a MultiScaleImage, the output has the expected attributes:
img_scaled = Image2DModel.parse(
img,
transformations={"global": Scale([1, 4,4], axes=("c","y", "x"))},
)
However, when I create an image pyramid, the attrs are lost:
img_scaled = Image2DModel.parse(img, dims=("c", "y", "x"), chunks=(1, 1000, 1000), scale_factors=[4, 4])
Sorry if I missed something and done something wrong! Thanks for the help!
hi @simone-codeluppi, thanks for reporting this. The transformation is stored in the .attrs of each scale, e.g. in the example above:
img_scaled2= Image2DModel.parse(
img,
dims=("c","y","x"),
chunks=(1, 1000, 1000),
scale_factors=[4,4],
transformations={"global": Scale([1, 4,4], axes=("c","y", "x"))},
)
img_scaled2["scale0"]["image"].attrs
>>> {'transform': {'global': Scale (c, y, x)
>>> [1. 4. 4.]}}
so the top-level attrs are not used (but in a SpatialImage, i.e. a single scale, they are). I understand this is confusing, and while we may have touched on it during spec discussions, we might reconsider now. So the question is: should the DataTree top-level attrs contain transformations? And if so, those of which level?
remember that, currently, the downscaling is composed on top of the user-defined transformation, so for the same example as above:
img_scaled2["scale1"]["image"].attrs
>>> {'transform': {'global': Sequence
>>> Scale (y, x)
>>> [4. 4.]
>>> Scale (c, y, x)
>>> [1. 4. 4.]}}
Thanks for the additional details. As @giovp mentioned, we don't use the .attrs at the MultiscaleSpatialImage (now simply DataTree) level; instead, the transformations are added to each scale. This allows selecting arbitrary scales from the DataTree object and treating them as valid single-scale DataArray objects without breaking the data alignment.
From a user perspective, when calling set_transformation() on a DataTree object, all the scales are automatically adjusted (here is a link to some internal code called by set_transformation()): https://github.com/scverse/spatialdata/blob/a7dfc3cb4ed2287fcb91b01a34d29109915272de/src/spatialdata/transformations/_utils.py#L106
On the other hand, when calling get_transformation(), the framework checks that no transformation is present in the .attrs of the DataTree object itself, and proceeds by returning the transformation of the outer scale: https://github.com/scverse/spatialdata/blob/a7dfc3cb4ed2287fcb91b01a34d29109915272de/src/spatialdata/transformations/_utils.py#L83
hi @simone-codeluppi, I will close this, but feel free to reopen if necessary.