spatialdata icon indicating copy to clipboard operation
spatialdata copied to clipboard

Adding `attrs` at the `SpatialData` object level

Open quentinblampey opened this issue 1 year ago • 2 comments

As described in issue #404, it is useful to have attributes at the SpatialData object level. This will also be useful in Sopa, to simplify the API.

Usage:

import spatialdata

from spatialdata.datasets import blobs

sdata = blobs()

sdata.attrs # empty dictionnary
sdata.attrs["test"] = 12

sdata.write("blobs.zarr") # this saves the attributes on disk

sdata = spatialdata.read_zarr("blobs.zarr") # the attributes are read normally

sdata.attrs["test"] # returns 12

What do you think @LucaMarconato?

quentinblampey avatar Sep 23 '24 09:09 quentinblampey

Codecov Report

Attention: Patch coverage is 88.60759% with 9 lines in your changes missing coverage. Please review.

Project coverage is 91.78%. Comparing base (2a0c23a) to head (4546bea). Report is 70 commits behind head on main.

Files with missing lines Patch % Lines
src/spatialdata/_core/spatialdata.py 92.00% 4 Missing :warning:
src/spatialdata/_io/format.py 71.42% 4 Missing :warning:
src/spatialdata/_io/io_zarr.py 75.00% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #711      +/-   ##
==========================================
- Coverage   91.80%   91.78%   -0.03%     
==========================================
  Files          45       45              
  Lines        6959     7045      +86     
==========================================
+ Hits         6389     6466      +77     
- Misses        570      579       +9     
Files with missing lines Coverage Δ
src/spatialdata/_core/_deepcopy.py 98.38% <100.00%> (+0.02%) :arrow_up:
src/spatialdata/_core/concatenate.py 91.80% <100.00%> (+0.20%) :arrow_up:
src/spatialdata/_core/operations/_utils.py 92.15% <100.00%> (ø)
src/spatialdata/_core/operations/rasterize.py 90.45% <100.00%> (ø)
src/spatialdata/_core/operations/transform.py 90.82% <100.00%> (ø)
src/spatialdata/_core/query/spatial_query.py 95.50% <100.00%> (ø)
src/spatialdata/_io/io_zarr.py 87.77% <75.00%> (-0.60%) :arrow_down:
src/spatialdata/_core/spatialdata.py 90.87% <92.00%> (-0.01%) :arrow_down:
src/spatialdata/_io/format.py 83.22% <71.42%> (-1.13%) :arrow_down:

... and 3 files with indirect coverage changes

codecov[bot] avatar Sep 23 '24 09:09 codecov[bot]

Thanks for the PR.

  • [x] before merging we need to update the format version; we probably should add a format for the SpatialData container; something asked also by several users (we already have a format for the single elements, which would be independent from the format of the container).

LucaMarconato avatar Sep 26 '24 16:09 LucaMarconato

Hi @LucaMarconato, have you started working on your last comment? If not, do you want me to have a look (although I'm not exactly sure to understand what you want to do)?

quentinblampey avatar Oct 14 '24 11:10 quentinblampey

I noticed we are missing a feature: after we save a SpatialData object, if we add a new attr, there is currently no way to save it on disk. Something like sdata.write_element for the attrs

quentinblampey avatar Oct 29 '24 09:10 quentinblampey

Just a random note on this: I agree that having SpatialData - level attributes would be super helpful. Beyond that, I think it would be great to be able to annotate coordinate systems - these often correspond to samples, conditions, time-points etc. I can of course annotate all elements within a coordinate system through tables, but I think it would be more direct to somehow directly provide metadata for an entire coordinate system.

Marius1311 avatar Oct 30 '24 17:10 Marius1311

I also noticed that we want to pass the attrs inside certain functions, e.g. when performing a query (the attrs should be kept). I made a commit for this

The only concern is when we concatenate multiple SpatialData objects, how should we merge the duplicate keys (if any)?

NB: I updated the first post with a check-list

quentinblampey avatar Oct 31 '24 16:10 quentinblampey

@quentinblampey thanks for the update. I haven't updated the format yet, we can work on it this week at the hackathon. Regarding saving to disk, I suggest to extend the write_metadata() method. Finally, regarding .attrs, we could proceed as AnnData does for .uns when calling ad.concat().

LucaMarconato avatar Nov 11 '24 14:11 LucaMarconato

Extremely interested in this, thanks @quentinblampey !!

timtreis avatar Nov 18 '24 17:11 timtreis

@quentinblampey I have finalized this PR, I'm going to merge it now. Thanks for your contribution!

Two comments.

  1. root-level versioning: On disk, I am now adding a version number for the root level metadata, and also I am adding the string of the spatialdata software version that was used to create the metadata (this helps reproducibility). When reading the .zattrs from disk into the SpatialData.attrs slow I am not parsing the two version strings not to clutter the object.
  2. .attrs passed by reference: I have removed the code dict(sdata.attrs) and passing the attrs always by reference. The only exceptions are when I call spatialdata.deepcopy(), where I call copy.deepcopy(), and when the user passes a Mapping that is not a dict. In such case the casting to dict performs a shallow copy.

LucaMarconato avatar Dec 01 '24 13:12 LucaMarconato