
Some suggestions and proposals for annotations in SpatialData

[Open] selmanozleyen opened this issue 4 months ago • 2 comments

Hi,

I'd like to start this conversation first and then open more specific issues for the points you agree with. I have some suggestions for modifying and generalizing the internals of SpatialData annotations.

One row can only link to one spatial element


Currently, a row in a table can annotate at most one spatial element. For example, if sdata['table'][i] annotates sdata['shape'][i], then sdata['table'][i] cannot also annotate sdata['label'][i].

Take, for example, this test code I wrote for #946:

```python
from spatialdata import concatenate

# blobs_annotating_element and subset_sdata_by_table_mask are helpers from the
# test code of #946 (their imports are omitted here).
sdata = concatenate(
    {
        "labels": blobs_annotating_element("blobs_labels"),
        "shapes": blobs_annotating_element("blobs_circles"),
        "points": blobs_annotating_element("blobs_points"),
        "multiscale_labels": blobs_annotating_element("blobs_multiscale_labels"),
    },
    concatenate_tables=True,
)
third_elems = sdata.tables["table"].obs["instance_id"] == 3
subset_sdata = subset_sdata_by_table_mask(sdata, "table", third_elems)
# The table now contains several rows with instance_id 3, one per annotated element:
# just to annotate the same cell in another region, its count data had to be duplicated.
```

My conclusion

Because each row-to-instance mapping is stored in the table itself (one region and one instance ID per row), annotating the same cell in several elements forces us to "explode" the table, and we end up duplicating the count information.
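To make the duplication concrete, here is a minimal, library-free sketch (plain Python, not the spatialdata API) comparing the current exploded layout, where every (region, instance) pair needs its own row and therefore its own copy of the counts, with a layout where counts are stored once and the links are kept separately. The gene names and count values are illustrative only.

```python
# Illustrative counts for a single cell; gene names and values are made up.
counts = {"geneA": 5, "geneB": 0, "geneC": 12}

# Elements in which the same cell (instance_id 3) appears.
regions = ["blobs_labels", "blobs_circles", "blobs_points", "blobs_multiscale_labels"]

# Current scheme: one table row per (region, instance_id) pair,
# so the count vector is copied once per region.
exploded_rows = [{"region": region, "instance_id": 3, **counts} for region in regions]

# Alternative: a single row of counts plus lightweight link records.
normalized_row = {"instance_id": 3, **counts}
links = [{"src_value": 3, "dst_elem_name": region} for region in regions]


def stored_count_values(rows):
    """Number of count values physically stored across the given rows."""
    return sum(1 for row in rows for key in row if key in counts)


print(stored_count_values(exploded_rows))     # 12 (3 genes x 4 regions)
print(stored_count_values([normalized_row]))  # 3
```

The link records grow with the number of elements, but they never copy the expression data.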

One row can only link to one item of a spatial element

I think a one-to-many relationship is something we would actually like to have for points. We already have it implicitly for labels, where one label value covers many pixels, and we can support it just by generalizing the current annotation scheme.
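A one-to-many link can be resolved with a plain group-by on the destination's key column. The sketch below is an assumption about how this could work, written in plain Python rather than against spatialdata or pandas: each point carries a contained_in_shape_id column (the name used in the example mapping further down), and a single table row with a given instance_id resolves to every point sharing that value.

```python
from collections import defaultdict

# Hypothetical points element: each point records the shape it lies in.
points = [
    {"point_id": 0, "contained_in_shape_id": 3},
    {"point_id": 1, "contained_in_shape_id": 3},
    {"point_id": 2, "contained_in_shape_id": 7},
    {"point_id": 3, "contained_in_shape_id": 3},
]


def one_to_many_index(dst_rows, key_column):
    """Group destination rows by the key column used for linking."""
    index = defaultdict(list)
    for row in dst_rows:
        index[row[key_column]].append(row)
    return index


index = one_to_many_index(points, "contained_in_shape_id")

# The table row with instance_id == 3 links to all points whose key is 3.
linked = [p["point_id"] for p in index[3]]
print(linked)  # [0, 1, 3]
```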

My suggestion to solve both issues

Ultimately we want a mapping {src_key: {dst_element_name: (dst_access, dst_kind, link_kind, dst_instance_key)}}.

  • dst_access is how the dst element is accessed, for example "value" or "key". Currently we use "value" for labels, since a raster image has no columns, and "key" for shapes and points, since those elements have a column to match against.
  • dst_kind is the kind of the dst element, for example "label", "shape" or "point".
  • link_kind is the kind of the link, for example "one-to-one" or "one-to-many".
  • dst_instance_key is the column (or tuple of columns) of the dst element to match on when dst_access is "key", and None otherwise.
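One way to pin down the semantics of this tuple is a small typed record that rejects inconsistent combinations. This is a hypothetical sketch: the LinkSpec class and its validation rules are assumptions, not existing spatialdata code. "value" access must not carry a key, while "key" access needs at least one column name.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

VALID_ACCESS = {"value", "key"}
VALID_LINK_KINDS = {"one-to-one", "one-to-many"}


@dataclass(frozen=True)
class LinkSpec:
    """Hypothetical typed form of (dst_access, dst_kind, link_kind, dst_instance_key)."""

    dst_access: str
    dst_kind: str
    link_kind: str
    dst_instance_key: Optional[Tuple[str, ...]] = None

    def __post_init__(self):
        if self.dst_access not in VALID_ACCESS:
            raise ValueError(f"unknown dst_access: {self.dst_access!r}")
        if self.link_kind not in VALID_LINK_KINDS:
            raise ValueError(f"unknown link_kind: {self.link_kind!r}")
        if self.dst_access == "value" and self.dst_instance_key is not None:
            # Raster labels have no columns, so there is nothing to key on.
            raise ValueError("'value' access takes no dst_instance_key")
        if self.dst_access == "key" and not self.dst_instance_key:
            raise ValueError("'key' access needs at least one key column")

    @classmethod
    def from_tuple(cls, spec):
        """Build a LinkSpec from the 4-tuple used in the mapping."""
        return cls(*spec)


labels_link = LinkSpec.from_tuple(("value", "label", "one-to-one", None))
points_link = LinkSpec.from_tuple(("key", "point", "one-to-many", ("contained_in_shape_id",)))
```

Validating at construction time means a malformed mapping fails when it is added, not later when an element is subset.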

Currently dst_kind serves no purpose, since the other fields already define the kind of linking we want, but I added it for future flexibility.

The user interface might look like this:

```python
# add_links is the proposed function; it does not exist in spatialdata yet.
mapping = {
    "instance_id": {
        "blobs_labels": ("value", "label", "one-to-one", None),
        "blobs_circles": ("key",   "shape", "one-to-one", ("shape_id",)),
        "parts_of_a_cell": ("key",   "shape", "one-to-many", ("shape_id",)),
        "blobs_points": ("key",   "point", "one-to-many", ("contained_in_shape_id",)),
    },
}
add_links(sdata, "table", mapping)
```
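To show what add_links could do under the hood, here is a hypothetical sketch. The SpatialData object is replaced by a stand-in exposing only tables[...].uns (no AnnData involved), and the nested mapping is flattened into one record per (src_key, dst_element) pair, i.e. the exploded, normalized form the proposal stores in uns. The function name follows the proposal; its behaviour here is an assumption.

```python
from types import SimpleNamespace

# Column order of the normalized link records (one record per link).
COLUMNS = ("src_instance_key", "dst_elem_name", "dst_instance_key",
           "dst_access", "dst_kind", "link_kind")


def add_links(sdata, table_name, mapping):
    """Hypothetical sketch: flatten `mapping` into uns["row_mappings"] records."""
    records = sdata.tables[table_name].uns.setdefault("row_mappings", [])
    for src_key, targets in mapping.items():
        for dst_elem_name, (access, kind, link_kind, dst_key) in targets.items():
            records.append(dict(zip(
                COLUMNS, (src_key, dst_elem_name, dst_key, access, kind, link_kind)
            )))
    return records


# Stand-in for a SpatialData object: only sdata.tables[...].uns is modelled.
sdata = SimpleNamespace(tables={"table": SimpleNamespace(uns={})})
mapping = {
    "instance_id": {
        "blobs_labels": ("value", "label", "one-to-one", None),
        "blobs_points": ("key", "point", "one-to-many", ("contained_in_shape_id",)),
    },
}
rows = add_links(sdata, "table", mapping)
print(rows[1]["dst_instance_key"])  # ('contained_in_shape_id',)
```

A list of dicts is used only for readability; a real implementation would likely need a layout that AnnData can serialize in uns (for example, one array per column).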

This would be stored in an exploded, normalized form, for example in sdata.tables["table"].uns["row_mappings"]:

| src_instance_key | dst_elem_name | dst_instance_key | dst_access | dst_kind | link_kind |
|---|---|---|---|---|---|
| "instance_id" | "blobs_labels" | ... | "value" | "label" | "one-to-one" |
| "instance_id" | "blobs_circles" | ... | "key" | "shape" | "one-to-one" |
| "instance_id" | "parts_of_a_cell" | ... | "key" | "shape" | "one-to-many" |
| "instance_id" | "blobs_points" | ... | "key" | "point" | "one-to-many" |
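Because every record carries its src_instance_key and dst_elem_name, the exploded form loses nothing: the nested mapping can be rebuilt from it. A minimal sketch, assuming records are plain dicts with the column names of the table above; the dst_instance_key values are taken from the example mapping, since the table elides them.

```python
def to_nested(records):
    """Rebuild {src_key: {dst_elem_name: (access, kind, link_kind, dst_key)}}."""
    nested = {}
    for rec in records:
        targets = nested.setdefault(rec["src_instance_key"], {})
        targets[rec["dst_elem_name"]] = (
            rec["dst_access"], rec["dst_kind"], rec["link_kind"], rec["dst_instance_key"]
        )
    return nested


records = [
    {"src_instance_key": "instance_id", "dst_elem_name": "blobs_labels",
     "dst_instance_key": None, "dst_access": "value", "dst_kind": "label",
     "link_kind": "one-to-one"},
    {"src_instance_key": "instance_id", "dst_elem_name": "blobs_circles",
     "dst_instance_key": ("shape_id",), "dst_access": "key", "dst_kind": "shape",
     "link_kind": "one-to-one"},
]
nested = to_nested(records)
```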

I think we can make these changes in a backwards-compatible way, and they would open up a lot of possibilities for future extensions.

Bonus points: this would also make it easier to achieve https://github.com/scverse/spatialdata/issues/293#issuecomment-1657290681, since the mapping description is much smaller than an extra column in .obs.

selmanozleyen, Aug 24 '25 15:08

@selmanozleyen Thanks for your suggestion! I am currently finishing my thesis, but I will think about this more deeply. A quick thought: wouldn't this behaviour also be enabled by allowing multiple instance id columns, where an instance id can be duplicated? That would affect quite a bit of code, e.g. the whole machinery for matching tables to elements, though I think that would be the case either way.

melonora, Sep 21 '25 11:09

> would this behaviour not also be enabled by allowing multiple instance id columns

Yes, but from the SpatialData object we wouldn't know which columns can serve as instance IDs.

After talking with @LucaMarconato, I will store this metadata somewhere in the AnnData table, for use only by my own functions such as sq.pp.filter_cells and its helpers in spatialdata. Then we will see how useful it is and how hard it would be to adopt. I will mention this issue in my next PRs.
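As an illustration of how such metadata could drive a table-mask filter, here is a hypothetical, library-free sketch. Given a boolean mask over table rows and link records in the format proposed above, it computes which source key values survive for each linked element and then filters a keyed (dataframe-like) element. The function names and record format are assumptions and do not reflect the actual squidpy or spatialdata implementation; for a "value" link, the kept values would instead select which label IDs to retain.

```python
def kept_values(obs_rows, mask, links):
    """Hypothetical: for each linked element, the source key values kept by mask."""
    kept = {}
    for link in links:
        src_key = link["src_instance_key"]
        kept[link["dst_elem_name"]] = {
            row[src_key] for row, keep in zip(obs_rows, mask) if keep
        }
    return kept


def filter_keyed_element(dst_rows, link, values):
    """Keep destination rows whose key column matches a kept value.

    Simplification: only the first column of dst_instance_key is used.
    """
    if link["dst_access"] != "key":
        raise ValueError("only 'key' links apply to dataframe-like elements")
    column = link["dst_instance_key"][0]
    return [row for row in dst_rows if row[column] in values]


obs_rows = [{"instance_id": 1}, {"instance_id": 2}, {"instance_id": 3}]
mask = [False, False, True]  # e.g. the result of a QC filter on the table
links = [
    {"src_instance_key": "instance_id", "dst_elem_name": "blobs_labels",
     "dst_access": "value", "dst_instance_key": None},
    {"src_instance_key": "instance_id", "dst_elem_name": "blobs_points",
     "dst_access": "key", "dst_instance_key": ("contained_in_shape_id",)},
]
points = [
    {"point_id": 0, "contained_in_shape_id": 3},
    {"point_id": 1, "contained_in_shape_id": 1},
    {"point_id": 2, "contained_in_shape_id": 3},
]
kept = kept_values(obs_rows, mask, links)
kept_points = filter_keyed_element(points, links[1], kept["blobs_points"])
```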

selmanozleyen, Sep 26 '25 13:09