Method to validate the relationship between elements

Open LucaMarconato opened this issue 2 years ago • 1 comments

I discussed this with @melonora today. Relevant also to @timtreis and @sagar87. CC @giovp

I would add a method to check the consistency of the table and the elements. This function would not throw errors, but would check whether relationships are missing or invalid. This is useful because sometimes we catch bugs only downstream (when trying to plot or aggregate something).

I'm not sure when we should call this method: maybe after the constructor, after reading, and before saving. Or we could just let the user call it. The name could be validate_data_relationships().

Things that would be checked:

  • [ ] The regions in table.uns['spatialdata_attrs']['region'] are present in the sdata object.
  • [ ] The column with name table.uns['spatialdata_attrs']['region_key'] exists
  • [ ] The values of the rows in the column table.uns['spatialdata_attrs']['region_key'] are exactly the ones in table.uns['spatialdata_attrs']['region'].
  • [ ] The column with name table.uns['spatialdata_attrs']['instance_key'] exists
  • [ ] The values of the rows in the column table.uns['spatialdata_attrs']['instance_key'] correspond to the values in the index of the corresponding regions. This check is done in napari_spatialdata when creating a shapes layer and/or a labels layer. For instance, this warning is given when some of the label values and the table instance_key values don't match:
2023-04-05 15:12:39.751 | WARNING  | napari_spatialdata.interactive:_find_annotation_for_labels:435 - 11050/11051 labels not annotated: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86}
  • [ ] In a points object, the column with name INSTANCE_KEY exists.
  • [ ] In a points object, the values of the INSTANCE_KEY column actually refer to real regions. I have just realized that there may be a bug around this, I discuss this in this issue: https://github.com/scverse/spatialdata/issues/217
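
To make the table-related checks above concrete, here is a minimal sketch of what validate_data_relationships() could look like. It is not the implementation from the library: to stay self-contained it takes plain Python containers as stand-ins (`attrs` for table.uns['spatialdata_attrs'], `obs` for the columns of table.obs, and the hypothetical `element_indices` mapping each element name to its instance ids). It covers only the five table checks, not the points checks, and it returns a list of messages rather than raising.

```python
from typing import Any, Iterable, List, Mapping, Sequence


def validate_data_relationships(
    attrs: Mapping[str, Any],
    obs: Mapping[str, Sequence[Any]],
    element_indices: Mapping[str, Iterable[Any]],
) -> List[str]:
    """Collect human-readable problems instead of raising.

    attrs            stand-in for table.uns['spatialdata_attrs']
    obs              stand-in for table.obs (column name -> values)
    element_indices  hypothetical stand-in for the elements (name -> instance ids)
    """
    problems: List[str] = []

    region = attrs["region"]
    regions = [region] if isinstance(region, str) else list(region)
    region_key = attrs["region_key"]
    instance_key = attrs["instance_key"]

    # Check 1: every declared region is present among the elements.
    for name in regions:
        if name not in element_indices:
            problems.append(f"region {name!r} not found among the elements")

    # Checks 2 and 4: the region_key and instance_key columns exist.
    for key in (region_key, instance_key):
        if key not in obs:
            problems.append(f"column {key!r} is missing from the table")
    if region_key not in obs:
        return problems  # the remaining checks need this column

    # Check 3: the region_key column holds exactly the declared regions.
    observed = set(obs[region_key])
    if observed != set(regions):
        problems.append(
            f"values in {region_key!r} {sorted(map(str, observed))} differ "
            f"from the declared regions {sorted(map(str, regions))}"
        )
    if instance_key not in obs:
        return problems  # the last check needs this column

    # Check 5: per region, instance ids in the table match the element index.
    for name in regions:
        if name not in element_indices:
            continue  # already reported by check 1
        table_ids = {
            i for r, i in zip(obs[region_key], obs[instance_key]) if r == name
        }
        element_ids = set(element_indices[name])
        unannotated = element_ids - table_ids
        dangling = table_ids - element_ids
        if unannotated:
            problems.append(
                f"{name!r}: {len(unannotated)}/{len(element_ids)} instances not annotated"
            )
        if dangling:
            problems.append(
                f"{name!r}: {len(dangling)} table rows refer to missing instances"
            )
    return problems


# Toy data: 'nuclei' is declared but absent, and the ids only partly match.
attrs = {"region": ["cells", "nuclei"], "region_key": "region", "instance_key": "cell_id"}
obs = {"region": ["cells", "cells", "cells"], "cell_id": [1, 2, 9]}
element_indices = {"cells": [1, 2, 3]}

problems = validate_data_relationships(attrs, obs, element_indices)
for message in problems:
    print(message)
```

Returning messages instead of raising follows the idea above that the function should not throw errors; a caller could log them as warnings or escalate them as needed.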

LucaMarconato avatar Apr 05 '23 15:04 LucaMarconato

I started working on this in https://github.com/scverse/spatialdata/pull/468. We should:

  • [ ] call this function before writing and after reading, and maybe somewhere else.

LucaMarconato avatar Feb 23 '24 01:02 LucaMarconato