spatialdata allow filtering by ids

closes #556

This PR allows for filtering elements (points, shapes, tables) on instance IDs. For labels I decided just as with the joins to not support it as it is potentially an expensive operation, where I am not certain what the usecase would be (one can adjust the color to a background color in any case for particular labels or even set it transparent).

The PR is related to #626, however I think it is better to separate out filtering on instance IDs. The function here can ultimately still be used in his function.

The PR will not cause an error in case of filtering on instance IDs that are not there. Furthermore it is not a problem to filter on instances in a table that is annotating more than 1 element. If instances for both annotated elements need to be filtered in the table this will be retrieved. If region_names is defined, a view of the table with rows annotating elements in region_names is first created before filtering on instance IDs.

After filtering of the table, the metadata for annotation of regions is checked and updated after which the table is validated.

Jul 09 '24 10:07 melonora

Codecov Report

Attention: Patch coverage is 93.93939% with 2 lines in your changes missing coverage. Please review.

Project coverage is 92.11%. Comparing base (5e2b1a4) to head (f521f1f). Report is 10 commits behind head on main.

Files with missing lines	Patch %	Lines
src/spatialdata/models/models.py	90.90%	2 Missing :warning:

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #627      +/-   ##
==========================================
+ Coverage   92.00%   92.11%   +0.10%     
==========================================
  Files          48       48              
  Lines        7408     7429      +21     
==========================================
+ Hits         6816     6843      +27     
+ Misses        592      586       -6

Files with missing lines	Coverage Δ
src/spatialdata/__init__.py	`96.42% <ø> (ø)`
src/spatialdata/_core/query/relational_query.py	`91.14% <100.00%> (+0.15%)`	:arrow_up:
src/spatialdata/_core/spatialdata.py	`91.46% <100.00%> (+0.20%)`	:arrow_up:
src/spatialdata/models/models.py	`88.34% <90.90%> (+1.09%)`	:arrow_up:

:rocket: New features to boost your workflow:

:snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Jul 09 '24 10:07 codecov[bot]

@melonora converting to draft, what's missing for the moment, just

[x] the tests?

Jul 12 '24 17:07 LucaMarconato

[x] ~~ah also the function should be private so it can be used internally by a single "entry point" filter().~~ made the function public and removed filter().

Jul 12 '24 17:07 LucaMarconato

Hi, I reviewed the PR. First please check my comment in this linked PR: https://github.com/scverse/spatialdata/pull/626#issuecomment-2615467570.

In this PR, I refined the function described in the comment linked above, into a new public API match_sdata_to_table(). As shown in the tests, all the use cases of the linked PR are supported. The use cases of filtering by instance_ids from points/shapes are supported by the functions match_table_to_element() (or by the join operations).

Since the purpose of the current PR was to add a private function that could support the filter() method of the linked PR, and since now the new public function allows for this. I removed the filter() method.

This is supported also by the following reasons:

the filter() function was supposed to be private, but now we do not need such private function
the method introduces code redundancy that can be avoided by using the join APIs (this is how I implemented match_sdata_to_table()

Finally, since two comments that I had from the review of the filter() method (even if not relevant anymore if the method is removed).

the points were not filtered by the index, but by the instance_key column, whose role is to match the transcript to a cell index to which the transcript below to. I had fixed this in 122aa5f (#627). Note that the semantic of instance_key column for points is not well defined as reported here https://github.com/scverse/spatialdata/issues/217.
the old API (with the parameter element_names) did not allow to specify particular instances for particular elements. For instance given two segmentation masks (shapes) "shapes0" and "shapes1" annotated by a table. It was not possible to select the cells [0, 1] for "shapes0", but only the cell [0] for "shapes1". The new implementation allows for these stratified types of filters.

Feb 02 '25 16:02 LucaMarconato

Ready for review! I tag both @melonora and @aeisenbarth please.

In this PR I also:

improved the TableModel.parse() function.
- Some lines were never reached from the tests, as codecov shows. I fixed this
- I added a new parameter overwrite_metadata: bool that can be used to conveniently replace region, region_key, instance_key. I use this match_sdata_to_table().
I rewrote the method SpatialData.get_annotated_regions(), which I use in match_sdata_to_table(), to compute the regions that are actually annotated by the table instead of simply returning the region metadata. This is a first step in the direction of dropping region, as explained here https://github.com/scverse/spatialdata/issues/629.

Feb 02 '25 16:02 LucaMarconato