allow filtering by ids
closes #556
This PR allows for filtering elements (points, shapes, tables) on instance IDs. For labels I decided just as with the joins to not support it as it is potentially an expensive operation, where I am not certain what the usecase would be (one can adjust the color to a background color in any case for particular labels or even set it transparent).
The PR is related to #626, however I think it is better to separate out filtering on instance IDs. The function here can ultimately still be used in his function.
The PR will not cause an error in case of filtering on instance IDs that are not there. Furthermore it is not a problem to filter on instances in a table that is annotating more than 1 element. If instances for both annotated elements need to be filtered in the table this will be retrieved. If region_names is defined, a view of the table with rows annotating elements in region_names is first created before filtering on instance IDs.
After filtering of the table, the metadata for annotation of regions is checked and updated after which the table is validated.
Codecov Report
Attention: Patch coverage is 93.93939% with 2 lines in your changes missing coverage. Please review.
Project coverage is 92.11%. Comparing base (
5e2b1a4) to head (f521f1f). Report is 10 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/spatialdata/models/models.py | 90.90% | 2 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #627 +/- ##
==========================================
+ Coverage 92.00% 92.11% +0.10%
==========================================
Files 48 48
Lines 7408 7429 +21
==========================================
+ Hits 6816 6843 +27
+ Misses 592 586 -6
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/spatialdata/__init__.py | 96.42% <ø> (ø) |
|
| src/spatialdata/_core/query/relational_query.py | 91.14% <100.00%> (+0.15%) |
:arrow_up: |
| src/spatialdata/_core/spatialdata.py | 91.46% <100.00%> (+0.20%) |
:arrow_up: |
| src/spatialdata/models/models.py | 88.34% <90.90%> (+1.09%) |
:arrow_up: |
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
@melonora converting to draft, what's missing for the moment, just
- [x] the tests?
- [x] ~~ah also the function should be private so it can be used internally by a single "entry point"
filter().~~ made the function public and removedfilter().
Hi, I reviewed the PR. First please check my comment in this linked PR: https://github.com/scverse/spatialdata/pull/626#issuecomment-2615467570.
In this PR, I refined the function described in the comment linked above, into a new public API match_sdata_to_table(). As shown in the tests, all the use cases of the linked PR are supported. The use cases of filtering by instance_ids from points/shapes are supported by the functions match_table_to_element() (or by the join operations).
Since the purpose of the current PR was to add a private function that could support the filter() method of the linked PR, and since now the new public function allows for this. I removed the filter() method.
This is supported also by the following reasons:
- the
filter()function was supposed to be private, but now we do not need such private function - the method introduces code redundancy that can be avoided by using the join APIs (this is how I implemented
match_sdata_to_table()
Finally, since two comments that I had from the review of the filter() method (even if not relevant anymore if the method is removed).
- the points were not filtered by the index, but by the
instance_keycolumn, whose role is to match the transcript to a cell index to which the transcript below to. I had fixed this in122aa5f(#627). Note that the semantic ofinstance_keycolumn for points is not well defined as reported here https://github.com/scverse/spatialdata/issues/217. - the old API (with the parameter
element_names) did not allow to specify particular instances for particular elements. For instance given two segmentation masks (shapes) "shapes0" and "shapes1" annotated by a table. It was not possible to select the cells [0, 1] for "shapes0", but only the cell [0] for "shapes1". The new implementation allows for these stratified types of filters.
Ready for review! I tag both @melonora and @aeisenbarth please.
In this PR I also:
- improved the
TableModel.parse()function.- Some lines were never reached from the tests, as codecov shows. I fixed this
- I added a new parameter
overwrite_metadata: boolthat can be used to conveniently replaceregion,region_key,instance_key. I use thismatch_sdata_to_table().
- I rewrote the method
SpatialData.get_annotated_regions(), which I use inmatch_sdata_to_table(), to compute the regions that are actually annotated by the table instead of simply returning theregionmetadata. This is a first step in the direction of droppingregion, as explained here https://github.com/scverse/spatialdata/issues/629.