Problem with indices with `left_exclusive` join
To reproduce
While working on some tests for https://github.com/scverse/spatialdata/pull/822, I discovered a bug with `left_exclusive` (unrelated to the bug addressed in that PR).
To reproduce, add the string `"left_exclusive"` to the `pytest.mark.parametrize` list of `test_inner_join_match_rows_duplicate_obs_indices()` (in `test_relational_query.py`).
The bug is unrelated to the one fixed by that PR, because it also appears if we comment out this line in the test:
sdata["table"].obs.index = ["a"] * sdata["table"].n_obs
The error I get is shown below. @melonora, could you please have a look at it?
Traceback
tests/core/query/test_relational_query.py:380 (test_inner_join_match_rows_duplicate_obs_indices[left_exclusive])
sdata_query_aggregation = SpatialData object
├── Points
│ └── 'points': DataFrame with shape: (<Delayed>, 5) (2D points)
├── Shapes
│ ├─...:
points (Points), by_circles (Shapes), by_polygons (Shapes), values_circles (Shapes), values_polygons (Shapes)
join_type = 'left_exclusive'
    @pytest.mark.parametrize('join_type', ['left', 'right', 'inner', 'right_exclusive', 'left_exclusive'])
    def test_inner_join_match_rows_duplicate_obs_indices(sdata_query_aggregation: SpatialData, join_type: str) -> None:
        sdata = sdata_query_aggregation
        # sdata["table"].obs.index = ["a"] * sdata["table"].n_obs
        sdata["values_circles"] = sdata_query_aggregation["values_circles"][:4]
        sdata["values_polygons"] = sdata_query_aggregation["values_polygons"][:5]
>       element_dict, table = join_spatialelement_table(
            sdata=sdata,
            spatial_element_names=["values_circles", "values_polygons"],
            table_name="table",
            how=join_type,
        )
test_relational_query.py:388:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../src/spatialdata/_core/query/relational_query.py:680: in join_spatialelement_table
elements_dict_joined, table = _call_join(elements_dict, table, how, match_rows)
../../../src/spatialdata/_core/query/relational_query.py:697: in _call_join
elements_dict, table = JoinTypes[how](elements_dict, table, match_rows)
../../../src/spatialdata/_core/query/relational_query.py:528: in __call__
return self.value(*args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
element_dict = defaultdict(<function _create_sdata_elements_dict_for_join.<locals>.<lambda> at 0x3087c3c40>, {'shapes': defaultdict(<...)) ... 3
4 POLYGON ((66 -6, 66 6, 78 6, 78 -6, 66 -6)) ... 4
[5 rows x 3 columns]})})
table = AnnData object with n_obs × n_vars = 21 × 1
obs: 'region', 'instance_id', 'categorical_in_obs', 'numerical_in_obs'
uns: 'spatialdata_attrs'
match_rows = 'no'
    def _left_exclusive_join_spatialelement_table(
        element_dict: dict[str, dict[str, Any]], table: AnnData, match_rows: Literal["left", "no", "right"]
    ) -> tuple[dict[str, Any], AnnData | None]:
        regions, region_column_name, instance_key = get_table_keys(table)
        groups_df = table.obs.groupby(by=region_column_name, observed=False)
        for element_type, name_element in element_dict.items():
            for name, element in name_element.items():
                if name in regions:
                    group_df = groups_df.get_group(name)
                    table_instance_key_column = group_df[instance_key]
                    if element_type in ["points", "shapes"]:
                        mask = np.full(len(element), True, dtype=bool)
>                       mask[table_instance_key_column.values] = False
E                       IndexError: index 4 is out of bounds for axis 0 with size 4
../../../src/spatialdata/_core/query/relational_query.py:435: IndexError
Process finished with exit code 1
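My reading of the traceback is that the instance ids stored in the table are used as positional indices into a mask of length `len(element)`. Once the element is truncated to 4 rows, any id of 4 or more is out of bounds. Below is a minimal sketch of that failure and of one possible label-based alternative. It uses plain pandas and NumPy only; the names `element`, `table_instance_ids`, `positional_mask` and `label_mask` are hypothetical stand-ins, not spatialdata code, and I am assuming that the instance ids are meant to match the element's index labels.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for an element truncated to 4 rows (index labels 0..3).
element = pd.DataFrame({"radius": [1.0, 1.0, 1.0, 1.0]}, index=[0, 1, 2, 3])
# Hypothetical instance ids that the table still holds for this region.
table_instance_ids = np.array([0, 1, 2, 3, 4, 5])


def positional_mask(element: pd.DataFrame, instance_ids: np.ndarray) -> np.ndarray:
    """Mirror the failing lines: instance ids are treated as positions."""
    mask = np.full(len(element), True, dtype=bool)
    mask[instance_ids] = False  # IndexError if any id >= len(element)
    return mask


def label_mask(element: pd.DataFrame, instance_ids: np.ndarray) -> np.ndarray:
    """Possible alternative: keep rows whose index label is NOT in the table."""
    return ~element.index.isin(instance_ids)


# The positional version fails once the table refers to rows that were sliced away.
raised_index_error = False
try:
    positional_mask(element, table_instance_ids)
except IndexError:
    raised_index_error = True

# The label-based version tolerates ids that are missing from the element.
exclusive_mask = label_mask(element, table_instance_ids)

# It also works when the index labels are not a contiguous 0..n-1 range.
sparse_element = pd.DataFrame({"radius": [1.0, 1.0, 1.0, 1.0]}, index=[0, 1, 2, 7])
sparse_mask = label_mask(sparse_element, np.array([0, 1]))
```

In this sketch every row of `element` is annotated by the table, so `exclusive_mask` is all `False`, while `sparse_mask` keeps only the rows labelled 2 and 7. Whether a label-based lookup is the right fix here depends on how `instance_key` is supposed to relate to the element index, which I have not verified.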
I would actually use this occasion to simplify the code for the join operations. There is quite a lot of redundant code, which currently makes the join operations difficult to maintain.