spatialdata
`concatenate` does not account for custom region keys in the Table
When concatenating multiple tables produced by `spatialdata.aggregate`, an error is thrown when using a custom region key.
Reproducible example:
import pathlib
import spatialdata as sd
import natsort as ns
mibitof_zarr_path = pathlib.Path("~/Downloads/data.zarr").expanduser()
mibitof_sdata = sd.read_zarr(mibitof_zarr_path)
mibitof_sdata
# Delete the table
del mibitof_sdata.table
images = ns.natsorted(list(mibitof_sdata.images.keys()))
labels = ns.natsorted(list(mibitof_sdata.labels.keys()))
coords = ns.natsorted(list(mibitof_sdata.coordinate_systems))
agg_sd = []
for image, label, coord in zip(images, labels, coords):
    agg_sd.append(
        sd.aggregate(
            values=image,
            by=label,
            values_sdata=mibitof_sdata,
            by_sdata=mibitof_sdata,
            agg_func="sum",
            region_key="fov_id",
            instance_key="cell_id",
            target_coordinate_system=coord,
        )
    )
sd.concatenate(agg_sd, region_key="fov_id", instance_key="cell_id")
Traceback
KeyError Traceback (most recent call last)
File ~/.pyenv/versions/3.11.4/envs/ark-spatial/lib/python3.11/site-packages/pandas/core/indexes/base.py:3653, in Index.get_loc(self, key)
3652 try:
-> 3653 return self._engine.get_loc(casted_key)
3654 except KeyError as err:
File ~/.pyenv/versions/3.11.4/envs/ark-spatial/lib/python3.11/site-packages/pandas/_libs/index.pyx:147, in pandas._libs.index.IndexEngine.get_loc()
File ~/.pyenv/versions/3.11.4/envs/ark-spatial/lib/python3.11/site-packages/pandas/_libs/index.pyx:176, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'region'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[25], line 1
----> 1 sd.concatenate(agg_sd, region_key="fov_id", instance_key="cell_id")
File ~/.pyenv/versions/3.11.4/envs/ark-spatial/lib/python3.11/site-packages/spatialdata/_core/concatenate.py:115, in concatenate(sdatas, region_key, instance_key, **kwargs)
112 assert type(sdatas) == list, "sdatas must be a list"
113 assert len(sdatas) > 0, "sdatas must be a non-empty list"
--> 115 merged_table = _concatenate_tables(
116 [sdata.table for sdata in sdatas if sdata.table is not None], region_key, instance_key, **kwargs
117 )
119 return SpatialData(
120 images=merged_images,
121 labels=merged_labels,
(...)
124 table=merged_table,
125 )
File ~/.pyenv/versions/3.11.4/envs/ark-spatial/lib/python3.11/site-packages/spatialdata/_core/concatenate.py:63, in _concatenate_tables(tables, region_key, instance_key, **kwargs)
59 tables_l.append(table)
61 merged_table = ad.concat(tables_l, **kwargs)
62 attrs = {
---> 63 TableModel.REGION_KEY: merged_table.obs[TableModel.REGION_KEY].unique().tolist(),
64 TableModel.REGION_KEY_KEY: region_key,
65 TableModel.INSTANCE_KEY: instance_key,
66 }
67 merged_table.uns[TableModel.ATTRS_KEY] = attrs
69 return TableModel().validate(merged_table)
File ~/.pyenv/versions/3.11.4/envs/ark-spatial/lib/python3.11/site-packages/pandas/core/frame.py:3761, in DataFrame.__getitem__(self, key)
3759 if self.columns.nlevels > 1:
3760 return self._getitem_multilevel(key)
-> 3761 indexer = self.columns.get_loc(key)
3762 if is_integer(indexer):
3763 indexer = [indexer]
File ~/.pyenv/versions/3.11.4/envs/ark-spatial/lib/python3.11/site-packages/pandas/core/indexes/base.py:3655, in Index.get_loc(self, key)
3653 return self._engine.get_loc(casted_key)
3654 except KeyError as err:
-> 3655 raise KeyError(key) from err
3656 except TypeError:
3657 # If we have a listlike key, _check_indexing_error will raise
3658 # InvalidIndexError. Otherwise we fall through and re-raise
3659 # the TypeError.
3660 self._check_indexing_error(key)
KeyError: 'region'
I believe the cause of the issue is here:
https://github.com/scverse/spatialdata/blob/054a7bf3b4cbc70f451a9a124a715b79d30be9e0/src/spatialdata/_core/concatenate.py#L62-L66
On line 63, `obs` is indexed with `TableModel.REGION_KEY`, which is "region", instead of the function parameter `region_key`, which would be "fov_id" in this case. I'd like to contribute a fix if this is not the expected behavior.
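To illustrate the change I have in mind, here is a minimal, self-contained sketch. It is not the actual spatialdata code: `build_attrs` is a hypothetical helper, a plain dict of columns stands in for `merged_table.obs`, and the constants mimic `TableModel` (only the value "region" is confirmed by the traceback above; the other two values are assumptions). The point is only that the region column is looked up via the `region_key` argument rather than the hard-coded constant.

```python
# Stand-ins for the TableModel constants. "region" matches the traceback;
# the other two values are assumptions made for this illustration.
REGION_KEY = "region"
REGION_KEY_KEY = "region_key"
INSTANCE_KEY = "instance_key"


def build_attrs(obs, region_key, instance_key):
    """Hypothetical helper: build the table attrs from the caller's region column."""
    # Index the column named by the `region_key` argument instead of the
    # hard-coded REGION_KEY constant, so custom column names are honoured.
    regions = list(dict.fromkeys(obs[region_key]))  # unique values, order kept
    return {
        REGION_KEY: regions,
        REGION_KEY_KEY: region_key,
        INSTANCE_KEY: instance_key,
    }


# A plain dict of columns stands in for `merged_table.obs` here.
obs = {"fov_id": ["fov0", "fov0", "fov1"], "cell_id": [1, 2, 1]}
attrs = build_attrs(obs, region_key="fov_id", instance_key="cell_id")
```

With this stand-in, `obs[REGION_KEY]` would raise `KeyError: 'region'` just like in the traceback, while `obs[region_key]` finds the "fov_id" column. In the real function the equivalent change would presumably be to index `merged_table.obs` with `region_key` on line 63.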
@srivarra thank you for reporting and sorry for the late answer, the thesis writing got in between and I missed some GitHub notifications. I will look into this and report back.
@LucaMarconato No worries, hope your thesis is going well! Let me know if I can contribute with a PR if it's an actual issue!