`concatenate` does not account for custom region keys in the Table

Open srivarra opened this issue 1 year ago • 2 comments

When concatenating multiple tables produced by `spatialdata.aggregate`, `concatenate` raises a `KeyError` if a custom region key is used.

Reproducible example:

import pathlib
import spatialdata as sd
import natsort as ns

mibitof_zarr_path = pathlib.Path("~/Downloads/data.zarr").expanduser()


mibitof_sdata = sd.read_zarr(mibitof_zarr_path)
mibitof_sdata

# Delete the table
del mibitof_sdata.table

images = ns.natsorted(list(mibitof_sdata.images.keys()))
labels = ns.natsorted(list(mibitof_sdata.labels.keys()))
coords = ns.natsorted(list(mibitof_sdata.coordinate_systems))

agg_sd = []

for image, label, coord in zip(images, labels, coords):
    agg_sd.append(
        sd.aggregate(
            values=image,
            by=label,
            values_sdata=mibitof_sdata,
            by_sdata=mibitof_sdata,
            agg_func="sum",
            region_key="fov_id",
            instance_key="cell_id",
            target_coordinate_system=coord,
        )
    )

sd.concatenate(agg_sd, region_key="fov_id", instance_key="cell_id")
Traceback
KeyError                                  Traceback (most recent call last)
File ~/.pyenv/versions/3.11.4/envs/ark-spatial/lib/python3.11/site-packages/pandas/core/indexes/base.py:3653, in Index.get_loc(self, key)
 3652 try:
-> 3653     return self._engine.get_loc(casted_key)
 3654 except KeyError as err:

File ~/.pyenv/versions/3.11.4/envs/ark-spatial/lib/python3.11/site-packages/pandas/_libs/index.pyx:147, in pandas._libs.index.IndexEngine.get_loc()

File ~/.pyenv/versions/3.11.4/envs/ark-spatial/lib/python3.11/site-packages/pandas/_libs/index.pyx:176, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'region'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[25], line 1
----> 1 sd.concatenate(agg_sd, region_key="fov_id", instance_key="cell_id")

File ~/.pyenv/versions/3.11.4/envs/ark-spatial/lib/python3.11/site-packages/spatialdata/_core/concatenate.py:115, in concatenate(sdatas, region_key, instance_key, **kwargs)
  112 assert type(sdatas) == list, "sdatas must be a list"
  113 assert len(sdatas) > 0, "sdatas must be a non-empty list"
--> 115 merged_table = _concatenate_tables(
  116     [sdata.table for sdata in sdatas if sdata.table is not None], region_key, instance_key, **kwargs
  117 )
  119 return SpatialData(
  120     images=merged_images,
  121     labels=merged_labels,
 (...)
  124     table=merged_table,
  125 )

File ~/.pyenv/versions/3.11.4/envs/ark-spatial/lib/python3.11/site-packages/spatialdata/_core/concatenate.py:63, in _concatenate_tables(tables, region_key, instance_key, **kwargs)
   59     tables_l.append(table)
   61 merged_table = ad.concat(tables_l, **kwargs)
   62 attrs = {
---> 63     TableModel.REGION_KEY: merged_table.obs[TableModel.REGION_KEY].unique().tolist(),
   64     TableModel.REGION_KEY_KEY: region_key,
   65     TableModel.INSTANCE_KEY: instance_key,
   66 }
   67 merged_table.uns[TableModel.ATTRS_KEY] = attrs
   69 return TableModel().validate(merged_table)

File ~/.pyenv/versions/3.11.4/envs/ark-spatial/lib/python3.11/site-packages/pandas/core/frame.py:3761, in DataFrame.__getitem__(self, key)
 3759 if self.columns.nlevels > 1:
 3760     return self._getitem_multilevel(key)
-> 3761 indexer = self.columns.get_loc(key)
 3762 if is_integer(indexer):
 3763     indexer = [indexer]

File ~/.pyenv/versions/3.11.4/envs/ark-spatial/lib/python3.11/site-packages/pandas/core/indexes/base.py:3655, in Index.get_loc(self, key)
 3653     return self._engine.get_loc(casted_key)
 3654 except KeyError as err:
-> 3655     raise KeyError(key) from err
 3656 except TypeError:
 3657     # If we have a listlike key, _check_indexing_error will raise
 3658     #  InvalidIndexError. Otherwise we fall through and re-raise
 3659     #  the TypeError.
 3660     self._check_indexing_error(key)

KeyError: 'region'

I believe the cause of the issue is here:

https://github.com/scverse/spatialdata/blob/054a7bf3b4cbc70f451a9a124a715b79d30be9e0/src/spatialdata/_core/concatenate.py#L62-L66

On line 63, `obs` is indexed with `TableModel.REGION_KEY`, which is `"region"`, instead of the function parameter `region_key`, which would be `"fov_id"` in this case. I'd be happy to contribute a fix if this is not the expected behavior.
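To make the proposed change concrete, here is a minimal sketch using plain pandas rather than the actual spatialdata code. The helper `build_table_attrs` and the module-level constants are hypothetical stand-ins for the attrs dictionary built in `_concatenate_tables`; only the value `"region"` is confirmed by the traceback, and the other two constant values are assumptions. The idea is simply to index `obs` with the caller's `region_key` instead of the hard-coded constant.

```python
import pandas as pd

# Hypothetical stand-ins for the TableModel constants.
# Only "region" is confirmed by the traceback; the other two values are assumed.
REGION_KEY = "region"
REGION_KEY_KEY = "region_key"
INSTANCE_KEY = "instance_key"


def build_table_attrs(obs: pd.DataFrame, region_key: str, instance_key: str) -> dict:
    """Collect the region metadata from the column named by `region_key`."""
    if region_key not in obs.columns:
        raise KeyError(f"region_key {region_key!r} not found in obs columns")
    return {
        # Index with the caller-supplied column name, not the fixed "region".
        REGION_KEY: obs[region_key].unique().tolist(),
        REGION_KEY_KEY: region_key,
        INSTANCE_KEY: instance_key,
    }


# Two per-FOV obs tables that use the custom "fov_id" column,
# stacked the way the merged table's obs would be.
obs = pd.concat(
    [
        pd.DataFrame({"fov_id": ["fov0", "fov0"], "cell_id": [1, 2]}),
        pd.DataFrame({"fov_id": ["fov1"], "cell_id": [1]}),
    ],
    ignore_index=True,
)

attrs = build_table_attrs(obs, region_key="fov_id", instance_key="cell_id")
print(attrs)
```

With this toy `obs`, indexing by `"region"` would fail in the same way as the traceback above, because no such column exists, while indexing by `region_key` picks up the `fov_id` values. Whether the real fix should also validate or rename the column is a design question for the maintainers.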

srivarra • Sep 02 '23 04:09

@srivarra thank you for reporting, and sorry for the late answer; thesis writing got in the way and I missed some GitHub notifications. I will look into this and report back.

LucaMarconato • Oct 10 '23 11:10

@LucaMarconato No worries, hope your thesis is going well! Let me know if I can contribute a PR, if it turns out to be an actual issue!

srivarra • Oct 10 '23 17:10