
Improvement to aggregation (`by_key`)

Open LucaMarconato opened this issue 2 years ago • 0 comments

In the Xenium + Visium notebook 01, the block of code below, which aggregates cell types (cells) by Visium circles (with `fractions=True`), could be simplified. Here we aggregate not into individual Visium circles, but into the area covered by all the Visium circles that share the same value of a categorical variable (e.g. clone 1).

cell_types_categories = xe_rep1_roi_sdata.table.obs["celltype_major"].cat.categories.tolist()

rois_fractions = {}
for _, row in landmarks_sdata["rois"].iterrows():
    name = row.iloc[-1]  # ROI name, taken from the last column
    cells_inside = cells_in_rois_sdata_rep1[f"Shapes in ROI '{name}'"]
    indices_rep1 = cells_inside.index.tolist()
    corresponding_rows_mask = xe_rep1_roi_sdata.table.obs["cell_id"].isin(indices_rep1)
    corresponding_rows = xe_rep1_roi_sdata.table[corresponding_rows_mask]
    cell_types = corresponding_rows.obs["celltype_major"]
    empty = pd.Series(index=cell_types_categories, data=np.zeros(len(cell_types_categories), dtype=float))
    counts = cell_types.value_counts()
    empty.loc[counts.index] = counts
    rois_fractions[name] = empty
df1_rois = pd.DataFrame(rois_fractions).transpose()

The aggregation APIs are now robust enough to consider adding a `by_key` parameter that groups the shapes in `by` by the value of a categorical column of the `by` spatial element, and then performs the aggregation.
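
To make the intended semantics concrete, here is a minimal pandas-only sketch of "group the `by` shapes by a categorical column, then aggregate". It is not the spatialdata API: `aggregate_by_key`, the `spot` column and all the toy values are hypothetical, and the spatial join (which cell falls in which circle) is assumed to have been computed already.

```python
import pandas as pd

# Toy stand-ins (hypothetical): each Visium circle carries a categorical label.
circles = pd.DataFrame(
    {"clone": pd.Categorical(["clone 1", "clone 1", "clone 2"])},
    index=pd.Index(["spot_a", "spot_b", "spot_c"], name="spot"),
)

# Each cell has a cell type and the circle it falls in (None if outside all circles).
cells = pd.DataFrame(
    {
        "celltype_major": pd.Categorical(
            ["B", "T", "T", "B", "T"], categories=["B", "T", "Myeloid"]
        ),
        "spot": ["spot_a", "spot_a", "spot_b", "spot_c", None],
    }
)


def aggregate_by_key(cells, circles, by_key, value_key, fractions=False):
    """Count `value_key` categories per group of circles sharing a `by_key` value."""
    groups = list(circles[by_key].astype("category").cat.categories)
    categories = list(cells[value_key].astype("category").cat.categories)

    # Replace each cell's circle with the group that circle belongs to;
    # cells outside every circle get NaN and are dropped below.
    group = cells["spot"].map(circles[by_key].astype(object))

    counts = (
        pd.DataFrame({"group": group, "value": cells[value_key].astype(object)})
        .dropna(subset=["group"])
        .groupby(["group", "value"])
        .size()
        .unstack(fill_value=0)
        # Keep empty groups and unobserved categories as explicit zeros.
        .reindex(index=groups, columns=categories, fill_value=0)
    )
    if fractions:
        # Rows with no cells would become NaN here (0 / 0).
        return counts.div(counts.sum(axis=1), axis=0)
    return counts


counts = aggregate_by_key(cells, circles, by_key="clone", value_key="celltype_major")
fracs = aggregate_by_key(
    cells, circles, by_key="clone", value_key="celltype_major", fractions=True
)
print(counts)
```

The result has one row per value of the categorical column (here, per clone) instead of one row per circle, which is what the loop above builds by hand for each ROI.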

LucaMarconato, Jun 20 '23 22:06