
Faster version of `aggregate()` method available

Open LucaMarconato opened this issue 1 year ago • 2 comments

While extending SOPA for Visium HD data, @quentinblampey encountered a performance bottleneck in `aggregate()` that he was able to improve using pure geopandas code. Since we use geopandas internally, this may be a bug in spatialdata that is easy to fix.

Here is the SOPA code: https://github.com/gustaveroussy/sopa/blob/master/sopa/segmentation/aggregation.py#L485
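
For illustration only, here is a minimal sketch of what a pure geopandas aggregation of bins into cells could look like. It is not the SOPA code linked above; the function name `aggregate_bins`, the `cell_id`/`bin_id` columns, and the toy data are all hypothetical, and it assumes that a bin contributes its full counts to every cell it intersects.

```python
import geopandas as gpd
import numpy as np
from shapely.geometry import box


def aggregate_bins(cells, bins, bin_counts):
    """Sum per-bin counts over every bin that intersects each cell.

    Hypothetical helper: `cells` and `bins` are GeoDataFrames of polygons,
    `bin_counts` is an (n_bins, n_features) array aligned with `bins`.
    """
    # Explicit integer ids, so we do not depend on how sjoin names index columns.
    cells = cells[["geometry"]].reset_index(drop=True)
    cells = cells.assign(cell_id=np.arange(len(cells)))
    bins = bins[["geometry"]].reset_index(drop=True)
    bins = bins.assign(bin_id=np.arange(len(bins)))

    # One spatial join yields all (cell, bin) pairs that intersect.
    pairs = gpd.sjoin(cells, bins, how="inner", predicate="intersects")

    # Accumulate the counts of each matched bin into its cell's row.
    out = np.zeros((len(cells), bin_counts.shape[1]), dtype=bin_counts.dtype)
    np.add.at(
        out,
        pairs["cell_id"].to_numpy(),
        bin_counts[pairs["bin_id"].to_numpy()],
    )
    return out


# Toy data: a 2x2 grid of unit-square bins and two cells.
bins = gpd.GeoDataFrame(
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(0, 1, 1, 2), box(1, 1, 2, 2)]
)
cells = gpd.GeoDataFrame(
    geometry=[box(0.1, 0.1, 0.9, 0.9), box(1.1, 0.1, 1.9, 1.9)]
)
bin_counts = np.array([[1, 0], [2, 1], [0, 3], [4, 4]])

result = aggregate_bins(cells, bins, bin_counts)
print(result)
```

The first cell lies inside one bin, while the second overlaps two bins and receives the sum of their counts. For large inputs, the accumulation step could also be written as a sparse matrix product.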

LucaMarconato, Oct 31 '24 17:10

See Harpy aggregate implementation as mentioned by @ArneDefauw in #677.

berombau, Nov 06 '24 09:11

> See Harpy aggregate implementation as mentioned by @ArneDefauw in #677.

Just a side note: https://github.com/saeyslab/harpy/blob/6b80d01baa11c0ee9ecdfb48d5b0d72be305cb2e/src/sparrow/table/_allocation_intensity.py#L22 uses the more general https://github.com/saeyslab/harpy/blob/6b80d01baa11c0ee9ecdfb48d5b0d72be305cb2e/src/sparrow/utils/_aggregate.py#L16 and supports aggregation between labels layers and image layers. It is similar to xr_spatial.zonal_stats, but faster, and it also supports custom aggregations: https://github.com/saeyslab/harpy/blob/6b80d01baa11c0ee9ecdfb48d5b0d72be305cb2e/src/sparrow/utils/_aggregate.py#L251

I think https://github.com/gustaveroussy/sopa/blob/f1f5a99ee7f5a9489e511241a3a62bb520ec9860/sopa/segmentation/aggregation.py#L485 focuses on aggregation between shapes layers and bins.
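
To make the labels/image case concrete, here is a rough NumPy-only sketch of zonal-statistics-style aggregation with an optional custom reducer. It is not the Harpy implementation; `aggregate_labels` is a hypothetical name, and it assumes non-negative integer labels with 0 as background.

```python
import numpy as np


def aggregate_labels(labels, image, func=None):
    """Aggregate image intensities per label (hypothetical helper).

    Returns (label_ids, values). With func=None, the per-label sum is
    computed; otherwise func is applied to each label's pixel values.
    """
    flat_labels = labels.ravel()
    flat_image = image.ravel().astype(float)

    # Labels present in the mask, excluding background (assumed to be 0).
    ids = np.unique(flat_labels)
    ids = ids[ids != 0]

    if func is None:
        # Fast path: a single bincount gives the sum for every label.
        sums = np.bincount(flat_labels, weights=flat_image)
        return ids, sums[ids]

    # Custom aggregation: sort pixels by label once, then slice per label.
    order = np.argsort(flat_labels, kind="stable")
    sorted_labels = flat_labels[order]
    sorted_values = flat_image[order]
    starts = np.searchsorted(sorted_labels, ids, side="left")
    ends = np.searchsorted(sorted_labels, ids, side="right")
    values = np.array([func(sorted_values[s:e]) for s, e in zip(starts, ends)])
    return ids, values


labels = np.array([[0, 1, 1], [2, 2, 1], [2, 0, 0]])
image = np.array([[5, 1, 2], [10, 20, 3], [30, 7, 7]])

ids, sums = aggregate_labels(labels, image)
_, maxima = aggregate_labels(labels, image, func=np.max)
print(ids, sums, maxima)
```

The `bincount` path avoids a Python loop entirely, while the custom path loops once per label over contiguous slices.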

ArneDefauw, Nov 06 '24 12:11