get_centroids() slow on labels elements

Open cavenel opened this issue 7 months ago • 0 comments

When computing centroids of labels elements, get_centroids() takes a very long time, even on small data. This makes it very difficult to run on real-world data.

For example, it takes about 3 minutes on an 848x2540 image. I added a tqdm progress bar in _get_centroids_for_axis to see where the time goes:

Computing centroids along the y axis: 100%|██████████| 848/848 [00:50<00:00, 16.78it/s]
Computing centroids along the x axis: 100%|██████████| 2540/2540 [02:04<00:00, 20.47it/s]

A minimal example to reproduce:

import time

from spatialdata import get_centroids
from spatialdata.datasets import blobs

sdata = blobs()

# Time only the get_centroids() call on the labels element.
t = time.process_time()
get_centroids(sdata["blobs_labels"])
print(time.process_time() - t)

That shows around 10 seconds for a 512x512 image.
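
One caveat on the measurement: time.process_time() reports CPU time of the process, while the tqdm bars above report wall-clock time, so the two are not directly comparable. Below is a minimal sketch of a helper that records both. The name `timed` is hypothetical and not part of spatialdata.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, wall_seconds, cpu_seconds).

    Hypothetical helper for illustration only.
    """
    wall_start = time.perf_counter()  # wall-clock timer
    cpu_start = time.process_time()   # CPU time (user + system) of this process
    result = fn(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu
```

It could be used as `_, wall, cpu = timed(get_centroids, sdata["blobs_labels"])` to report both numbers side by side.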

For comparison, the following code, which also computes centroids, takes about 0.01 seconds:

import time

import pandas as pd
from skimage.measure import regionprops
from spatialdata.datasets import blobs

sdata = blobs()

t = time.process_time()
# Transpose (y, x) -> (x, y) so that each regionprops centroid comes out in (x, y) order.
label_image = sdata["blobs_labels"].values.T
props = regionprops(label_image)
# One row per label, indexed by label id.
centroids = pd.DataFrame(
    [p.centroid for p in props],
    columns=["x", "y"],
    index=[p.label for p in props],
)
print(time.process_time() - t)

Since get_centroids() is used in to_legacy_anndata, that function is also really slow on large datasets.
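
To illustrate that per-label centroids do not need a loop over rows and columns, here is a minimal NumPy-only sketch that computes them in a single pass with np.bincount. It is not how spatialdata implements get_centroids(). The function name `label_centroids` is hypothetical, and the sketch assumes a 2D integer array with dims (y, x), with 0 as background and centroids expressed in pixel-index coordinates (as regionprops does). Any offset or coordinate transformation that get_centroids() may apply is not handled here.

```python
import numpy as np


def label_centroids(labels: np.ndarray) -> dict:
    """Return {label: (x, y)} centroids for a 2D (y, x) integer label array.

    Hypothetical helper for illustration; label 0 is treated as background.
    """
    # bincount needs non-negative platform integers, so cast explicitly.
    flat = labels.ravel().astype(np.intp)
    # Row (y) and column (x) index of every pixel.
    yy, xx = np.indices(labels.shape)
    counts = np.bincount(flat)                      # pixels per label
    sum_y = np.bincount(flat, weights=yy.ravel())   # sum of y indices per label
    sum_x = np.bincount(flat, weights=xx.ravel())   # sum of x indices per label
    # Keep only labels that are present, excluding the background label 0.
    ids = np.nonzero(counts)[0]
    ids = ids[ids != 0]
    return {
        int(i): (float(sum_x[i] / counts[i]), float(sum_y[i] / counts[i]))
        for i in ids
    }
```

For the blobs example this could be called as `label_centroids(sdata["blobs_labels"].values)`. Note that the arrays returned by np.bincount have length max(label) + 1, so very large, sparse label ids would need to be remapped first.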

cavenel • May 19 '25 15:05