spatialdata
get_centroids() is slow on labels elements
When computing the centroids of a labels element, get_centroids() is extremely slow, even on small data.
This makes it very difficult to run on real-life data.
For example, it takes about 3 minutes on an 848x2540 image. I added a tqdm progress bar in _get_centroids_for_axis to see where the time goes:
Computing centroids along the y axis: 100%|██████████| 848/848 [00:50<00:00, 16.78it/s]
Computing centroids along the x axis: 100%|██████████| 2540/2540 [02:04<00:00, 20.47it/s]
A minimal example to reproduce:
from spatialdata import get_centroids
from spatialdata.datasets import blobs
import time
sdata = blobs()
t = time.process_time()
get_centroids(sdata["blobs_labels"])
print(time.process_time() - t)
This reports around 10 seconds for a 512x512 image.
For comparison, the following code, which also computes centroids, takes about 0.01 seconds:
import time

import pandas as pd
from skimage.measure import regionprops
from spatialdata.datasets import blobs

sdata = blobs()
t = time.process_time()
# Transpose so regionprops returns centroids as (x, y), assuming the array is laid out as (y, x).
label_image = sdata["blobs_labels"].values.T
props = regionprops(label_image)
centroids = pd.DataFrame(
    [p.centroid for p in props],
    columns=["x", "y"],
    index=[p.label for p in props],
)
print(time.process_time() - t)
Since get_centroids() is used in to_legacy_anndata, that function is also very slow on large datasets.
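For what it's worth, the centroids can also be computed in a single vectorized pass with plain NumPy, without looping over rows or columns. The snippet below is only a sketch of that idea, not how spatialdata implements get_centroids: the helper name centroids_bincount is hypothetical, it assumes a 2D integer label array laid out as (y, x) with 0 as background, and it returns coordinates in array-index units without applying any pixel offset or coordinate transformation.

```python
import numpy as np


def centroids_bincount(labels: np.ndarray) -> dict[int, tuple[float, float]]:
    """Hypothetical helper: map each non-zero label to its (x, y) centroid.

    Assumes `labels` is a 2D array of non-negative integers laid out as (y, x),
    with 0 meaning background.
    """
    # np.bincount needs an integer type it can safely treat as int64.
    flat = labels.ravel().astype(np.int64, copy=False)
    # Row (y) and column (x) index of every pixel.
    yy, xx = np.indices(labels.shape)
    # Pixel count and coordinate sums per label, each in one pass over the image.
    counts = np.bincount(flat)
    sum_y = np.bincount(flat, weights=yy.ravel())
    sum_x = np.bincount(flat, weights=xx.ravel())
    # Keep only labels that occur, and drop the background label 0.
    ids = np.nonzero(counts)[0]
    ids = ids[ids != 0]
    return {
        int(i): (float(sum_x[i] / counts[i]), float(sum_y[i] / counts[i]))
        for i in ids
    }


# Tiny example: label 1 covers a 2x2 block, label 2 a single pixel.
demo = np.array(
    [
        [0, 1, 1],
        [0, 1, 1],
        [2, 0, 0],
    ]
)
result = centroids_bincount(demo)
print(result)
```

Note that np.bincount allocates an array as long as the largest label value plus one, so this approach would need relabeling first if the label ids are very large and sparse.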