
Faster implementation available for `vectorize()`

Open LucaMarconato opened this issue 1 year ago • 4 comments

During the Ghent Hackathon (BioHackrXiv here), a faster implementation of `vectorize()` was developed by @hey2homie: https://github.com/saeyslab/VIB_Hackathon_June_2024/blob/main/polygons/polygons_test.ipynb.

The implementation also fixes https://github.com/scverse/spatialdata/issues/583, but relies on opencv, which is a heavy dependency.

Still, we should see whether that code could help improve the performance.

LucaMarconato avatar Jul 11 '24 14:07 LucaMarconato

@hey2homie I won't have the time to check your approach soon, but if you think that the same approach could be used without using opencv and are willing to open a PR, your contribution would be very welcome 😊

LucaMarconato avatar Jul 11 '24 14:07 LucaMarconato

@LucaMarconato, thanks for reaching out! I've been a bit too busy lately to follow up on this. There were some minor issues in the code from the hackathon, but I've already fixed them. I will work a bit more on this to polish it and submit a PR.

By the way, the speed boost is most likely coming from not using chunking with dask arrays and simply using numpy instead, as opencv and skimage have almost identical performance. So, abandoning opencv in favor of skimage wouldn't be a problem!
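
To make the idea concrete, here is a minimal sketch of contour extraction on a fully in-memory numpy array with skimage, with no opencv involved. This is not the hackathon code: the helper name `labels_to_contours`, the convention that label 0 is background, and the toy input are all assumptions made for illustration. It also loops naively over labels, so a real implementation would likely crop each label to its bounding box first.

```python
import numpy as np
from skimage.measure import find_contours


def labels_to_contours(labels):
    """Hypothetical helper: map each non-zero label to its boundary contours."""
    # np.asarray materializes the whole image in memory at once; for a
    # dask-backed array this is a single compute instead of chunk-wise work.
    image = np.asarray(labels)
    contours = {}
    for label in np.unique(image):
        if label == 0:  # assumption: 0 marks the background
            continue
        mask = (image == label).astype(float)
        # Pad with one pixel of zeros so shapes touching the image border
        # still produce closed contours.
        padded = np.pad(mask, 1)
        # Subtract 1 to undo the padding offset; each contour is an
        # (N, 2) array of (row, col) coordinates.
        contours[int(label)] = [c - 1 for c in find_contours(padded, 0.5)]
    return contours


# Toy label image with two rectangular objects, one touching the border.
labels = np.zeros((8, 8), dtype=np.int32)
labels[1:4, 1:4] = 1
labels[5:8, 4:8] = 2
result = labels_to_contours(labels)
```

Each returned contour could then be converted into a polygon, for example with shapely, which is left out here to keep the dependencies minimal.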

hey2homie avatar Jul 19 '24 07:07 hey2homie

Thank you for the explanation, and thank you in advance for the PR; it is very appreciated!

LucaMarconato avatar Jul 21 '24 15:07 LucaMarconato

Is this related to #560?

giovp avatar Sep 06 '24 21:09 giovp

Yes, #560 was the original implementation, but a faster one was developed by @hey2homie during the Ghent hackathon.

LucaMarconato avatar Sep 30 '24 18:09 LucaMarconato