hdbscan

Slow behavior of approximate_predict()

Open ogutty opened this issue 4 years ago • 3 comments

Hi, I have a dataset of ~300k (4d) vectors which I've been clustering using your algorithm. Training a clusterer on this dataset normally takes ~20 sec, while training with the prediction_data=True keyword takes ~90 sec. However, when I call approximate_predict() on a new set of ~30k vectors, the operation takes some 330 seconds. Is this running time considered "normal"? (BTW, my application requires much faster run times.)

The running times for this function are not documented, but the comments describing the operation hint at much quicker behavior. I'd be glad to hear any clarifications and thanks for maintaining this impressive library!

ogutty avatar Aug 19 '21 07:08 ogutty

@ogutty Have you come up with a solution, such as multiprocessing approximate_predict()? HDBSCAN only uses one core when calling approximate_predict().

yotammarton avatar Dec 22 '21 07:12 yotammarton

Unfortunately, I haven't (this was a low-priority item in a colleague's project). I suppose your idea could help, but on the face of it even a 16x speedup would only bring prediction down to 330 / 16 ~= 20 sec, which is no better than the ~20 sec it originally takes to train the clusterer. I was hoping someone with a better understanding of the algorithm's inner workings would see some "obvious" solution. I'm still waiting.

ogutty avatar Dec 22 '21 07:12 ogutty

Approximate prediction is really geared toward a streaming situation, with smaller numbers of points being predicted at a time, so it may be bottlenecking somewhere at this scale. The performance described is a little disconcerting, but I don't know of any obvious reason for it. Most likely the bottleneck is in the nearest neighbour computation stage, which could potentially be made multi-core. That would take some looking into, which I don't have the time to manage right now. I would be happy to support anyone else's efforts to dig in, however.

lmcinnes avatar Dec 29 '21 17:12 lmcinnes
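
For anyone who wants to try the multi-core idea raised above, here is a minimal sketch of splitting the query points into chunks and predicting each chunk in a separate worker. It is not part of hdbscan: the helper name `predict_in_chunks` and its parameters are hypothetical, and it assumes that each point is predicted independently of the other query points, so that chunked results match a single call. Nothing here has been benchmarked against the timings reported in this issue.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def predict_in_chunks(predict_fn, points, n_workers=4,
                      executor_cls=ProcessPoolExecutor):
    """Hypothetical helper: run predict_fn on chunks of points in parallel.

    predict_fn must accept a 2-D array and return a (labels, strengths)
    pair of 1-D arrays, one entry per row. points must be non-empty.
    """
    points = np.asarray(points)
    # Split into roughly equal chunks; drop empty ones when there are
    # fewer points than workers.
    chunks = [c for c in np.array_split(points, n_workers) if len(c)]
    with executor_cls(max_workers=n_workers) as pool:
        # pool.map preserves input order, so results line up with points.
        results = list(pool.map(predict_fn, chunks))
    labels = np.concatenate([r[0] for r in results])
    strengths = np.concatenate([r[1] for r in results])
    return labels, strengths


# Possible usage with hdbscan (assumes `clusterer` was fitted with
# prediction_data=True and `new_points` is an (n, 4) array):
#
#   from functools import partial
#   import hdbscan
#   predict_fn = partial(hdbscan.approximate_predict, clusterer)
#   labels, strengths = predict_in_chunks(predict_fn, new_points, n_workers=8)
```

With process-based workers the fitted clusterer is pickled and sent along with each chunk, so the gain depends on how that overhead compares with the prediction time itself. On platforms that spawn worker processes, the call needs to sit under an `if __name__ == "__main__":` guard.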