John Healy

Results: 18 comments by John Healy

Have you looked at the condensed tree plots or the single_linkage_tree plots? Those would be the places that I'd start with to try and explore what is happening. https://hdbscan.readthedocs.io/en/latest/advanced_hdbscan.html On...

Thanks for adding that plot. It confirms that there was indeed a single cluster with a single label. I know that is what your legend indicated but your results looked...

That is a bit strange. I tried to reproduce your error and wasn't able to do it in Python 3.8 on a fresh install of hdbscan on a MacBook. For...

In order to do an apples-to-apples comparison you need to check that both algorithms are returning similar results. DBSCAN can be blazingly fast if you are taking very...
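
One way to check that two clusterings are "similar" before comparing run times is to score the agreement of their label arrays. The sketch below is a pure-Python adjusted Rand index, written only so the example has no dependencies; the function name and the sample labels are assumptions, and in practice `sklearn.metrics.adjusted_rand_score` would be the usual choice. Noise points (label -1) are treated here as one more cluster.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two labelings: 1.0 means identical up to renaming,
    values near 0 mean roughly chance-level agreement."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label arrays must have the same length")
    n = len(labels_a)

    # Pairs of points that share a cluster in both / in A / in B.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    total_pairs = comb(n, 2)
    expected = pairs_a * pairs_b / total_pairs if total_pairs else 0.0
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        # Degenerate case, e.g. both labelings put everything in one cluster.
        return 1.0
    return (pairs_both - expected) / (maximum - expected)


# Hypothetical label arrays standing in for DBSCAN and HDBSCAN output.
dbscan_labels = [0, 0, 0, 1, 1, 1, -1]
hdbscan_labels = [1, 1, 1, 0, 0, 0, -1]
score = adjusted_rand_index(dbscan_labels, hdbscan_labels)
```

A score close to 1 suggests the two runs found essentially the same partition, so a timing comparison between them is meaningful.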

I agree with Leland that duplicates ending in different clusters shouldn't happen. It might be worth double checking that your duplicate text points are being vectorized to the same point....
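
The double check suggested above can be sketched as a small helper that groups rows by their raw text and reports any group whose vectors or cluster labels disagree. Everything here (the function name, the `tol` parameter, the sample data) is an assumption for illustration; it only presumes that the texts, their vectors, and the resulting labels are available as parallel sequences.

```python
from collections import defaultdict


def check_duplicates(texts, vectors, labels, tol=0.0):
    """Return (text, indices, same_vector, same_label) for each duplicated
    text whose vectors or cluster labels are not all identical."""
    groups = defaultdict(list)
    for i, text in enumerate(texts):
        groups[text].append(i)

    problems = []
    for text, idxs in groups.items():
        if len(idxs) < 2:
            continue  # not a duplicate
        first = vectors[idxs[0]]
        # Every copy must vectorize to the same point (within tol per coordinate).
        same_vector = all(
            len(vectors[i]) == len(first)
            and all(abs(a - b) <= tol for a, b in zip(vectors[i], first))
            for i in idxs[1:]
        )
        same_label = len({labels[i] for i in idxs}) == 1
        if not (same_vector and same_label):
            problems.append((text, idxs, same_vector, same_label))
    return problems


# Hypothetical data: "b" is vectorized consistently but split across clusters.
texts = ["a", "b", "a", "b", "c"]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
labels = [0, 1, 0, 2, 1]
problems = check_duplicates(texts, vectors, labels)
```

If `same_vector` is false for a group, the vectorizer is the place to look; if the vectors match and only the labels differ, the problem sits in the clustering step.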

Awesome, I'm happy to hear that you'd already checked that potential problem. Of course that does mean we've got some particularly odd behaviour going on. Any chance that you could...

Thanks for the code example; they always make things easier. That particular example looks like a problem with passing hdbscan a single cluster of data, and not the fact that...

We are always happy to receive pull requests, though I must admit it can be challenging to find the time to review them all. On Thu, Oct 7, 2021 at...

AlignedUMAP attempts to embed two datasets into a joint space through regularization. You need two datasets with some subset (maybe all) of your points present in both datasets. Then...
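
The "points present in both datasets" are communicated to AlignedUMAP as a mapping from row indices in the first dataset to row indices in the second. Below is a minimal sketch of building that mapping from shared identifiers. The helper name `build_relation` and the sample IDs are assumptions, and it presumes each ID appears at most once per dataset; the resulting dict is what would be passed in the `relations` list to `AlignedUMAP.fit`.

```python
def build_relation(ids_a, ids_b):
    """Map row index in dataset A -> row index in dataset B for shared IDs.

    Hypothetical helper; assumes IDs are unique within each dataset.
    """
    index_in_b = {id_: j for j, id_ in enumerate(ids_b)}
    return {i: index_in_b[id_] for i, id_ in enumerate(ids_a) if id_ in index_in_b}


# Hypothetical identifiers: three of the points appear in both datasets.
ids_a = ["p1", "p2", "p3", "p4"]
ids_b = ["p3", "p5", "p1", "p4"]
relation = build_relation(ids_a, ids_b)

# With umap-learn installed and feature matrices X_a, X_b in the same row
# order as ids_a, ids_b, the call would look roughly like:
#   import umap
#   mapper = umap.AlignedUMAP().fit([X_a, X_b], relations=[relation])
#   emb_a, emb_b = mapper.embeddings_
```

For more than two datasets, one such dict is needed per consecutive pair, so a list of N datasets takes N - 1 relations.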

I definitely agree with David's idea of sensitivity instead of importance. I see that you are suggesting using the AlignedUMAP in 0.5.0 to reduce the embedding noise due to the...