
How to optimize UMAP parameters for UMAP-enhanced HDBSCAN clustering on 1536-dimensional embeddings

Open • BharathiSubramanian-Bain opened this issue 8 months ago • 1 comment

Hello,

I have a large dataset with thousands of rows and 1536 dimensions (columns). I would like to find the natural clusters in this data, but HDBSCAN performs very poorly in such high dimensions. When I apply UMAP first, the results are significantly better, but I don't know how to choose the optimal parameters for UMAP and HDBSCAN. Kindly share your thoughts.

I'd recommend starting with the advice on UMAP for clustering in the documentation: https://umap-learn.readthedocs.io/en/latest/clustering.html#umap-enhanced-clustering

jacobgolding, Apr 30 '25 05:04