Robert E. Hatem
Thanks for implementing this feature! Has anyone had issues using `all_points_membership_vectors` with a large dataset? (for me, original embeddings are 235002 x 384). It causes my Python kernel to fail....
g4dn.8xlarge   
> @hatemr I tried BERTopic with GPU for almost 1M documents. My original embeddings are 1M x 384. I tried to get the probabilities of each topic for every...
```
HDBSCAN(min_cluster_size=20,
        metric='euclidean',
        cluster_selection_method='eom',
        prediction_data=True)
```

Thanks, I'll try adjusting `min_samples`
> ```
> HDBSCAN(min_cluster_size=20,
>         metric='euclidean',
>         cluster_selection_method='eom',
>         prediction_data=True)
> ```
>
> Thanks, I'll try adjusting `min_samples`

Increasing `min_samples` worked! I had to increase `min_samples` itself - increasing...
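For anyone hitting the same kernel crash, here is a minimal sketch of the workaround described above: set `min_samples` explicitly (and keep `prediction_data=True`) before calling `all_points_membership_vectors`. The array size and the value `min_samples=50` are illustrative assumptions, not values confirmed in this thread; tune them for your own data.

```python
import numpy as np
import hdbscan

# Stand-in embeddings; in the thread the real matrix was 235002 x 384.
embeddings = np.random.rand(5_000, 384).astype(np.float32)

clusterer = hdbscan.HDBSCAN(
    min_cluster_size=20,
    min_samples=50,                   # illustrative value; raising this is the fix reported above
    metric='euclidean',
    cluster_selection_method='eom',
    prediction_data=True,             # required for all_points_membership_vectors
)
clusterer.fit(embeddings)

# Soft-membership matrix of shape (n_documents, n_clusters); this is the
# call that was exhausting memory with the default (lower) min_samples.
membership = hdbscan.all_points_membership_vectors(clusterer)
print(membership.shape)
```

If you are going through BERTopic rather than calling hdbscan directly, the same pre-configured clusterer can be passed in via the `hdbscan_model` argument when constructing the topic model.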