RR28 comments

Results: 11 comments of RR28

Not able to get probabilities with cuml HDBSCAN

I am also using the cuML version of BERTopic and, sadly, am not getting probabilities. Is there any way to extract more than 30 words per topic? As I have to...

Not able to get probabilities with cuml HDBSCAN

> > Is there any way to extract more than 30 words on the topic?
> >
> > In the latest version of BERTopic, you can access the c-TF-IDF matrix with...
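The idea in this reply, ranking words directly from a c-TF-IDF matrix to get more words per topic than the default, might look roughly like the sketch below. It is not BERTopic's own code: `top_words_per_topic`, the toy `vocabulary`, and the toy scores are hypothetical, and the sketch assumes you already hold a dense topics-by-words score matrix and a matching word list (a sparse matrix would first need converting, for example with `.toarray()`).

```python
import numpy as np


def top_words_per_topic(c_tf_idf, vocabulary, n_words):
    """Return the n_words highest-scoring (word, score) pairs for each topic.

    c_tf_idf   : dense array-like of shape (n_topics, n_vocabulary)
    vocabulary : list of words, where vocabulary[j] labels column j
    """
    scores = np.asarray(c_tf_idf, dtype=float)
    # Never ask for more words than the vocabulary contains.
    n_words = min(n_words, scores.shape[1])
    # Sort column indices by descending score within each topic row.
    order = np.argsort(-scores, axis=1)[:, :n_words]
    return [
        [(vocabulary[j], float(scores[i, j])) for j in row]
        for i, row in enumerate(order)
    ]


# Toy data (hypothetical): 2 topics, 5 vocabulary words.
vocabulary = ["gpu", "cluster", "topic", "memory", "outlier"]
c_tf_idf = [
    [0.1, 0.5, 0.9, 0.0, 0.2],
    [0.7, 0.0, 0.1, 0.8, 0.3],
]

result = top_words_per_topic(c_tf_idf, vocabulary, n_words=3)
for topic_id, words in enumerate(result):
    print(topic_id, [word for word, _ in words])
```

Because the ranking is done on the matrix itself, `n_words` can be any value up to the vocabulary size.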

Problems with merging topics

Hi MaartenGr, I want to apologize for asking so many questions. I have another one: I want to merge topics from two different datasets using the code below from Tips...

> @rubypnchl Merging topics from two different models is currently not possible. If you follow along with the description of [BERTopic's algorithm](https://maartengr.github.io/BERTopic/algorithm/algorithm.html) this quickly becomes clear. Namely, we would have...

min_topic_size is not impacting the number of topics in Latest Version 0.12.0

> Could you share your code for training BERTopic? The code would help me identify where the issue might stem from. The parameter you refer to was not changed between...

min_topic_size is not impacting the number of topics in Latest Version 0.12.0

> When you use `min_topic_size` you are essentially setting the `min_cluster_size` parameter in HDBSCAN. So if you are using a custom HDBSCAN model, `min_topic_size` is not used and is replaced by...

> > import numpy as np

Thank you for the quick response!
> > Thank you for the great work. I am currently using BERTopic for one of my problems. I am facing issues while updating the topics...

> > import numpy as np

> > new_topics = [np.argmax(prob) if max(prob) >= probability_threshold else -1 for prob in probs]
> > topic_model.update_topics(abstracts, new_topics, vectorizer_model=vectorizer_model)
> >
> > It might indeed be the case here that...
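To make the thresholding step in the quoted list comprehension concrete, here is a small self-contained sketch on toy data. The helper name `assign_topics` and the example `probs` matrix are hypothetical; the sketch assumes `probs` is a documents-by-topics probability matrix and does not call BERTopic itself.

```python
import numpy as np


def assign_topics(probs, probability_threshold):
    """Pick the most probable topic per document, or -1 (outlier)
    when the best probability falls below the threshold."""
    probs = np.asarray(probs, dtype=float)
    best_topic = probs.argmax(axis=1)                      # index of highest probability per row
    confident = probs.max(axis=1) >= probability_threshold
    return np.where(confident, best_topic, -1).tolist()


# Toy document-topic probabilities (hypothetical): 3 documents, 3 topics.
probs = np.array([
    [0.70, 0.20, 0.10],   # clearly topic 0
    [0.30, 0.35, 0.35],   # no topic reaches the threshold
    [0.05, 0.05, 0.90],   # clearly topic 2
])

new_topics = assign_topics(probs, probability_threshold=0.5)
print(new_topics)  # [0, -1, 2]
```

Lowering `probability_threshold` moves more documents out of the outlier topic `-1`, at the cost of less confident assignments.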

> > import numpy as np

> > My main aim is to get no outliers, or as few as possible, while keeping good topic quality.
> >
> > If you want minimal or no outliers, then I...

Memory Issues

> I encountered the same issue with BERTopic using a large dataset. The way BERTopic uses the vectorizer somehow results in huge memory consumption. I suspect the reason for this...