Maarten Grootendorst
From your code, I think the culprit here is `nr_bins=10`. Since your documents range from 2000 to 2021, a span of 21 years, binning them into 10 bins (years)...
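To illustrate the effect, here is a minimal stand-alone sketch of equal-width binning. It is not BERTopic's internal code: the helper `bin_years` is hypothetical, and the year range is simply the one mentioned above.

```python
def bin_years(years, nr_bins):
    """Group years into nr_bins equal-width bins (hypothetical helper)."""
    lo, hi = min(years), max(years)
    width = (hi - lo) / nr_bins
    bins = {}
    for year in years:
        # Clamp the last value so the maximum year falls into the final bin.
        idx = min(int((year - lo) / width), nr_bins - 1)
        bins.setdefault(idx, []).append(year)
    return bins


years = list(range(2000, 2022))  # 2000 through 2021 inclusive
bins = bin_years(years, nr_bins=10)

# With 22 distinct years and only 10 bins, every bin holds several years,
# so per-year detail is lost.
for idx in sorted(bins):
    print(idx, bins[idx])
```

A larger `nr_bins` gives finer time steps, at the cost of fewer documents per bin.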
@pariskang Thank you for the suggestion. BERTopic often creates and visualizes quite a number of topics with different kinds of distributions. I have tested it before with the area map...
Thank you for the extensive description! This is indeed a known issue and has to do with how the topics are accessed. It will be fixed in the next release...
There are ways of speeding up UMAP, which you can find in the documentation [here](https://maartengr.github.io/BERTopic/getting_started/tips_and_tricks/tips_and_tricks.html#speed-up-umap). You can also use `cuml` to GPU-accelerate both HDBSCAN and UMAP, which could significantly...
Thank you for sharing this issue. From what I can see, there might be an issue with the way `merge_topics` currently works, but I cannot be sure. Can you...
I just checked the code of `merge_topics` and I believe I understand the issue here. It seems that the topics are not properly updated across some of the functions. It...
Based on your error, it seems that installing BERTopic in your environment did not go as expected. Did you run `!pip install bertopic` in your Google Colab environment? Also, did...
The main reason for this is modularity. Although HDBSCAN is the default model, other clustering algorithms can be used instead, such as k-Means. In order to support any clustering technique,...
Thanks for mentioning this! It is indeed a documentation issue, as the bar charts show the topic representations directly through the bars. However, it might indeed make sense to add the...
You are correct that there is not a function that calculates the distance between each document and the topic it belongs to. This is because that is not the procedure...