Maarten Grootendorst
Thank you for your kind words! Glad to hear that the libraries are helpful to you. This is indeed troublesome. I am not entirely sure but I do believe some...
@kkadu There is currently an issue with `merge_topics` that does not fully update the topics in the model. A fix is coming for that but that might take a while....
The difficulty with hyperparameter tuning is that you typically need a ground truth and an objective evaluation metric for it to properly work. Due to the somewhat subjective nature of...
Thank you for your kind words! Scalability can definitely be an issue when handling a million documents. Specifically for that reason, I created an [FAQ](https://maartengr.github.io/BERTopic/faq.html#i-am-facing-memory-issues-help) page that has a bunch...
No problem! Please feel free to post any questions or concerns you have even if they might already be mentioned somewhere else. It might happen that your use case is...
That is indeed quite difficult! Although I agree that it would be nicer to make `documents` optional in your use case, I am hesitant about both `documents` and `embeddings` being...
Hmmm, you are right that the cleanest way would be one argument to replace them both, and like you said that might break some stuff. Also, I like separating them...
Although that is not something currently supported in BERTopic, you could take the words in a topic and search them throughout the documents that belong to that topic. You could...
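A minimal sketch of that idea in plain Python. The `topic_words` and `topic_docs` values below are placeholders: in practice you would get the words from something like `topic_model.get_topic(topic_id)` and collect the documents assigned to that topic yourself.

```python
# Illustrative sketch (not a built-in BERTopic feature): given a topic's
# top words and the documents assigned to that topic, find which of
# those words appear in each document.
topic_words = ["bike", "ride", "wheel"]          # hypothetical topic words
topic_docs = [
    "I love to ride my bike on weekends.",
    "The front wheel needs replacing.",
]

matches = {}
for i, doc in enumerate(topic_docs):
    found = [word for word in topic_words if word in doc.lower()]
    if found:
        matches[i] = found

print(matches)  # document index -> matching topic words
```

From there you could rank documents by how many topic words they contain, or highlight the matched words in context.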
@wang2hhhhh When you run the following `topics, probs = topic_model.fit_transform(docs)`, then each topic in `topics` belongs to a document in `docs`. The order is kept, which means that `topics[0]` is...
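Because the order is preserved, you can pair documents with their assigned topics directly. The `docs` and `topics` values below are placeholders standing in for real model output:

```python
# topics[i] is the topic assigned to docs[i], so zipping them gives a
# direct document-to-topic mapping.
docs = ["doc about cats", "doc about cars", "doc about dogs"]
topics = [0, 1, 0]   # stand-in for: topics, probs = topic_model.fit_transform(docs)

doc_to_topic = dict(zip(docs, topics))
print(doc_to_topic["doc about cats"])  # topic of the first document -> 0
```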
Apologies for the late reply, life has been hectic lately! Semi-supervised learning works by nudging the topic creation towards those that you have defined previously. In practice, that will not mean...
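A rough sketch of how you might prepare partial labels for that nudging. The convention of marking unlabeled documents with `-1` in a `y` vector passed to `fit_transform` is how BERTopic's semi-supervised mode is typically used, but please check the documentation of your installed version; the documents and labels below are made up for illustration:

```python
# Build a label vector where -1 marks documents without a prior label.
docs = ["solar panels", "wind turbines", "unrelated text"]
known_labels = {"solar panels": 0, "wind turbines": 0}   # hypothetical prior labels

y = [known_labels.get(doc, -1) for doc in docs]
print(y)  # [0, 0, -1]

# Then, roughly: topics, probs = topic_model.fit_transform(docs, y=y)
# which nudges topic creation towards the labels you provided.
```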