Maarten Grootendorst
Thank you for your kind words! Glad to hear that the libraries are helpful to you. This is indeed troublesome. I am not entirely sure but I do believe some...
@kkadu There is currently an issue with `merge_topics` that does not fully update the topics in the model. A fix is coming for that but that might take a while....
The difficulty with hyperparameter tuning is that you typically need a ground truth and an objective evaluation metric for it to properly work. Due to the somewhat subjective nature of...
Thank you for your kind words! Scalability can definitely be an issue when handling a million documents. Specifically for that reason, I created an [FAQ](https://maartengr.github.io/BERTopic/faq.html#i-am-facing-memory-issues-help) page that has a bunch...
No problem! Please feel free to post any questions or concerns you have even if they might already be mentioned somewhere else. It might happen that your use case is...
That is indeed quite difficult! Although I agree that it would be nicer to make `documents` optional in your use case, I am hesitant about both `documents` and `embeddings` being...
Hmmm, you are right that the cleanest way would be one argument to replace them both, and like you said that might break some stuff. Also, I like separating them...
Although that is not something currently supported in BERTopic, you could take the words in a topic and search them throughout the documents that belong to that topic. You could...
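A minimal sketch of that idea in plain Python. The `topic_words` and `topic_docs` values below are placeholders: in practice you would get the words from something like `topic_model.get_topic(topic_id)` and collect the documents assigned to that topic yourself.

```python
# Illustrative sketch (not a built-in BERTopic feature): given a topic's
# top words and the documents assigned to that topic, find which of
# those words appear in each document.
topic_words = ["bike", "ride", "wheel"]          # hypothetical topic words
topic_docs = [
    "I love to ride my bike on weekends.",
    "The front wheel needs replacing.",
]

matches = {}
for i, doc in enumerate(topic_docs):
    found = [word for word in topic_words if word in doc.lower()]
    if found:
        matches[i] = found

print(matches)  # document index -> matching topic words
```

From there you could rank documents by how many topic words they contain, or highlight the matched words in context.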
@wang2hhhhh When you run the following `topics, probs = topic_model.fit_transform(docs)`, then each topic in `topics` belongs to a document in `docs`. The order is kept, which means that `topics[0]` is...
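Because the order is preserved, you can pair documents with their assigned topics directly. The `docs` and `topics` values below are placeholders standing in for real model output:

```python
# topics[i] is the topic assigned to docs[i], so zipping them gives a
# direct document-to-topic mapping.
docs = ["doc about cats", "doc about cars", "doc about dogs"]
topics = [0, 1, 0]   # stand-in for: topics, probs = topic_model.fit_transform(docs)

doc_to_topic = dict(zip(docs, topics))
print(doc_to_topic["doc about cats"])  # topic of the first document -> 0
```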
Apologies for the late reply, life has been hectic lately! Semi-supervised learning works by nudging the topic creation towards those that you have defined previously. In practice, that will not mean...
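A rough sketch of how you might prepare partial labels for that nudging. The convention of marking unlabeled documents with `-1` in a `y` vector passed to `fit_transform` is how BERTopic's semi-supervised mode is typically used, but please check the documentation of your installed version; the documents and labels below are made up for illustration:

```python
# Build a label vector where -1 marks documents without a prior label.
docs = ["solar panels", "wind turbines", "unrelated text"]
known_labels = {"solar panels": 0, "wind turbines": 0}   # hypothetical prior labels

y = [known_labels.get(doc, -1) for doc in docs]
print(y)  # [0, 0, -1]

# Then, roughly: topics, probs = topic_model.fit_transform(docs, y=y)
# which nudges topic creation towards the labels you provided.
```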