Maarten Grootendorst

931 comments by Maarten Grootendorst

@iamsha5q There is indeed currently a bug in `merge_topics`. It will be fixed in the next release but there will be some significant changes to the internal structure so a...

In topic modeling, there are many evaluation metrics to choose from. Choosing the metrics that suit your use case typically requires careful, manual selection....
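
To make one such metric concrete, here is a minimal sketch of topic diversity, which is commonly defined as the share of unique words among the top words of all topics. The comment above does not name a specific metric, so picking this one is an assumption, and the function name `topic_diversity` and the toy topics are made up for illustration.

```python
def topic_diversity(topics, top_k=10):
    """Share of unique words among the top_k words of every topic.

    `topics` is a list of topics, each a list of words ordered by importance.
    A value near 1 means topics share few words; near 0 means heavy overlap.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        return 0.0
    return len(set(top_words)) / len(top_words)


# Toy example (hypothetical topics): "car" appears in both topics,
# so 5 of the 6 top words are unique.
topics = [
    ["car", "engine", "wheel"],
    ["car", "road", "traffic"],
]
score = topic_diversity(topics, top_k=3)
```

A single number like this only captures one aspect of topic quality, which is why it is usually read alongside other metrics such as coherence.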

@sreemoyk I would advise going through the https://github.com/MaartenGr/BERTopic_evaluation repo for examples on how to use OCTIS. Also, make sure that the `topics` variable matches what is expected according to their...

You can increase the `min_topic_size` parameter to get topics that typically consist of more documents. It depends on the dataset, but typically, if you have more documents in a topic,...

Thank you for your kind words! Typically, if words are very similar to one another, then it would help to set the `diversity` parameter when training BERTopic. It is a...
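
To illustrate why a diversity setting helps with near-duplicate words, here is a rough sketch of Maximal Marginal Relevance (MMR), a common way to trade relevance against redundancy. It assumes that `diversity` corresponds to an MMR-style re-ranking; the function names and the two-dimensional toy vectors are hypothetical, and this is not the library's own code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(topic_vec, word_vecs, top_n, diversity):
    """Pick top_n words, penalising words similar to those already picked.

    diversity=0 ranks purely by relevance to the topic;
    higher values penalise redundancy more strongly.
    """
    candidates = list(word_vecs)
    # Start with the word most similar to the topic itself.
    first = max(candidates, key=lambda w: cosine(word_vecs[w], topic_vec))
    selected = [first]
    candidates.remove(first)

    while candidates and len(selected) < top_n:
        def score(word):
            relevance = cosine(word_vecs[word], topic_vec)
            redundancy = max(cosine(word_vecs[word], word_vecs[s]) for s in selected)
            return (1 - diversity) * relevance - diversity * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical embeddings: "car" and "cars" are almost identical.
topic_vec = [1.0, 0.0]
word_vecs = {"car": [1.0, 0.1], "cars": [1.0, 0.12], "engine": [0.6, 0.8]}

without_diversity = mmr(topic_vec, word_vecs, top_n=2, diversity=0.0)
with_diversity = mmr(topic_vec, word_vecs, top_n=2, diversity=0.7)
```

With no diversity penalty, the two near-identical words are both kept; with a higher value, the second slot goes to a word that adds new information.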

Glad to hear that the solution also works for you! The `CountVectorizer` is definitely a great way to further process the documents and get the topic representations that you are...

It might be worthwhile to play around with different values for `color_threshold`. By default, it is set to 1, so perhaps setting it to 0.5 or 2 changes...

It is currently not directly possible to remove a specific topic entirely from the model. You could try to merge it with other topics that are not of interest or...

The main difficulty here is that KeyBERT uses quite a different procedure from BERTopic and merging them would require some significant changes to both procedures. Using KeyBERT directly, in place...

The package follows, to a certain extent, sklearn's API in that whenever you use `transform` on a set of documents, it will return the topics in the same order. Let's...
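
The order guarantee described above means that the i-th returned topic belongs to the i-th input document, so the two can simply be zipped together. The sketch below uses a hypothetical `toy_transform` function as a stand-in for a fitted model's `transform`; it is not the package's implementation, and the documents are invented.

```python
def toy_transform(docs):
    """Hypothetical stand-in for a fitted model's `transform`.

    Returns one topic id per document, in the same order as the input.
    """
    return [0 if "car" in doc.lower() else 1 for doc in docs]


docs = [
    "My car broke down",
    "The election results are in",
    "Car prices keep rising",
]

topics = toy_transform(docs)

# Because the order is preserved, topics[i] is the topic of docs[i].
doc_topics = list(zip(docs, topics))
```

The same pairing works for building a table of documents and their topics, as long as the document list is not shuffled or filtered between the call and the zip.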