
Evaluate BERTopic

MourouzidouElisa opened this issue 2 years ago • 3 comments

As far as I know, BERTopic offers multiple ways to interpret its topics, but mainly through visualizations. I am now trying to apply a coherence algorithm to evaluate my model, and I was wondering whether there are other metrics we can use, like perplexity, sparsity, etc.

MourouzidouElisa avatar Jul 04 '22 15:07 MourouzidouElisa

In topic modeling, there are quite a few evaluation metrics to choose from, and picking the ones that suit your use case typically requires careful, manual consideration. Fortunately, there is a great package called OCTIS that has implemented many different evaluation metrics, from coherence and diversity to similarity and significance.
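As a rough sketch of how that could look for a fitted BERTopic model (the variable names topic_model and docs are placeholders here, and you should double-check the parameters against the OCTIS documentation for the version you have installed):

from octis.evaluation_metrics.coherence_metrics import Coherence
from octis.evaluation_metrics.diversity_metrics import TopicDiversity

# Assumes `topic_model` is a fitted BERTopic model and `docs` are the documents it was trained on.
# OCTIS metrics expect the topics as a list of word lists inside a dict under the "topics" key.
topic_words = [
    [word for word, _ in words]
    for topic_id, words in topic_model.get_topics().items()
    if topic_id != -1  # skip the outlier topic
]
model_output = {"topics": topic_words}

# Coherence needs the tokenized corpus as reference texts
tokenized_docs = [doc.split() for doc in docs]
coherence = Coherence(texts=tokenized_docs, topk=10, measure="c_v")
diversity = TopicDiversity(topk=10)

print("Coherence:", coherence.score(model_output))
print("Diversity:", diversity.score(model_output))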

Having said that, I would highly advise also using human evaluation as these metrics are not perfect by any means. Inspecting and reviewing the topics yourself in the context of your use case is key!

MaartenGr avatar Jul 04 '22 16:07 MaartenGr

@MaartenGr I am trying to get the coherence score of a BERTopic model using OCTIS; however, what should be passed in place of model_output?

coherence = Coherence(measure='c_v')
coh_score = coherence.score(topics)

This throws the error: "list indices must be integers or slices, not str".

sreemoyk avatar Jul 29 '22 12:07 sreemoyk

@sreemoyk I would advise going through the https://github.com/MaartenGr/BERTopic_evaluation repo for examples on how to use OCTIS. Also, make sure that the topics variable matches what is expected according to their documentation.
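To illustrate the error above: if I recall their implementation correctly, Coherence.score() expects a dictionary with a "topics" key rather than a plain list of topics, so indexing a list with the string "topics" is what triggers that message. A rough, untested sketch of what it could look like (the example word lists are placeholders, and tokenized_docs would be your tokenized corpus):

from octis.evaluation_metrics.coherence_metrics import Coherence

# Placeholder top words per topic; use your BERTopic topics here
topics = [["game", "team", "player", "season"],
          ["stock", "market", "price", "trade"]]

model_output = {"topics": topics}  # OCTIS indexes model_output["topics"]
coherence = Coherence(texts=tokenized_docs, topk=4, measure="c_v")
coh_score = coherence.score(model_output)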

MaartenGr avatar Jul 31 '22 05:07 MaartenGr

Due to inactivity, I'll be closing this for now. Let me know if you have any other questions related to this and I'll make sure to re-open the issue!

MaartenGr avatar Sep 27 '22 08:09 MaartenGr