How to get the assigned topic, the name of each topic, the top n words of each topic, and other data for new docs on which `transform()` is used?

Open yugkha3 opened this issue 5 months ago • 7 comments

Suppose I first trained the model and then got the topics, representative docs, etc. for the training docs using `.get_document_info()`:

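from bertopic import BERTopic

# vectorizer_model, hdbscan_model and embedding_model are defined elsewhere
topic_model = BERTopic(vectorizer_model=vectorizer_model, hdbscan_model=hdbscan_model, embedding_model=embedding_model)
topics, probs = topic_model.fit_transform(docs)
print(topic_model.get_document_info(docs))  # get_document_info() takes the fitted docs as an argument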

Now I am predicting topics for new_docs:

new_topics, new_probs = topic_model.transform(new_docs)

How do I find out which topic each document in new_docs was assigned to? In other words, how can I generate a list or DataFrame like the one returned by `.get_document_info()`, but for the new docs and their predicted topics?
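To make the desired output concrete, here is a minimal sketch of one possible way to build such a table by hand. It assumes that `transform()` returns one topic id (and, optionally, one probability) per document in the same order as `new_docs`. The helper `build_document_info` and the stand-in data are hypothetical and not part of BERTopic. With a fitted model, the two lookup dicts could presumably be filled from `topic_model.topic_labels_` and `topic_model.topic_representations_`, but that mapping is an assumption here, not a confirmed recipe.

```python
from typing import Mapping, Optional, Sequence


def build_document_info(
    docs: Sequence[str],
    topics: Sequence[int],
    topic_labels: Mapping[int, str],
    topic_words: Mapping[int, Sequence[tuple]],
    probs: Optional[Sequence[float]] = None,
    top_n: int = 5,
) -> list:
    """Hypothetical helper: one row per document, pairing it with its topic data."""
    if len(docs) != len(topics):
        raise ValueError("docs and topics must have the same length")
    rows = []
    for i, (doc, topic) in enumerate(zip(docs, topics)):
        topic = int(topic)  # topic ids may arrive as NumPy integers
        # Keep only the words (not the scores) of the top_n terms for this topic.
        words = [word for word, _score in topic_words.get(topic, [])][:top_n]
        rows.append({
            "Document": doc,
            "Topic": topic,
            "Name": topic_labels.get(topic, "unknown"),
            "Top_n_words": " - ".join(words),
            # Assumes one probability per document (a 1-D sequence), if given.
            "Probability": None if probs is None else float(probs[i]),
        })
    return rows


# Stand-in data so the sketch runs without a fitted model (all values made up).
example_labels = {-1: "-1_the_and_of", 0: "0_election_vote_party"}
example_words = {
    -1: [("the", 0.10), ("and", 0.09), ("of", 0.08)],
    0: [("election", 0.50), ("vote", 0.40), ("party", 0.30)],
}
example_docs = ["Voters head to the polls on Sunday.", "Some unrelated text."]
example_topics = [0, -1]
example_probs = [0.91, 0.0]

rows = build_document_info(
    example_docs, example_topics, example_labels, example_words, probs=example_probs
)
for row in rows:
    print(row)
```

The resulting list of dicts can be passed to `pandas.DataFrame(rows)` if a DataFrame is preferred over a plain list.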

yugkha3 • Feb 22 '24 15:02