BERTopic Get representative doc per Topic with other columns like rating, date of document

Open amrityap opened this issue 2 years ago • 1 comments

Hey Maarten, I was running BERTopic on user reviews of an app. My goal is to perform sentiment analysis on the reviews per topic. I managed to get the topics, but now I need to print the reviews per topic along with their sentiment label (1 or 0). topic_model.get_representative_docs() only prints the reviews with their topic. Is there a way to keep other columns like sentiment label and star rating so I can perform sentiment analysis per topic?

Aug 01 '22 15:08 amrityap

The package follows, to a certain extent, sklearn's API in that whenever you use fit_transform or transform on a set of documents, it will return the topics in the same order as the input documents. Let's say you have the following code:

from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']

topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs)

Here, docs is a list of documents on which you train the model. Running .fit_transform(docs) will return the variable topics. In topics, you will find the topic assigned to each document. The topic in topics[0] corresponds to the document in docs[0], topics[1] to docs[1], etc.

You can use that structure to extract the documents under a certain topic by using, for example, the following:

import pandas as pd
results = pd.DataFrame({"Doc": docs, "Topic": topics})

The results variable can then be extended with whatever metadata you have, like sentiment label and star rating.
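
As a minimal sketch of that last step, the example below adds two metadata columns and summarizes them per topic. The sample reviews, the Sentiment and Rating column names, and the contents of representative_docs are hypothetical stand-ins. The dictionary is assumed to have the shape that topic_model.get_representative_docs() returns (topic id mapped to a list of documents). In practice, docs and topics would come from the fit_transform call above, and the metadata lists must be in the same order as docs.

```python
import pandas as pd

# Hypothetical stand-ins: in practice `docs` is the list passed to fit_transform
# and `topics` is the list it returned (same length, same order).
docs = [
    "Love the new dark mode",
    "App crashes on login",
    "Dark mode is easy on the eyes",
    "Cannot log in after the update",
]
topics = [0, 1, 0, 1]

# Hypothetical metadata, aligned row-for-row with `docs`.
sentiment = [1, 0, 1, 0]
rating = [5, 1, 4, 2]

results = pd.DataFrame(
    {"Doc": docs, "Topic": topics, "Sentiment": sentiment, "Rating": rating}
)

# Per-topic summary: number of reviews, share of positive labels, mean rating.
per_topic = results.groupby("Topic").agg(
    n_docs=("Doc", "size"),
    positive_share=("Sentiment", "mean"),
    mean_rating=("Rating", "mean"),
)

# Assumed shape of topic_model.get_representative_docs(): {topic id: [documents]}.
representative_docs = {
    0: ["Love the new dark mode"],
    1: ["App crashes on login"],
}

# Look the representative documents up in `results` to recover their metadata.
rep_rows = pd.concat(
    [
        results[(results["Topic"] == topic) & results["Doc"].isin(rep)]
        for topic, rep in representative_docs.items()
    ]
)

print(per_topic)
print(rep_rows)
```

Matching on the document text means that identical reviews within a topic all come back as rows, so deduplicate first if that matters for your analysis.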

Aug 02 '22 05:08 MaartenGr

Due to inactivity, I'll be closing this for now. Let me know if you have any other questions related to this and I'll make sure to re-open the issue!

Sep 27 '22 08:09 MaartenGr
