GuidedLDA
Using pyldavis on top of guidedlda model
Hi,
Is there a way to use the pyLDAvis library on top of a model fit by guidedlda to visualise the topic clusters?
Thanks.
Hello Jeeva - I just implemented this on a project I was working on. Tested in Anaconda notebooks for inline visualisation.
[Assuming you have the document-term matrix stored as a pandas dataframe (in my case "tef_dtm"), a guidedLDA model built (in my case named "model"), and the vocab saved as a variable (in my case named "vocab")]
import pyLDAvis
# calculate doc lengths as the sum of each row of the dtm
doc_lengths = tef_dtm.sum(axis=1, skipna=True)
# transpose the dtm and get a sum of the overall term frequency
dtm_trans = tef_dtm.T
dtm_trans['total'] = dtm_trans.sum(axis=1, skipna=True)
# create a data dictionary as per this tutorial https://nbviewer.jupyter.org/github/bmabey/pyLDAvis/blob/master/notebooks/Movie%20Reviews%2C%20AP%20News%2C%20and%20Jeopardy.ipynb
data = {
    'topic_term_dists': model.topic_word_,
    'doc_topic_dists': model.doc_topic_,
    'doc_lengths': doc_lengths,
    'vocab': vocab,
    'term_frequency': list(dtm_trans['total']),
}
# prepare the data
tef_vis_data = pyLDAvis.prepare(**data)
# display inline; in a notebook, run this as the last line of its own cell, after the code above
pyLDAvis.display(tef_vis_data)
# save to HTML
pyLDAvis.save_html(tef_vis_data, "LDAvis.html")
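If pyLDAvis.prepare complains about its inputs, it can help to check the five pieces of the data dictionary against each other first. The sketch below is only an illustration: check_ldavis_inputs is a hypothetical helper, not part of pyLDAvis or guidedlda, and the toy numbers are made up. It assumes the two distribution matrices should have rows that sum to 1 and that the lengths line up with the number of documents and vocabulary entries. It uses plain Python, so it should also accept numpy arrays or a pandas Series for doc_lengths.

```python
import math


def check_ldavis_inputs(topic_term_dists, doc_topic_dists, doc_lengths,
                        vocab, term_frequency, tol=1e-6):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    n_topics = len(topic_term_dists)
    n_docs = len(doc_topic_dists)
    n_terms = len(vocab)

    # Shape checks: topics x terms, docs x topics, one length per doc,
    # one frequency per vocabulary entry.
    if any(len(row) != n_terms for row in topic_term_dists):
        problems.append("topic_term_dists needs one column per vocab entry")
    if any(len(row) != n_topics for row in doc_topic_dists):
        problems.append("doc_topic_dists needs one column per topic")
    if len(doc_lengths) != n_docs:
        problems.append("doc_lengths needs one entry per document")
    if len(term_frequency) != n_terms:
        problems.append("term_frequency needs one entry per vocab entry")

    # Each row of both matrices is assumed to be a probability distribution.
    if any(not math.isclose(sum(row), 1.0, abs_tol=tol) for row in topic_term_dists):
        problems.append("rows of topic_term_dists should sum to 1")
    if any(not math.isclose(sum(row), 1.0, abs_tol=tol) for row in doc_topic_dists):
        problems.append("rows of doc_topic_dists should sum to 1")
    return problems


# Toy inputs (made up): 2 topics, 3 terms, 2 documents.
toy = {
    "topic_term_dists": [[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]],
    "doc_topic_dists": [[0.9, 0.1], [0.4, 0.6]],
    "doc_lengths": [3, 5],
    "vocab": ["model", "seed", "topic"],
    "term_frequency": [2, 3, 3],
}
print(check_ldavis_inputs(**toy))
```

In the code above, the same call would be check_ldavis_inputs(**data), run just before pyLDAvis.prepare(**data).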
Hi
Hey, thanks. It would be great if you could add a few lines to make your code reproducible. I'm not sure what vocab is here, or whether by document-term matrix you mean the matrix that contains the frequencies or simply the counts.
Hello @famargar
Vocab would just be the vocabulary stored as a list, and is the same variable name as in the main tutorial. I'm not sure what the difference is between word frequencies and word counts in this context. A DTM is normally an N x M matrix with one row per document and one column per word in the vocabulary, where each entry is the per-document, per-word frequency/count.
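To make that concrete, here is a minimal sketch of how vocab and a count-based DTM relate, using only plain Python. The toy corpus is made up for illustration, and raw integer counts are an assumption here, not something confirmed above. Row sums give the document lengths and column sums give the overall term frequencies, which are the same quantities the pandas code computes.

```python
from collections import Counter

# Hypothetical toy corpus, already tokenised (not from the original project).
docs = [
    ["topic", "model", "topic"],
    ["model", "seed", "word"],
    ["seed", "seed", "topic"],
]

# vocab: one entry per distinct word; its order fixes the column order of the DTM.
vocab = sorted({word for doc in docs for word in doc})

# dtm[i][j] = number of times vocab[j] occurs in docs[i] (raw counts).
dtm = []
for doc in docs:
    counts = Counter(doc)
    dtm.append([counts[word] for word in vocab])

# Row sums are document lengths; column sums are overall term frequencies.
doc_lengths = [sum(row) for row in dtm]
term_frequency = [sum(col) for col in zip(*dtm)]

print(vocab)
print(dtm)
print(doc_lengths, term_frequency)
```

Wrapping dtm in pandas.DataFrame(dtm, columns=vocab) would give something shaped like the "tef_dtm" dataframe assumed earlier.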
Hope that helps