more, better, and interactive(?) data viz
textacy currently has two visualizations: draw_semantic_network() for visualizing documents as networks of terms with edges given by, say, term co-occurrence; and draw_termite_plot() for visualizing the relationship between topics and terms in a topic model. Both of these could be improved!
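To make "edges given by term co-occurrence" concrete, here is a minimal, stdlib-only sketch of how such edge weights could be counted. The function name cooccurrence_edges and the window_size parameter are made up for illustration; this is not how draw_semantic_network() itself builds its graph.

```python
from collections import Counter


def cooccurrence_edges(terms, window_size=2):
    """Count pairs of distinct terms that occur within `window_size` tokens of each other."""
    edges = Counter()
    for i, term in enumerate(terms):
        # pair the current term with the terms that follow it inside the window
        for other in terms[i + 1 : i + window_size]:
            if other != term:
                # sort so that (a, b) and (b, a) count as the same undirected edge
                edges[tuple(sorted((term, other)))] += 1
    return edges


# toy input: a document already reduced to a list of terms
terms = ["topic", "model", "term", "topic", "model"]
edges = cooccurrence_edges(terms, window_size=2)
```

Each key in edges is an undirected term pair and each value is a candidate edge weight, which a plotting function could then map to line width or opacity.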
There are also tons of other visualizations that textacy users could benefit from:
- pyLDAvis for visualizing various aspects of topic models interactively
- word clouds to show word (or, generically, term) counts
- word trees to show word sequences
- parallel tag clouds to show differences in key terms over time or between groups
- stream graphs for showing trends over time in, say, topic prevalence or word usage
- dependency parsing viz a la displaCy
- compareclouds for visualizing media frames
I should stop listing these out and just point people to this site, which contains tons of possibilities.
implementation in textacy
- Python-only, without a bunch of extra dependencies (preferred)
- easy interoperability with relevant classes / functions
- what else...?
pyLDAvis is pretty simple w.r.t. input. I came up with the following for the prepare method:
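import pyLDAvis

# fit the topic model and get the per-document topic weights
model = textacy.tm.TopicModel('lda', n_topics=30)
model.fit(doc_term_matrix)
doc_topic_matrix = model.transform(doc_term_matrix)
# topic-term weights from the underlying scikit-learn estimator
top_term_matrix = model.model.components_
# document lengths (len() gives token counts for textacy/spaCy docs)
doc_lengths = [len(d) for d in documents]
# vocabulary ordered by term id, so it lines up with the matrix columns
vocab = [id2term[i] for i in sorted(id2term)]
term_frequency = textacy.vsm.get_term_freqs(doc_term_matrix)

vis_data = pyLDAvis.prepare(
    top_term_matrix,
    doc_topic_matrix,
    doc_lengths,
    vocab,
    term_frequency,
)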
One thing: pyLDAvis asserts that every row of the document-topic matrix sums to one. That holds for LDA, but I noticed that NMF doesn't normalize its output this way; I don't know about LSA.
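One possible workaround for the NMF case is to rescale each row before handing the matrix to pyLDAvis. The sketch below assumes a non-negative matrix; normalize_rows is a hypothetical helper (not part of textacy or pyLDAvis), and the numbers are made up.

```python
import numpy as np


def normalize_rows(matrix):
    """Scale each row of a non-negative matrix so that it sums to 1."""
    matrix = np.asarray(matrix, dtype=float)
    row_sums = matrix.sum(axis=1, keepdims=True)
    # leave all-zero rows as zeros instead of dividing by zero
    safe_sums = np.where(row_sums == 0, 1.0, row_sums)
    return matrix / safe_sums


# toy stand-in for an unnormalized doc-topic matrix (values are made up)
doc_topic_matrix = np.array([[2.0, 1.0, 1.0], [0.5, 0.0, 1.5]])
normalized = normalize_rows(doc_topic_matrix)
```

Rows that are entirely zero stay zero and would still fail the sum-to-one check, so they would need to be dropped or handled separately. This rescaling also would not make sense for matrices with negative entries, which LSA can produce.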
Hello @bdewilde - we've been working on a machine learning visualization library called Yellowbrick, to provide custom Matplotlib visualizers for Scikit-Learn estimators. The project is still young, but is growing, and we've recently added a few new features for visualization to support modeling on text. We're big fans of your work and we think the list of ideas in this issue is very interesting. Not sure if you're still interested in pursuing the text viz stuff or have moved on to other things, but let us know if you have any additional thoughts or suggestions!