scattertext
                                
Empath visualisation doesn't work with non-binary categories
Clicking a node in the Empath visualisation doesn't show the relevant text: nothing is rendered under the visualisation, and the browser logs the error below (the property number in the message changes depending on which node is clicked).
Browser error:
Billingpayment-Visualization.html:4484 Uncaught TypeError: Cannot read property '14' of undefined
    at searchInExtraFeatures (Billingpayment-Visualization.html:4484)
    at gatherTermContexts (Billingpayment-Visualization.html:4453)
    at SVGTextElement.<anonymous> (Billingpayment-Visualization.html:5027)
    at SVGTextElement.<anonymous> (d3.min.js:2)
Python code to generate the visualisation:
import scattertext as st
from IPython.display import IFrame
convention_df = st.SampleCorpora.ConventionData2012.get_data()
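# Relabel a few rows; indexing row and column in one .iloc call avoids chained assignment.
party_col = convention_df.columns.get_loc("party")
convention_df.iloc[3, party_col] = "liberal"
convention_df.iloc[4, party_col] = "republican"
convention_df.iloc[5, party_col] = "liberal"
convention_df.iloc[6, party_col] = "republican"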
empath_corpus = st.CorpusFromParsedDocuments(convention_df.iloc[:15],
                                             category_col="party",
                                             feats_from_spacy_doc=st.FeatsFromOnlyEmpath(),
                                             parsed_col="text").build()
html = st.produce_scattertext_explorer(empath_corpus,
    category = 'democrat',
    category_name = 'democrat',
    not_category_name = "Not democrat",
    width_in_pixels=1000,
    use_non_text_features=True,
    use_full_doc=True)
file_name = 'democrat.html'
with open(file_name, 'w', encoding='utf-8') as f:
    f.write(html)
IFrame(src=file_name, width = 1200, height=700)
Your Environment
- Operating System: Ubuntu
- Python Version Used: 3.6
- Scattertext Version Used: 0.0.2.25
- Environment Information:
- Browser used (if an HTML error): Chrome, Chromium tested
It seems the issue isn't caused by having multiple categories but by the Empath visualisation itself: the following snippet, which uses only 2 categories, still fails:
import scattertext as st
from IPython.display import IFrame
convention_df = st.SampleCorpora.ConventionData2012.get_data()
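party_col = convention_df.columns.get_loc("party")
convention_df.iloc[3, party_col] = "liberal"
convention_df.iloc[4, party_col] = "republican"
convention_df.iloc[5, party_col] = "liberal"
convention_df.iloc[6, party_col] = "republican"
# Collapse every non-democrat row into one category with a single .loc call;
# the chained form df[mask]["party"] = ... only writes to a temporary copy.
convention_df.loc[convention_df["party"] != "democrat", "party"] = "not democrat"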
empath_corpus = st.CorpusFromParsedDocuments(convention_df[:14],
                                             category_col="party",
                                             feats_from_spacy_doc=st.FeatsFromOnlyEmpath(),
                                             parsed_col="text").build()
html = st.produce_scattertext_explorer(empath_corpus,
    category = 'democrat',
    category_name = 'democrat',
    not_category_name = "Not democrat",
    width_in_pixels=1000,
    use_non_text_features=True,
    use_full_doc=True)
file_name = 'democrat.html'
with open(file_name, 'w', encoding='utf-8') as f:
    f.write(html)
IFrame(src=file_name, width = 1200, height=700)
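When reducing a corpus to two categories like this, it is worth confirming the relabelling actually took effect, because chained indexing such as `df[mask]["party"] = ...` is a common pandas pitfall: `df[mask]` builds a new DataFrame, so the write lands on that temporary object and the original frame keeps all of its categories (under pandas' Copy-on-Write mode, no chained assignment updates the original). The sketch below uses a small hypothetical frame in place of `convention_df` to contrast the chained form with a single `.loc` call.

```python
import warnings

import pandas as pd

# Hypothetical toy frame standing in for convention_df.
df = pd.DataFrame({"party": ["democrat", "republican", "liberal", "democrat"]})

# Chained assignment: df[mask] returns a new DataFrame, so the assignment
# modifies that temporary object and df itself is left unchanged.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas warns about this pattern
    df[df["party"] != "democrat"]["party"] = "not democrat"
after_chained = sorted(df["party"].unique())
print(after_chained)  # ['democrat', 'liberal', 'republican']

# A single .loc call with a boolean row mask and a column label writes into df.
df.loc[df["party"] != "democrat", "party"] = "not democrat"
after_loc = sorted(df["party"].unique())
print(after_loc)  # ['democrat', 'not democrat']
```

Checking `sorted(df["party"].unique())` right before building the corpus is a cheap way to confirm how many categories Scattertext will actually see.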
Thanks for the bug report.
I just made some significant improvements to the topic modeling component in Scattertext. Clicking an Empath category now shows the documents that match it. In addition, if you pass
topic_model_term_lists=st.FeatsFromOnlyEmpath().get_top_model_term_lists()
as a parameter to produce_scattertext_explorer, the terms associated with the clicked Empath category will be shown in bold. Please see https://github.com/JasonKessler/scattertext#visualizing-topic-models for more information.
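The bolding described above can be pictured with a toy, pure-Python sketch. It assumes the term lists take the shape of a dict mapping each topic (here, an Empath category) to a list of terms; the helper `bold_topic_terms` and the sample categories are hypothetical, and this is not how Scattertext itself renders the HTML.

```python
import re
from typing import Dict, List


def bold_topic_terms(text: str, topic: str, term_lists: Dict[str, List[str]]) -> str:
    """Hypothetical helper: wrap whole-word matches of a topic's terms in <b>...</b>."""
    terms = term_lists.get(topic, [])
    if not terms:
        return text  # unknown topic or empty list: leave the text untouched
    # Try longer terms first so a longer term wins over a shorter one that prefixes it.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b"
    return re.sub(pattern, r"<b>\1</b>", text, flags=re.IGNORECASE)


# Hypothetical term lists in the assumed {topic: [terms]} shape.
term_lists = {"money": ["pay", "bill", "tax"], "family": ["mother", "kids"]}
print(bold_topic_terms("We pay the bill so the kids can eat.", "money", term_lists))
# We <b>pay</b> the <b>bill</b> so the kids can eat.
```

Only the selected topic's terms are highlighted; "kids" stays plain because it belongs to a different category.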