
Empath visualisation doesn't work with non-binary categories

Open swartchris8 opened this issue 7 years ago • 2 comments

Clicking on nodes in the Empath visualisation doesn't show the relevant text. Each click raises the error below (the property number differs from node to node), and no text is rendered under the visualisation.

Browser error:

Billingpayment-Visualization.html:4484 Uncaught TypeError: Cannot read property '14' of undefined
    at searchInExtraFeatures (Billingpayment-Visualization.html:4484)
    at gatherTermContexts (Billingpayment-Visualization.html:4453)
    at SVGTextElement.<anonymous> (Billingpayment-Visualization.html:5027)
    at SVGTextElement.<anonymous> (d3.min.js:2)
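The trace shows the click handler (`gatherTermContexts`) failing inside `searchInExtraFeatures` while indexing with a number, which suggests a per-document lookup whose entry is undefined for some documents. The Python sketch below illustrates that kind of lookup under an assumed data layout (one dict of Empath category counts per document, with `None` for a missing entry); the function name and layout are hypothetical, not scattertext's actual JavaScript. It skips missing entries instead of failing the way the trace does.

```python
from typing import Dict, List, Optional

# Hypothetical layout: one entry per document, mapping an Empath category
# name to its count in that document. None marks a document with no entry,
# the situation a "Cannot read property ... of undefined" error hints at.
DocFeatures = Optional[Dict[str, int]]


def docs_matching_category(extra_features: List[DocFeatures], category: str) -> List[int]:
    """Return indices of documents in which `category` has a nonzero count."""
    matches = []
    for doc_idx, features in enumerate(extra_features):
        if features is None:
            # Skip missing entries instead of raising.
            continue
        if features.get(category, 0) > 0:
            matches.append(doc_idx)
    return matches


features = [{"money": 2}, None, {"politics": 1, "money": 1}, {}]
print(docs_matching_category(features, "money"))  # [0, 2]
```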

Python code to generate the visualisation:


import scattertext as st
from IPython.display import IFrame

convention_df = st.SampleCorpora.ConventionData2012.get_data()
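# Assign through .iloc on the DataFrame itself; chained indexing such as
# convention_df["party"].iloc[3] = ... may not modify convention_df in newer pandas.
party_col = convention_df.columns.get_loc("party")
convention_df.iloc[[3, 5], party_col] = "liberal"
convention_df.iloc[[4, 6], party_col] = "republican"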

empath_corpus = st.CorpusFromParsedDocuments(convention_df.iloc[:15],
                                             category_col="party",
                                             feats_from_spacy_doc=st.FeatsFromOnlyEmpath(),
                                             parsed_col="text").build()

html = st.produce_scattertext_explorer(empath_corpus,
                                       category='democrat',
                                       category_name='democrat',
                                       not_category_name='Not democrat',
                                       width_in_pixels=1000,
                                       use_non_text_features=True,
                                       use_full_doc=True)

file_name = 'democrat.html'
with open(file_name, 'wb') as f:
    f.write(html.encode('utf-8'))
IFrame(src=file_name, width=1200, height=700)
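With `category='democrat'`, the explorer should, by default, contrast the democrat documents against all remaining documents, so extra labels such as "liberal" end up pooled on the "Not democrat" side rather than plotted separately. A minimal plain-Python sketch of that one-vs-rest split, summing made-up per-document Empath counts on each side; `one_vs_rest_counts` is a hypothetical helper, not scattertext's internals.

```python
from collections import Counter
from typing import Dict, List, Tuple


def one_vs_rest_counts(
    labels: List[str],
    doc_features: List[Dict[str, int]],
    category: str,
) -> Tuple[Counter, Counter]:
    """Sum per-document feature counts for `category` vs. all other labels."""
    in_category: Counter = Counter()
    not_category: Counter = Counter()
    for label, features in zip(labels, doc_features):
        # Every label other than `category` ("republican", "liberal", ...)
        # is pooled into the same "not category" side.
        target = in_category if label == category else not_category
        target.update(features)
    return in_category, not_category


labels = ["democrat", "liberal", "republican", "democrat"]
feats = [{"money": 1}, {"money": 2}, {"war": 1}, {"war": 3}]
dem, rest = one_vs_rest_counts(labels, feats, "democrat")
print(dem)   # Counter({'war': 3, 'money': 1})
print(rest)  # Counter({'money': 2, 'war': 1})
```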

Your Environment

  • Operating System: Ubuntu
  • Python Version Used: 3.6
  • Scattertext Version Used: 0.0.2.25
  • Environment Information:
  • Browser used (if an HTML error): Chrome, Chromium tested

swartchris8, May 01 '18 15:05

It seems the issue isn't the multiple categories but the Empath visualisation itself. The following snippet, with only 2 categories, still fails:

import scattertext as st
from IPython.display import IFrame

convention_df = st.SampleCorpora.ConventionData2012.get_data()
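party_col = convention_df.columns.get_loc("party")
convention_df.iloc[[3, 5], party_col] = "liberal"
convention_df.iloc[[4, 6], party_col] = "republican"
# Use .loc with a boolean mask so the relabelling modifies convention_df itself;
# convention_df[mask]["party"] = ... only assigns to a temporary copy.
convention_df.loc[convention_df["party"] != "democrat", "party"] = "not democrat"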

empath_corpus = st.CorpusFromParsedDocuments(convention_df[:14],
                                             category_col="party",
                                             feats_from_spacy_doc=st.FeatsFromOnlyEmpath(),
                                             parsed_col="text").build()

html = st.produce_scattertext_explorer(empath_corpus,
                                       category='democrat',
                                       category_name='democrat',
                                       not_category_name='Not democrat',
                                       width_in_pixels=1000,
                                       use_non_text_features=True,
                                       use_full_doc=True)

file_name = 'democrat.html'
with open(file_name, 'wb') as f:
    f.write(html.encode('utf-8'))
IFrame(src=file_name, width=1200, height=700)
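Before building the corpus, it is worth confirming that the relabelling really left exactly two categories. A sketch on a toy DataFrame (made-up rows, not the convention data), assuming pandas is installed: a boolean mask passed to `.loc` writes into the DataFrame itself, whereas assigning through a filtered selection such as `df[mask]["party"] = ...` modifies a temporary copy and leaves `df` unchanged.

```python
import pandas as pd

# Toy stand-in for convention_df; only the "party" column matters here.
df = pd.DataFrame({
    "party": ["democrat", "republican", "liberal", "democrat", "liberal"],
    "text": ["a", "b", "c", "d", "e"],
})

# Collapse every non-democrat label into a single "not democrat" category.
# .loc with a boolean mask assigns into df itself, not into a temporary copy.
df.loc[df["party"] != "democrat", "party"] = "not democrat"

# Expect two categories: democrat (2 rows) and not democrat (3 rows).
print(df["party"].value_counts())
```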

swartchris8, May 01 '18 15:05

Thanks for the bug report.

I just made some significant improvements to the topic modeling component in Scattertext. Not only can you now view the documents that match an Empath category, but if you add

topic_model_term_lists=st.FeatsFromOnlyEmpath().get_top_model_term_lists()

as a parameter to produce_scattertext_explorer, it will bold the terms associated with the Empath category. Please see https://github.com/JasonKessler/scattertext#visualizing-topic-models for more information.
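The `topic_model_term_lists` argument pairs each Empath category with its associated terms. The sketch below shows the bolding idea in plain Python, assuming a dict mapping category name to a list of terms and simple regex word matching; `bold_category_terms` is a hypothetical helper, not how scattertext's viewer actually renders document text.

```python
import re
from typing import Dict, List


def bold_category_terms(text: str, term_lists: Dict[str, List[str]], category: str) -> str:
    """Wrap words of `text` that appear in `category`'s term list in <b> tags."""
    terms = {t.lower() for t in term_lists.get(category, [])}

    def replace(match: "re.Match[str]") -> str:
        word = match.group(0)
        # Case-insensitive membership test; the original casing is kept.
        return f"<b>{word}</b>" if word.lower() in terms else word

    # \w+ matches runs of word characters; punctuation and spaces are untouched.
    return re.sub(r"\w+", replace, text)


term_lists = {"money": ["cash", "tax", "pay"], "family": ["mother", "kids"]}
print(bold_category_terms("We cut the tax on kids.", term_lists, "money"))
# We cut the <b>tax</b> on kids.
```

An unknown category simply yields an empty term set, so the text comes back unchanged.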

JasonKessler, May 04 '18 03:05