Chinese scattertext

Open sound118 opened this issue 5 years ago • 1 comments

Your Environment

  • Operating System:
  • Python Version Used:
  • Scattertext Version Used:
  • Environment Information:
  • Browser used (if an HTML error):

Hi,

It seems that in your demo code, a developer can directly use "chinese_nlp" from the scattertext package. For plotting Chinese scattertext, I am wondering whether we could supply a list of user-defined stopwords, and possibly a user-defined dictionary specific to a certain Chinese context, then use jieba to do the word segmentation and feed the cleaned results into your demo program?

Thanks

sound118 avatar Apr 06 '20 08:04 sound118

You could apply a stop list after tokenization by running corpus.remove_terms(...). Otherwise, feel free to modify AsianNLP.py to fit your use case. It just duck-types spaCy's interface.

JasonKessler avatar Apr 06 '20 17:04 JasonKessler
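
For anyone landing here later, below is a rough sketch of the second suggestion: a small tokenizer that duck-types a few spaCy-style attributes and drops stopwords during tokenization. It is not the code in AsianNLP.py. The names Tok, Sentence, Doc, make_chinese_nlp and toy_segment are hypothetical, and the set of attributes scattertext actually reads from each token is an assumption to check against the library source. The segmenter is passed in as a plain callable so the sketch runs without jieba; with jieba installed, jieba.lcut could be passed instead, after loading a custom dictionary with jieba.load_userdict.

```python
import re
from typing import Callable, Iterable, List, Set


class Tok:
    """Hypothetical token exposing a few spaCy-style attribute names."""

    def __init__(self, text: str, pos: str):
        self.orth_ = text
        self.lower_ = text.lower()
        self.lemma_ = text.lower()  # no real lemmatization in this sketch
        self.pos_ = pos
        self.tag_ = ""
        self.ent_type_ = ""

    def __str__(self) -> str:
        return self.orth_


class Sentence:
    """Hypothetical sentence: an iterable of Tok plus the raw text."""

    def __init__(self, toks: List[Tok], text: str):
        self.toks = toks
        self.text = text

    def __iter__(self):
        return iter(self.toks)


class Doc:
    """Hypothetical document: has .sents and iterates over all tokens."""

    def __init__(self, sents: List[Sentence], text: str):
        self.sents = sents
        self.text = text

    def __iter__(self):
        for sent in self.sents:
            yield from sent


def make_chinese_nlp(segment: Callable[[str], Iterable[str]],
                     stopwords: Set[str]) -> Callable[[str], Doc]:
    """Build an nlp(text) callable that segments text and drops stopwords."""

    def nlp(text: str) -> Doc:
        sents = []
        # Split after Chinese sentence-final punctuation.
        for raw in re.findall(r"[^。！？]+[。！？]*", text):
            toks = []
            for word in segment(raw):
                if not word.strip() or word in stopwords:
                    continue  # skip whitespace and user-defined stopwords
                pos = "PUNCT" if re.fullmatch(r"\W+", word) else "WORD"
                toks.append(Tok(word, pos))
            if toks:
                sents.append(Sentence(toks, raw))
        return Doc(sents, text)

    return nlp


def toy_segment(sentence: str) -> List[str]:
    """Stand-in for jieba.lcut: greedy two-character match on a tiny vocabulary."""
    vocab = {"我", "喜欢", "北京", "的", "天气"}
    out, i = [], 0
    while i < len(sentence):
        j = i + 2 if sentence[i:i + 2] in vocab else i + 1
        out.append(sentence[i:j])
        i = j
    return out


# With jieba installed, the segmenter could instead be:
#   import jieba
#   jieba.load_userdict("user_dict.txt")  # hypothetical path
#   nlp = make_chinese_nlp(jieba.lcut, stopwords)
stopwords = {"的", "我"}
nlp = make_chinese_nlp(toy_segment, stopwords)
doc = nlp("我喜欢北京的天气。")
print([tok.lower_ for tok in doc])
```

The resulting nlp callable is meant to be used wherever the demo passes chinese_nlp. Filtering inside the tokenizer keeps stopwords out of the corpus from the start, while corpus.remove_terms(...) removes them after the corpus is built; either route should work, depending on whether the stop list changes often.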