Chinese scattertext

sound118 opened this issue (1 comment)

Your Environment

Operating System:
Python Version Used:
Scattertext Version Used:
Environment Information:
Browser used (if an HTML error):

Hi,

It seems that in your demo code, developers can use "chinese_nlp" directly from the scattertext package. For plotting Chinese scattertext, I am wondering whether we could add a user-defined stop-word list and perhaps a user-defined dictionary specific to a particular Chinese context, then use jieba for word segmentation and feed the cleaned results into your demo program.

Thanks

Apr 06 '20 08:04 sound118
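The preprocessing asked about above (a custom stop-word list plus segmentation that respects a user dictionary) could look roughly like the following minimal sketch. load_stopwords and segment_and_filter are hypothetical helpers, not part of scattertext. The segmenter is passed in as a callable so the sketch runs without jieba installed; with jieba, one would call jieba.load_userdict(path) first and then pass jieba.cut as the segmenter. The one-word-per-line stop-word format with "#" comments is an assumed convention.

```python
from typing import Callable, Iterable, List, Set


def load_stopwords(lines: Iterable[str]) -> Set[str]:
    """Build a stop-word set from lines of text, one word per line.

    Blank lines and lines starting with '#' are skipped (assumed file convention).
    """
    words = set()
    for line in lines:
        word = line.strip()
        if word and not word.startswith("#"):
            words.add(word)
    return words


def segment_and_filter(
    text: str,
    segment: Callable[[str], Iterable[str]],
    stopwords: Set[str],
) -> List[str]:
    """Segment `text` with `segment`, then drop whitespace-only tokens and stop words."""
    tokens = []
    for tok in segment(text):
        tok = tok.strip()
        if tok and tok not in stopwords:
            tokens.append(tok)
    return tokens


if __name__ == "__main__":
    stop = load_stopwords(["的", "了", "# comment line", ""])
    # Stand-in segmenter for illustration only: splits on spaces.
    # With jieba (hypothetical setup): jieba.load_userdict("user_dict.txt"),
    # then pass jieba.cut instead of str.split.
    print(segment_and_filter("我 的 猫 睡 了", str.split, stop))  # ['我', '猫', '睡']
```

Keeping the segmenter pluggable means the same stop-word filtering works whether the tokens come from jieba, a customized copy of AsianNLP.py, or anything else.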

You could apply a stop list after tokenization by running corpus.remove_terms(...). Otherwise, feel free to modify AsianNLP.py to fit your use case. It just duck-types spaCy's interface.

Apr 06 '20 17:04 JasonKessler
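One way to build the term list for corpus.remove_terms(...) from a stop list is sketched below. terms_to_remove is a hypothetical helper, not a scattertext function. It assumes that n-grams such as bigrams appear in the corpus vocabulary as space-joined strings, so it selects every stop word and also every n-gram containing one. Dropping those n-grams is a design choice; keep only the exact-match test if you want them retained.

```python
from typing import Iterable, List, Set


def terms_to_remove(terms: Iterable[str], stopwords: Set[str]) -> List[str]:
    """Return the vocabulary terms to pass to remove_terms.

    A term is selected if any of its space-separated parts is a stop word.
    For a unigram that means the term itself is a stop word; for an assumed
    space-joined bigram it means at least one of its two tokens is.
    """
    selected = []
    for term in terms:
        if any(part in stopwords for part in term.split()):
            selected.append(term)
    return selected


if __name__ == "__main__":
    vocab = ["猫", "的", "我 的", "猫 睡"]
    print(terms_to_remove(vocab, {"的"}))  # ['的', '我 的']
```

Assuming an existing scattertext corpus and a stop-word set, usage would be along the lines of corpus = corpus.remove_terms(terms_to_remove(corpus.get_terms(), stopwords)); remove_terms returns a new corpus rather than modifying the old one in place.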