Encoding issue with non-English text
Passing a non-English Unicode string to preprocessor.clean with the preprocessor.OPT.EMOJI option enabled returns meaningless characters. This happens only on version 0.6.0.
The cause seems to be line 50 of preprocess.py.
To reproduce:

```python
import preprocessor as p

p.set_options(p.OPT.URL, p.OPT.EMOJI, p.OPT.SMILEY)
print(p.clean("внесла предложение призвать всех избегать применять незаконные"))
```
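For what it's worth, the damage can be reproduced without the library at all. A minimal sketch, assuming the line-50 hypothesis above amounts to an ASCII round-trip (my reading, not a quote from preprocess.py):

```python
# Suspected root cause in isolation: encoding to ASCII with
# errors='ignore' silently drops every non-ASCII character.
text = "внесла предложение призвать всех избегать применять незаконные"
print(repr(text.encode('ascii', 'ignore').decode('ascii')))
# -> '      '  (only the spaces survive; every Cyrillic letter is dropped)
```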
I have found similar behavior with Spanish. The problem is that characters with diacritics are removed, but no error is raised.
For example, the text (note the second letter from the left):

```
Sí, efectivamente, el Servicio de Vigilancia
```

is transformed into:

```
S, efectivamente, el Servicio de Vigilancia
```
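The same ASCII round-trip reproduces this Spanish case too; a quick check, assuming that mechanism:

```python
s = "Sí, efectivamente, el Servicio de Vigilancia"
# The accented í is not ASCII, so errors='ignore' drops it.
print(s.encode('ascii', 'ignore').decode('ascii'))
# -> S, efectivamente, el Servicio de Vigilancia
```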
In my case, this is the code I used:

```python
df['tweet'] = df['tweet'].apply(lambda x: p.clean(x))
```
The dataframe is read from a CSV file encoded as UTF-8 (without BOM).
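Until this is fixed, a workaround sketch that avoids the lossy path entirely by leaving OPT.EMOJI off (assumes the same df as above):

```python
import preprocessor as p

# Clean URLs and smileys only; skipping OPT.EMOJI avoids the
# ASCII round-trip that destroys non-English characters.
p.set_options(p.OPT.URL, p.OPT.SMILEY)
df['tweet'] = df['tweet'].apply(p.clean)
```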
I think the offending code that destroys everything that is not an ASCII character is:

```python
def preprocess_emojis(self, tweet_string, repl):
    processed = Patterns.EMOJIS_PATTERN.sub(repl, tweet_string)
    return processed.encode('ascii', 'ignore').decode('ascii')
```

There should be a better way to clean emojis, and there is: https://github.com/carpedm20/emoji. Maybe that library should be in charge of de-emojifying, although it stubbornly adds aliases like `:flexed_biceps:`.
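For the record, a minimal sketch of what the method could do instead. It assumes the emoji package's replace_emoji helper (available in recent releases), which substitutes emojis without touching other Unicode; the alias behavior mentioned above comes from demojize, which this avoids:

```python
import emoji

def preprocess_emojis(self, tweet_string, repl):
    # Replace emojis with repl and leave everything else untouched:
    # no ASCII round-trip, so Cyrillic, Arabic, Hindi, Korean, and
    # accented Latin characters all survive.
    return emoji.replace_emoji(tweet_string, replace=repl)
```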
The same issue appears with Arabic text on Windows 10: if the emoji option is on, it deletes all the characters.
Can confirm the same for Hindi and code-mixed Hindi-English as well.
Also removes all Korean text!