
Encoding issue with non-English text

Open omid-jf opened this issue 3 years ago • 5 comments

Passing a non-English Unicode string to preprocessor.clean with the preprocessor.OPT.EMOJI option set returns seemingly random, meaningless characters. This happens only in version 0.6.0.

The cause seems to be line 50 of preprocess.py.

To reproduce:

    import preprocessor as p
    p.set_options(p.OPT.URL, p.OPT.EMOJI, p.OPT.SMILEY)
    print(p.clean("внесла предложение призвать всех избегать применять незаконные"))
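To make the failure easier to confirm across versions, here is a minimal sketch of a check that reports which non-ASCII letters of the input are missing from the cleaned output. The helper `lost_letters` and the stand-in `ascii_only_cleaner` are hypothetical names introduced for illustration only; the stand-in is not the library's code and is not a claim about what line 50 actually does.

```python
from typing import List


def lost_letters(original: str, cleaned: str) -> List[str]:
    """Return the non-ASCII letters of `original` that do not appear in `cleaned`."""
    remaining = set(cleaned)
    missing = {
        ch for ch in original
        if ch.isalpha() and not ch.isascii() and ch not in remaining
    }
    return sorted(missing)


def ascii_only_cleaner(text: str) -> str:
    """Hypothetical stand-in cleaner that drops every non-ASCII character.

    This is an assumption used to exercise the check, not the package's implementation.
    """
    return text.encode("ascii", "ignore").decode("ascii")


sample = "внесла предложение призвать всех избегать применять незаконные"

# An untouched string loses nothing, so the list is empty.
print(lost_letters(sample, sample))

# The stand-in cleaner drops all Cyrillic letters, so they are all reported.
print(lost_letters(sample, ascii_only_cleaner(sample)))
```

To run the same check against the installed package, pass `p.clean(sample)` as the second argument after calling `p.set_options(...)` as in the reproduction steps; a non-empty result means letters were lost during cleaning.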

omid-jf, Dec 09 '20 04:12