preprocessor
Encoding issue with non-English text
Passing a non-English Unicode string to preprocessor.clean with the preprocessor.OPT.EMOJI option set returns random, meaningless characters. This happens only on version 0.6.0.
The cause of this issue seems to be line 50 of preprocess.py.
To reproduce:

    import preprocessor as p
    p.set_options(p.OPT.URL, p.OPT.EMOJI, p.OPT.SMILEY)
    print(p.clean("внесла предложение призвать всех избегать применять незаконные"))
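
For anyone triaging this: "random meaningless characters" in place of Cyrillic text is the typical look of a byte-level encode/decode mismatch. That is only an assumption about the kind of bug involved; it has not been checked against what line 50 of preprocess.py actually does. The sketch below does not use preprocessor at all. The helpers `garble` and `ungarble` are hypothetical names, and the UTF-8/Latin-1 pair is just one mismatch that produces this symptom.

```python
# Illustration only: this does not call preprocessor, and the encode/decode
# pair below is an assumed failure mode, not the library's actual code.

def garble(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the same bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def ungarble(text: str) -> str:
    """Reverse garble(): recover the UTF-8 bytes and decode them correctly."""
    return text.encode("latin-1").decode("utf-8")


original = "внесла предложение"
broken = garble(original)

print(repr(broken))      # unreadable Latin-1 characters instead of Cyrillic
print(ungarble(broken))  # the original Cyrillic text again
```

If the output of p.clean on 0.6.0 can be turned back into the input text by a round trip like `ungarble`, that would point towards an encoding step of this kind. If it cannot, the cause is probably something else.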