Fix support for non-English texts

Open · omid-jf opened this issue 4 years ago · 1 comment

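The encode('ascii', 'ignore').decode('ascii') strategy does not work for non-English text: it removes every non-ASCII character, including accented and non-Latin letters, not just emojis. Since emoji regex patterns already exist in defines.py, a regex substitution is sufficient to remove the emojis while leaving the rest of the text intact.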
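A minimal sketch of the difference between the two approaches. EMOJI_PATTERN here is a hypothetical stand-in for the pattern in defines.py (a character class over a few common emoji ranges), and ascii_strip / regex_strip are illustrative names, not functions from the library.

```python
import re

# Hypothetical stand-in for the emoji pattern in defines.py:
# a character class covering a few common emoji code-point ranges.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\u2600-\u27BF"          # misc symbols and dingbats
    "]+"
)


def ascii_strip(text: str) -> str:
    # Old approach: drops every non-ASCII character, not only emojis.
    return text.encode("ascii", "ignore").decode("ascii")


def regex_strip(text: str) -> str:
    # Proposed approach: remove only what the emoji pattern matches.
    return EMOJI_PATTERN.sub("", text)


sample = "café 😀 سلام"
print(repr(ascii_strip(sample)))
print(repr(regex_strip(sample)))
```

With the ASCII round-trip, "é" and the whole Persian word are lost along with the emoji; the regex version removes only the emoji.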

Fixes #47 and #48

omid-jf · Mar 29 '21 21:03

The pattern defined in defines.py does not cover newer emojis, though, so it would need to be updated. Alternatively, emoji.get_emoji_regexp() from the emoji package (https://pypi.org/project/emoji) can be used instead.
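A sketch of the "use the emoji package" option, assuming it may or may not be installed. Note that get_emoji_regexp() was removed in emoji 2.0; newer releases provide emoji.replace_emoji() instead, so the helper checks for that first. The strip_emojis name and the fallback pattern are hypothetical; the fallback deliberately includes U+1F900 to U+1F9FF (where newer emojis such as 🥺 live) to illustrate the kind of update a hand-maintained pattern needs.

```python
import re

# Hypothetical hand-maintained fallback. Like any fixed pattern, it only
# covers the ranges someone remembered to list and must be kept up to date.
_FALLBACK_EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F900-\U0001F9FF"  # supplemental symbols & pictographs (newer emojis)
    "\u2600-\u27BF"          # misc symbols and dingbats
    "]+"
)


def strip_emojis(text: str) -> str:
    """Remove emojis, preferring the emoji package's up-to-date data."""
    try:
        import emoji
    except ImportError:
        # Package not installed: fall back to the fixed pattern.
        return _FALLBACK_EMOJI_PATTERN.sub("", text)
    if hasattr(emoji, "replace_emoji"):
        # Newer releases; get_emoji_regexp() was removed in 2.0.
        return emoji.replace_emoji(text, replace="")
    # Older releases expose the compiled emoji regex directly.
    return emoji.get_emoji_regexp().sub("", text)


print(strip_emojis("café 🥺 سلام"))
```

Relying on the package means new emojis are picked up by upgrading a dependency rather than by editing code-point ranges in defines.py.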

omid-jf · Mar 29 '21 22:03