preprocessor
Elegant and Easy Tweet Preprocessing in Python
The library does not work for foreign languages. It erases everything when trying to remove emojis or URLs, etc. Any idea how to resolve this issue? **Describe the bug** A...
Hello, thanks for this convenient library! 😄 Wouldn't it be desirable to add a regex that can also **clean up HTML special entities** such as "&amp;", "&gt;", etc....
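As a rough illustration of this request, HTML entities such as `&amp;` and `&gt;` can be decoded with Python's standard `html` module before any other cleaning. This is a minimal sketch and not part of the library's API; the function name `unescape_entities` is hypothetical.

```python
import html


def unescape_entities(text: str) -> str:
    """Decode HTML entities (e.g. &amp;, &gt;) into plain characters.

    Hypothetical helper for illustration; not provided by the library.
    """
    return html.unescape(text)


if __name__ == "__main__":
    print(unescape_entities("Tom &amp; Jerry &gt; everything else"))
```

Running a step like this first means later regex-based cleaning sees the literal characters rather than the escaped forms.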
**Is your feature request related to a problem? Please describe.** Some opinions on Twitter consist only of hashtag text, and if we strip the hashtags we may lose important information....
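One possible reading of this request is to keep the word inside each hashtag and drop only the `#` sign. The sketch below assumes that interpretation; `keep_hashtag_text` is a hypothetical name, and the pattern is a simplification rather than the library's own hashtag rule.

```python
import re

# Simplified hashtag pattern (an assumption): '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def keep_hashtag_text(text: str) -> str:
    """Replace each hashtag with its text, dropping only the leading '#'."""
    return HASHTAG_RE.sub(r"\1", text)


if __name__ == "__main__":
    print(keep_hashtag_text("#amazing #love this phone"))
```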
Regarding issue #30, this is my proposal. Notice that there is a ZWNJ (`\u200c`) between `w` and `*`.
In some RTL languages, including Persian, the ZWNJ character is used frequently, and in particular it appears in hashtags. For example in "#نیمفاصله" which has ZWNJ between نیم...
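To illustrate the problem: in Python, `\w` does not match ZWNJ (`\u200c`), so a plain `#\w+` pattern cuts a Persian hashtag at the joiner. The sketch below adds ZWNJ to the character class explicitly. The pattern and the name `find_hashtags` are assumptions for illustration, not the library's actual regex.

```python
import re

# '#', then a word character, then any mix of word characters and ZWNJ (U+200C).
# ZWNJ is a format character, so \w alone does not match it.
HASHTAG_RE = re.compile(r"#\w[\w\u200c]*")


def find_hashtags(text: str) -> list[str]:
    """Return all hashtags in text, keeping any ZWNJ inside them."""
    return HASHTAG_RE.findall(text)


if __name__ == "__main__":
    sample = "hello #نیم\u200cفاصله world"
    print(find_hashtags(sample))           # full hashtag, ZWNJ preserved
    print(re.findall(r"#\w+", sample))     # truncated at the ZWNJ
```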
Preprocessing punctuation is tricky because mentions, hashtags, and URLs themselves contain punctuation characters.
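One way to handle this, sketched under simplified assumptions, is to split the text on the tokens that must be protected and strip punctuation only from the pieces in between. The patterns for URLs, mentions, and hashtags below are deliberately loose, and `strip_punctuation` is a hypothetical helper rather than a function the library provides.

```python
import re
import string

# Tokens to leave untouched (simplified patterns, an assumption):
# http(s) URLs, @mentions, and #hashtags.
PROTECTED_RE = re.compile(r"(https?://\S+|[@#]\w+)")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(text: str) -> str:
    """Remove ASCII punctuation everywhere except inside protected tokens."""
    parts = PROTECTED_RE.split(text)
    # With one capturing group, re.split places protected tokens at odd indices.
    cleaned = [
        part if i % 2 else part.translate(PUNCT_TABLE)
        for i, part in enumerate(parts)
    ]
    return "".join(cleaned)


if __name__ == "__main__":
    print(strip_punctuation("Wow!!! @user, check https://t.co/abc #cool."))
```

Note that the loose URL pattern also swallows punctuation attached directly to the end of a URL (for example a trailing comma), so a stricter URL rule would be needed in practice.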