
Elegant and Easy Tweet Preprocessing in Python

Results 16 preprocessor issues

The library does not work for foreign languages. It erases everything when trying to remove emojis or URLs, etc. Any idea how to resolve this issue? **Describe the bug** A...

bug

Hello, thanks for this convenient library! 😄 Wouldn't it be nice to add a regex that can also **clean up HTML special entities** such as "&amp;", "&gt;", etc....

enhancement
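
For illustration only, one way to handle such entities before any other cleaning is Python's standard-library `html.unescape`, used here in place of the regex the request mentions. This is a minimal sketch, not part of the library; the helper name `unescape_html_entities` and the sample string are assumptions.

```python
import html


def unescape_html_entities(text: str) -> str:
    """Convert HTML character references (e.g. &amp;, &gt;) to plain characters.

    Hypothetical helper for illustration; it is not part of the library's API.
    """
    return html.unescape(text)


# Made-up sample text containing three common entities.
sample = "Tom &amp; Jerry &gt; everything else &lt;3"
cleaned = unescape_html_entities(sample)
print(cleaned)
```

Running this step first means later cleaning sees the literal `&`, `<` and `>` characters rather than entity names.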

**Is your feature request related to a problem? Please describe.** Some opinions on Twitter consist only of hashtag text, and if we parse the hashtags out we may lose some important information....

enhancement

Regarding issue #30, this is my proposal. Note that there is a ZWNJ (`\u200c`) between `w` and `*`.

In some RTL languages, including Persian, the ZWNJ character is used a lot, and in particular it is used in hashtags. For example in "#نیم‌فاصله" which has ZWNJ between نیم...

enhancement
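
To make the ZWNJ problem concrete, here is a small sketch comparing a plain `#\w+` pattern with one that also accepts `\u200c`. Both patterns are assumptions written for this example, not the library's actual hashtag regex or the exact pattern proposed in the issue.

```python
import re

# Illustrative patterns only; neither is taken from the library's source.
NAIVE_HASHTAG = re.compile(r"#\w+")
ZWNJ_HASHTAG = re.compile(r"#[\w\u200c]+")  # also accept ZWNJ inside the tag

# "#نیم‌فاصله" written with escapes so the invisible ZWNJ (\u200c) is explicit.
tag = "#\u0646\u06cc\u0645\u200c\u0641\u0627\u0635\u0644\u0647"
text = "hello " + tag + " world"

naive = NAIVE_HASHTAG.findall(text)       # stops at the ZWNJ
zwnj_aware = ZWNJ_HASHTAG.findall(text)   # captures the whole hashtag

print(naive)
print(zwnj_aware)
```

The naive pattern stops at the ZWNJ because `\w` does not match format characters, so only the first half of the hashtag is captured.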

Preprocessing punctuation is tricky because punctuation characters also appear in mentions, hashtags, and URLs.

enhancement
help wanted
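
One possible approach to the punctuation problem, sketched under assumptions: match URLs, mentions and hashtags first and keep them untouched, then delete any remaining punctuation character. The patterns and the `strip_punctuation` helper below are hypothetical and deliberately simplified; they are not the library's implementation.

```python
import re
import string

# Simplified, assumed patterns for tokens whose punctuation must survive.
PROTECTED = r"https?://\S+|[@#]\w+"
# Character class covering every ASCII punctuation character.
PUNCT = "[" + re.escape(string.punctuation) + "]"
# Protected tokens are tried first (group 1); otherwise match one punctuation char.
TOKEN = re.compile(f"({PROTECTED})|{PUNCT}")


def strip_punctuation(text: str) -> str:
    """Remove ASCII punctuation except inside URLs, mentions and hashtags."""
    # Keep a protected token as-is; replace a bare punctuation match with "".
    return TOKEN.sub(lambda m: m.group(1) or "", text)


result = strip_punctuation("Wow!!! @user, check https://example.com/a?b=1 #nice.")
print(result)
```

Because the URL pattern runs to the next whitespace, punctuation glued to the end of a URL (for example a trailing full stop) is kept as part of the URL; a real solution would need a stricter URL pattern.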