[Feature Proposal] Use hashformers for hashtag segmentation
Is your feature request related to a problem? Please describe.
Preprocessor currently offers no alternative to replacing hashtags with a dummy $HASHTAG$ token. This has frustrated some users who would rather have hashtags segmented, as evidenced by PR #43.
Describe the solution you'd like
I propose to integrate hashformers with Preprocessor.
This would introduce hashformers as an optional dependency of Preprocessor. It would be installable as an extra through pip install tweet-preprocessor[hashformers] or pip install tweet-preprocessor[all].
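To illustrate, here is a minimal sketch of how the two extras could be declared, assuming the package is built with setuptools. The `EXTRAS` name and the unpinned `hashformers` requirement are illustrative and not taken from the repository.

```python
# Hypothetical fragment for setup.py; names and version constraints are assumptions.
EXTRAS = {
    "hashformers": ["hashformers"],
}

# "all" aggregates every optional requirement, so it stays in sync
# when further extras are added to the dict above.
EXTRAS["all"] = sorted({req for reqs in EXTRAS.values() for req in reqs})

# In setup.py this dict would then be passed on as:
#     setup(..., extras_require=EXTRAS)
```

Users who never install either extra would see no new dependency.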
Hashformers can segment hashtags in any language, and it is the current state-of-the-art in hashtag segmentation.
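As a rough sketch of how Preprocessor could keep the current behaviour when the extra is not installed: the names `hashformers_available`, `process_hashtags`, and `toy_segmenter` below are hypothetical and belong to neither library. The `segmenter` argument is assumed to be any callable that takes a list of hashtags (without the leading `#`) and returns one segmented string per hashtag; a hashformers model would have to be wrapped to fit that shape.

```python
import importlib.util
import re

HASHTAG_RE = re.compile(r"#(\w+)")


def hashformers_available():
    """Return True if the optional hashformers package is installed."""
    return importlib.util.find_spec("hashformers") is not None


def process_hashtags(text, segmenter=None):
    """Segment hashtags with `segmenter`, or fall back to the $HASHTAG$ token."""
    if segmenter is None:
        # Current behaviour: every hashtag becomes a dummy token.
        return HASHTAG_RE.sub("$HASHTAG$", text)
    tags = HASHTAG_RE.findall(text)
    if not tags:
        return text
    # Segment all hashtags in one batch, then substitute them in order.
    segmented = iter(segmenter(tags))
    return HASHTAG_RE.sub(lambda match: next(segmented), text)


def toy_segmenter(tags):
    """Stand-in for a real model, used only to show the call shape."""
    lookup = {"icecold": "ice cold"}
    return [lookup.get(tag, tag) for tag in tags]


default_result = process_hashtags("so #icecold today")
segmented_result = process_hashtags("so #icecold today", toy_segmenter)
```

Passing the segmenter in as a callable keeps the heavy model import out of the default code path, so `hashformers_available()` only needs to be checked when a user opts in to segmentation.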
Describe alternatives you've considered
Two research groups have reported Hashformers as the current state of the art for hashtag segmentation.
Additional context
If this seems like a good idea to the maintainers of this repository ( @s ), I can draft an initial PR for this feature.