TextBlob
Tokenization incorrectly splits "gonna" into "gon" and "na"
Verified that this occurs in 0.10.0 :sob:
>>> import textblob
>>> textblob.TextBlob('gonna do this').words
WordList(['gon', 'na', 'do', 'this'])
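The split comes from a contraction rule, not from a bug in the regex engine. A rough sketch of the idea follows. It is a hypothetical, simplified stand-in, not NLTK's actual code. NLTK's Treebank tokenizer carries a short list of patterns like these, covering "gonna", "gotta", and "gimme". The names `CONTRACTION_RULES` and `treebank_like_split` are made up for illustration.

```python
import re

# Hypothetical, simplified stand-in for a few Treebank-style contraction rules:
# a word-bounded "gonna" / "gotta" / "gimme" is split into its two halves.
CONTRACTION_RULES = [
    re.compile(r"(?i)\b(gon)(na)\b"),
    re.compile(r"(?i)\b(got)(ta)\b"),
    re.compile(r"(?i)\b(gim)(me)\b"),
]


def treebank_like_split(text: str) -> list[str]:
    # Insert a space between the two captured halves of each contraction,
    # then split the result on whitespace.
    for rule in CONTRACTION_RULES:
        text = rule.sub(r"\1 \2", text)
    return text.split()


print(treebank_like_split("gonna do this"))  # ['gon', 'na', 'do', 'this']
```

This reproduces the `['gon', 'na', 'do', 'this']` output above, which is why TextBlob (via NLTK) behaves this way on purpose.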
@whosken this is the standard NLTK (Treebank) tokenization, which splits contractions like "gonna" on purpose. You might wanna use NLTK directly for other tokenizer options.
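To keep "gonna" whole, you can use a tokenizer without contraction rules. The sketch below is a stdlib-only version of that idea. It reuses the `\w+|[^\w\s]+` pattern that NLTK's `WordPunctTokenizer` / `wordpunct_tokenize` is built on. With NLTK installed, you would call `nltk.tokenize.wordpunct_tokenize` (or `WhitespaceTokenizer`) instead. The helper name `wordpunct_like` is hypothetical.

```python
import re

# Runs of word characters, or runs of non-word, non-space characters.
# No contraction rules, so "gonna" survives as a single token.
WORD_PUNCT = re.compile(r"\w+|[^\w\s]+")


def wordpunct_like(text: str) -> list[str]:
    # Return every non-overlapping match, left to right.
    return WORD_PUNCT.findall(text)


print(wordpunct_like("gonna do this"))  # ['gonna', 'do', 'this']
print(wordpunct_like("I don't know."))  # ['I', 'don', "'", 't', 'know', '.']
```

The trade-off: this keeps "gonna" intact but breaks "don't" into three pieces. Which one is "right" depends on what the downstream step expects.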