RedditScore
Package for performing Reddit-based text analysis
Importing CrazyTokenizer in Colab worked without problems until February, but it now throws a "Protocol not found" error. To reproduce the error, run: !pip install git+https://github.com/crazyfrogspb/RedditScore.git...
With the CrazyTokenizer (excellent results, btw, thanks!) I am running into an issue with spaCy's maximum character length: "[E088] Text of length 3029371 exceeds maximum of 1000000." You...
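One possible workaround for the length limit above is to split the input into pieces below the limit and tokenize each piece separately. The following is only a sketch, not something taken from the RedditScore code base: `chunk_text` and `tokenize_long_text` are hypothetical helper names, and it assumes that cutting the text at whitespace does not change the tokens you care about.

```python
from typing import Callable, Iterator, List

# The limit quoted in the E088 message above.
SPACY_DEFAULT_MAX_LENGTH = 1_000_000


def chunk_text(text: str, max_chars: int = SPACY_DEFAULT_MAX_LENGTH) -> Iterator[str]:
    """Yield consecutive pieces of `text`, each at most `max_chars` long.

    Hypothetical helper: cuts at the last space or newline inside each
    window when one exists, otherwise falls back to a hard cut.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Prefer a whitespace boundary so words are not split in half.
            cut = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if cut > start:
                end = cut + 1
        yield text[start:end]
        start = end


def tokenize_long_text(
    text: str,
    tokenize: Callable[[str], List[str]],
    max_chars: int = SPACY_DEFAULT_MAX_LENGTH,
) -> List[str]:
    """Apply `tokenize` to each chunk and concatenate the resulting tokens."""
    tokens: List[str] = []
    for chunk in chunk_text(text, max_chars):
        tokens.extend(tokenize(chunk))
    return tokens


# Small demonstration with str.split standing in for a real tokenizer.
demo_tokens = tokenize_long_text("a bb ccc dddd", str.split, max_chars=5)
```

With RedditScore installed, the same helper could be called as `tokenize_long_text(long_text, CrazyTokenizer().tokenize)`. Chunking this way will split any token that is itself longer than `max_chars`, which is unlikely for ordinary text but worth keeping in mind.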
## Steps
```py
from redditscore.tokenizer import CrazyTokenizer
tokenizer = CrazyTokenizer(hashtags='split')
tokenizer.tokenize("#20yearsago")
```
## Actual Result
```py
['2', '0', 'y', 'e', 'a', 'r', 's', 'a', 'g', 'o']
```
## Expected Result...
## Installation
```sh
pip install git+https://github.com/crazyfrogspb/RedditScore.git
```
## Steps to reproduce
```py
from redditscore.tokenizer import CrazyTokenizer
tokenizer = CrazyTokenizer(hashtags=False)
text = "Let's #makeamericagreatagain#americafirst"
print(tokenizer.tokenize(text))
```
## Expected output
```py
["let's",...
```