vaderSentiment
Dictionary contains phrases like "fed up" that will never hit because of how the sentence is tokenized
The dictionary contains phrases like "fed up", but because the code looks up each word in the dictionary individually, these phrases never match:
>>> from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
>>> analyzer = SentimentIntensityAnalyzer()
>>> analyzer.polarity_scores("I am fed up")
{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
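To illustrate why a word-by-word lookup can never match a key that contains a space, here is a minimal sketch. It does not use vaderSentiment at all: `toy_lexicon`, its scores, and `word_by_word_hits` are hypothetical stand-ins, and the whitespace split is only an approximation of the library's tokenizer.

```python
# Toy lexicon; the scores are made-up placeholders, not real VADER values.
toy_lexicon = {"fed up": -1.0, "happy": 2.0}


def word_by_word_hits(text, lexicon):
    """Return the tokens found in the lexicon when each word is checked alone."""
    return [tok for tok in text.lower().split() if tok in lexicon]


print(word_by_word_hits("I am fed up", toy_lexicon))  # []
print(word_by_word_hits("I am happy", toy_lexicon))   # ['happy']
```

Neither "fed" nor "up" is a key on its own, so the phrase entry is unreachable with this kind of lookup.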
If I understand the code correctly, "fed up" (and any other multi-word phrase) should be removed from the lexicon file (vader_lexicon.txt) and added to SENTIMENT_LADEN_IDIOMS instead. However, the code that would actually handle those idioms appears to be a placeholder for a future addition.
I found a work-around for handling bigrams (2-word phrases) on Stack Overflow: https://stackoverflow.com/questions/67798527/nltk-vader-sentimentintensityanalyzer-bigram
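One possible shape for such a work-around, sketched without the library and not necessarily what the linked answer does: join each known phrase into a single underscore token before scoring, and rekey the lexicon to match. `join_phrases`, `rekey_lexicon`, and `toy_lexicon` are hypothetical names, and the scores are placeholders.

```python
import re


def join_phrases(text, phrases):
    """Replace each known multi-word phrase with an underscore-joined token."""
    # Longest phrases first, so a shorter phrase cannot pre-empt a longer one.
    for phrase in sorted(phrases, key=len, reverse=True):
        words = phrase.split()
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
        joined = "_".join(words)
        # A function replacement avoids backslash processing in the new text.
        text = re.sub(pattern, lambda m, j=joined: j, text, flags=re.IGNORECASE)
    return text


def rekey_lexicon(lexicon):
    """Return a copy of the lexicon whose multi-word keys use underscores."""
    return {"_".join(key.split()): score for key, score in lexicon.items()}


# Toy lexicon; the scores are made-up placeholders, not real VADER values.
toy_lexicon = {"fed up": -1.0, "happy": 2.0}
phrases = [key for key in toy_lexicon if " " in key]

text = join_phrases("I am fed up", phrases)
lexicon = rekey_lexicon(toy_lexicon)
hits = [tok for tok in text.lower().split() if tok in lexicon]

print(text)  # I am fed_up
print(hits)  # ['fed_up']
```

Assuming the analyzer keeps its lexicon in a plain dict attribute, the same two steps could presumably be applied to that dict and to the input text before calling `polarity_scores`. I have not verified this against the library's tokenizer, so treat it as untested.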