Whole new features

Open stormbeforesunsetbee opened this issue 2 years ago • 0 comments

What's new:

  • Read an NRC lexicon from a txt file, in any format and any language
  • Word-by-word lemmatization isn't enough; the loaded lexicon is expanded instead
  • Fewer dependencies
  • Added badge
  • Negation support (fixes #4)
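
For the first item, here is a minimal sketch of loading a word-level lexicon file. It assumes the tab-separated `word<TAB>emotion<TAB>flag` layout of the English word-level NRC file; `load_lexicon` is a hypothetical helper name, not the function in this change, and other formats would need their own parsing.

```python
import io
from collections import defaultdict


def load_lexicon(lines):
    """Build {word: [emotions]} from 'word<TAB>emotion<TAB>flag' lines (assumed layout)."""
    lexicon = defaultdict(list)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        word, emotion, flag = parts
        if flag == "1":  # flag 1 means the word is associated with the emotion
            lexicon[word].append(emotion)
    return dict(lexicon)


# Toy input standing in for a real lexicon file; these rows are illustrative only.
sample = io.StringIO(
    "abandon\tfear\t1\n"
    "abandon\tjoy\t0\n"
    "abandon\tnegative\t1\n"
    "\n"
    "happy\tjoy\t1\n"
)
lexicon = load_lexicon(sample)
print(lexicon)
```

With a real file, the same function could be fed an open file handle, e.g. `load_lexicon(open(path, encoding="utf-8"))`.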

Example

Text: she's always arguing!

New:

{'fear': 0.0,
 'anger': 0.3333333333333333,
 'anticipation': 0.0,
 'trust': 0.3333333333333333,
 'surprise': 0.0,
 'positive': 0.0,
 'negative': 0.3333333333333333,
 'sadness': 0.0,
 'disgust': 0.0,
 'joy': 0.0}

Previous:

{'fear': 0.0,
 'anger': 0.0,
 'anticip': 0.0,
 'trust': 0.0,
 'surprise': 0.0,
 'positive': 0.0,
 'negative': 0.0,
 'sadness': 0.0,
 'disgust': 0.0,
 'joy': 0.0}
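
The 0.333 values in the new output are what you would get if a single matched word carried three categories and each category count were divided by the total number of category hits. A rough sketch of that arithmetic follows; `affect_frequencies` and the toy lexicon entry are assumptions for illustration, not the code or data from this change.

```python
from collections import Counter

CATEGORIES = ["fear", "anger", "anticipation", "trust", "surprise",
              "positive", "negative", "sadness", "disgust", "joy"]


def affect_frequencies(tokens, lexicon):
    """Return each category's share of all category hits (0.0 when nothing matches)."""
    counts = Counter()
    for token in tokens:
        # Tokens missing from the lexicon contribute nothing
        counts.update(lexicon.get(token, []))
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in CATEGORIES}


# Hypothetical entry: assumes the expanded lexicon already contains "arguing".
toy_lexicon = {"arguing": ["anger", "trust", "negative"]}
freqs = affect_frequencies(["she", "'s", "always", "arguing", "!"], toy_lexicon)
print(freqs)
```

If no token matches, as in the previous behaviour for this sentence, every category stays at 0.0.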

To see all of the possible expansions, run:

# use the old version

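# Requires the WordNet corpus, e.g. nltk.download('wordnet')
from nltk.corpus import wordnet

def expand_synonyms(lex):
    # Collect WordNet synonyms that are not already in the lexicon
    lex_ = {}

    for i in lex:
        for j in wordnet.synsets(i):
            for k in j.lemmas():
                # WordNet joins multi-word lemmas with underscores
                word = k.name().replace('_', ' ')

                if word not in lex:
                    # A new synonym inherits the entry of the word it came from
                    lex_[word] = lex[i]

    return lex_

# nrc is assumed to be an existing NRCLex object from the old version
expand_synonyms(nrc.lexicon)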

stormbeforesunsetbee • Dec 08 '22 16:12