tidytext
Enabling the existing international NRC lexicon in get_sentiments()
Hi, I love learning tidytext but was a bit surprised to see that the get_sentiments() function does not allow using the non-English translations included in the Nov 2017 NRC lexicon v.092 xlsx file used by tidytext. English words are in column A and are translated into dozens of languages in columns B to DA, while columns DB to DK list the polarity and sentiment scores for each word. It would be amazing to add an argument that defines which language (column) to use from the NRC lexicon, e.g. lang = "French". Thanks, Leonard
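To make the request concrete, here is a rough sketch of the behavior I have in mind. Everything in it is an assumption for illustration: `get_sentiments_lang()` is a hypothetical helper that does not exist in tidytext or textdata, and `nrc_wide` is a tiny made-up table standing in for the spreadsheet, whose real column names and layout may differ.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the wide NRC spreadsheet: one column per language,
# followed by 0/1 indicator columns for each sentiment.
# (Assumed layout; the real file's column names may differ.)
nrc_wide <- tibble::tribble(
  ~English, ~French,   ~Positive, ~Negative, ~Joy, ~Sadness,
  "happy",  "heureux", 1,         0,         1,    0,
  "sad",    "triste",  0,         1,         0,    1,
  "table",  "table",   0,         0,         0,    0
)

# Hypothetical helper: pick one language column and return a tidy
# word/sentiment table, similar in shape to get_sentiments("nrc").
get_sentiments_lang <- function(lexicon_wide,
                                lang = "English",
                                sentiment_cols = c("Positive", "Negative",
                                                   "Joy", "Sadness")) {
  stopifnot(lang %in% names(lexicon_wide))

  lexicon_wide %>%
    # Keep only the chosen language (renamed to `word`) and the scores
    select(word = all_of(lang), all_of(sentiment_cols)) %>%
    # One row per word/sentiment pair
    pivot_longer(all_of(sentiment_cols),
                 names_to = "sentiment", values_to = "flag") %>%
    # Drop pairs where the word does not carry that sentiment
    filter(flag == 1) %>%
    mutate(sentiment = tolower(sentiment)) %>%
    select(word, sentiment) %>%
    distinct()
}

french_nrc <- get_sentiments_lang(nrc_wide, lang = "French")
french_nrc
```

The result has the same two columns (`word`, `sentiment`) as the English lexicon, so it could be joined to tokenized text with `inner_join()` in the usual way.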
The NRC-Emotion-Lexicon.zip file that is currently downloaded via the function in the textdata package does include that .xlsx file you are mentioning. Using these translations is within the permission we have from the lexicon creators, although of course translated sentiment lexicons can be less reliable.
@EmilHvitfeldt do you want to consider this in textdata?
I'm on it!
Thank you for your answers. It is great to know that using the translations is within the permissions from the lexicon creators. I concur that a translated lexicon is less reliable than a natively created one. However, (i) for analyses comparing corpora that span different languages, a single lexicon would be more reliable than a patchwork of different lexicons, and (ii) many languages spoken by millions of people still lack reliable native lexicons. Thanks