data-science-in-education
NRC lexicon unavailable
Hi, Chapter 11 says to use the NRC lexicon for sentiment analysis. However, get_sentiments("nrc") returns an error when I select "1" from the little install menu that comes up:
Error: 'C:/Users/[...]/AppData/Local/textdata/textdata/Cache/nrc/NRC-Emotion-Lexicon/NRC-Emotion-Lexicon-v0.92/NRC-Emotion-Lexicon-Wordlevel-v0.92.txt' does not exist.
Apparently this has been an issue for a couple of years now. There are workarounds but perhaps this needs to be fixed in the book, unless I am missing something simple!
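For anyone hitting the same error, one way to see where the downloaded file actually landed is to search the cache folder recursively. The following is a minimal sketch in base R; `find_lexicon_file` is a hypothetical helper (not part of textdata), and the demo folder is a throwaway stand-in that mimics the layout described in the workaround, not a real cache.

```r
# Hypothetical helper (not part of textdata): search a folder
# recursively for the NRC word-level lexicon file.
find_lexicon_file <- function(cache_dir) {
  list.files(
    cache_dir,
    pattern = "NRC-Emotion-Lexicon-Wordlevel-v0\\.92\\.txt$",
    recursive = TRUE,
    full.names = TRUE
  )
}

# Demo on a temporary folder with an empty placeholder file.
demo_dir <- file.path(tempdir(), "nrc-demo")
dir.create(file.path(demo_dir, "NRC-Emotion-Lexicon"),
           recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_dir, "NRC-Emotion-Lexicon",
                      "NRC-Emotion-Lexicon-Wordlevel-v0.92.txt"))

found <- find_lexicon_file(demo_dir)
print(found)
```

On a real machine, the folder to search would presumably be the one returned by textdata::lexicon_nrc(return_path = TRUE), in place of demo_dir.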
Hi @RobertTalbert, thanks for your message! Yes, perhaps something has changed since we published the book in 2020. Other folks have reached out, and we were able to install the file successfully by following this StackOverflow thread. On your computer, run the code below:
library(tidyverse) # attaches readr and tibble, which process_nrc() below relies on
library(tidytext) # provides get_sentiments()
library(textdata) # downloads and caches the lexicon
# reproduce the error
get_sentiments("nrc") # select 1: this throws the error, but the data has still been downloaded
# where is the file, then? ask textdata for its cache folder (e.g. "~/Library/Caches/textdata/nrc" on macOS)
folder_path <- textdata::lexicon_nrc(return_path = TRUE)
# the problem is that the package looks for the file in a subfolder that does not exist,
# so we create that subfolder and copy the file into it (base R calls, so this also works on Windows)
dir.create(file.path(folder_path, "NRC-Emotion-Lexicon/NRC-Emotion-Lexicon-v0.92"), recursive = TRUE, showWarnings = FALSE)
file.copy(file.path(folder_path, "NRC-Emotion-Lexicon/NRC-Emotion-Lexicon-Wordlevel-v0.92.txt"), file.path(folder_path, "NRC-Emotion-Lexicon/NRC-Emotion-Lexicon-v0.92"))
# now we have to process the nrc data using a slightly modified version of the subfunction detailed in the original function from the textdata-package: https://github.com/EmilHvitfeldt/textdata/blob/main/R/lexicon_nrc.R
name_path <- file.path(folder_path, "NRCWordEmotion.rds")
# slightly modified version:
process_nrc <- function(folder_path, name_path) {
  data <- read_tsv(
    file.path(
      folder_path,
      "NRC-Emotion-Lexicon/NRC-Emotion-Lexicon-v0.92/NRC-Emotion-Lexicon-Wordlevel-v0.92.txt"
    ),
    col_names = FALSE, col_types = cols(
      X1 = col_character(),
      X2 = col_character(),
      X3 = col_double()
    )
  )
  # keep only the word-emotion pairs flagged with 1
  data <- data[data$X3 == 1, ]
  data <- tibble(
    word = data$X1,
    sentiment = data$X2
  )
  write_rds(data, name_path)
}
# run it, then check that the lexicon now loads
process_nrc(folder_path, name_path)
get_sentiments("nrc")
Hope this works for you as well! Let us know!
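To make clearer what process_nrc() does with the file, here is a small sketch of the same filter-and-rename step on a made-up three-row sample. It assumes the lexicon file is tab-separated with three columns (word, emotion, 0/1 flag), as the col_types in the function suggest. It uses base R's read.delim() as a stand-in for read_tsv() so that it runs without extra packages. The sample rows are invented for illustration and are not taken from the real lexicon.

```r
# Made-up sample in the assumed three-column layout: word, emotion, 0/1 flag.
sample_path <- tempfile(fileext = ".txt")
writeLines(
  c(
    "abandon\tfear\t1",
    "abandon\tjoy\t0",
    "happy\tjoy\t1"
  ),
  sample_path
)

# Base R stand-in for the read_tsv() call in process_nrc().
raw <- read.delim(
  sample_path,
  header = FALSE,
  col.names = c("X1", "X2", "X3"),
  colClasses = c("character", "character", "numeric")
)

# Keep only rows whose flag is 1, then rename the columns.
kept <- raw[raw$X3 == 1, ]
lexicon <- data.frame(
  word = kept$X1,
  sentiment = kept$X2,
  stringsAsFactors = FALSE
)
print(lexicon)
```

Rows with a 0 flag are dropped, so the result holds one row per word-emotion pair that applies, which is the same two-column word/sentiment shape the function above saves to NRCWordEmotion.rds.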