sentimentr
Possible cleaning function for reimport
This would belong in textclean, but abbreviated forms are scored differently from their long forms, e.g. fan vs. fanatic:
> sentiment(c("He's a nice guy", "can be a jerk. I'm not a fan."))
   element_id sentence_id word_count sentiment
1:          1           1          4      0.25
2:          2           1          4     -0.25
3:          2           2          4      0.00
> sentiment(c("He's a nice guy", "can be a jerk. I'm not a fanatic."))
   element_id sentence_id word_count sentiment
1:          1           1          4      0.25
2:          2           1          4     -0.25
3:          2           2          4      0.25
could be replaced:
WIP
# Pronoun + verb patterns suggesting that "fan" refers to a person
pronouns <- c("s?he(?: i|')s", "(?:you|they|we)(?: a|')re", "I(?: a|')m")
# Group 1: the pronoun phrase plus whatever text precedes "fan"
pro_replacements <- paste0('((?:', paste(pronouns, collapse = '|'), ').*?)')
fix_fan <- function(x, ...){
    gsub(paste0(pro_replacements, '\\b(fan)(s?)\\b'), '\\1\\2atic\\3', x, perl = TRUE, ignore.case = TRUE)
}
fix_fan('He\'s the biggest fan I know.')
This would live in textclean but be re-exported by sentimentr:
inputs <- c(
    "He's the biggest fan I know.",
    "I am a huge fan of his.",
    "I know she has lots of fans in his club",
    "I was cold and turned on the fan",
    "An air conditioner is better than 2 fans at cooling.",
    "I'm a really gigantic and humble fan of the book."
)
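# Base R version: expand "fan(s)" to "fanatic(s)" only when a pronoun phrase
# occurs within pronoun.distance characters before it
fix_fan <- function(x, pronoun.distance = 20, ...){
    gsub(
        paste0("((?:s?he(?: i| ha|')s|(?:you|they|we)(?: a|')re|I(?: a|')m).{1,", pronoun.distance, "})\\b(fan)(s?)\\b"),
        '\\1\\2atic\\3',
        x,
        ignore.case = TRUE
    )
}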
# stringi version of the same replacement (backreferences are written $1, $2, $3)
fix_fan2 <- function(x, pronoun.distance = 20, ...){
    stringi::stri_replace_all_regex(
        x,
        paste0("((?:s?he(?: i| ha|')s|(?:you|they|we)(?: a|')re|I(?: a|')m).{1,", pronoun.distance, "})\\b(fan)(s?)\\b"),
        '$1$2atic$3',
        opts_regex = stringi::stri_opts_regex(case_insensitive = TRUE)
    )
}
fix_fan(inputs)
fix_fan(inputs, 30)
fix_fan2(inputs)
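To make the effect of pronoun.distance easier to see, here is a rough sketch that only reports which sentences the pattern would touch. The helper names `fan_pattern()` and `is_touched()` are hypothetical, the pattern is copied from fix_fan() above, and `perl = TRUE` is an assumption added here so the non-capturing groups are handled by PCRE.

```r
# Sample sentences in the spirit of the inputs above
inputs <- c(
    "He's the biggest fan I know.",
    "I am a huge fan of his.",
    "I know she has lots of fans in his club",
    "I was cold and turned on the fan",
    "An air conditioner is better than 2 fans at cooling.",
    "I'm a really gigantic and humble fan of the book."
)

# Hypothetical helper: build the pattern used by fix_fan() for a given distance
fan_pattern <- function(pronoun.distance = 20) {
    paste0(
        "((?:s?he(?: i| ha|')s|(?:you|they|we)(?: a|')re|I(?: a|')m).{1,",
        pronoun.distance, "})\\b(fan)(s?)\\b"
    )
}

# Hypothetical helper: TRUE where the pattern matches, i.e. where "fan" would be expanded
is_touched <- function(x, pronoun.distance = 20) {
    grepl(fan_pattern(pronoun.distance), x, ignore.case = TRUE, perl = TRUE)
}

is_touched(inputs)      # default distance of 20 characters
is_touched(inputs, 30)  # wider window
```

With the default distance, the last sentence should be left alone because 30 characters separate "I'm" from "fan"; widening the window to 30 picks it up. The two sentences about cooling fans contain no pronoun phrase, so they are not matched at either distance.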
Other examples include:
tibble::tribble(
    ~short, ~long,
    "fan",  "fanatic",
    "emo",  "emotionally disturbed"
)
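A table like this could drive a generic replacement step. The following is only a minimal sketch in base R: `shortenings` and `expand_shortenings()` are hypothetical names, not part of textclean or sentimentr, and a plain data.frame stands in for the tibble to keep the example dependency-free.

```r
# Hypothetical lookup of shortenings and their long forms (mirrors the table above)
shortenings <- data.frame(
    short = c("fan", "emo"),
    long  = c("fanatic", "emotionally disturbed"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: replace each whole-word shortening with its long form.
# Unlike fix_fan(), this ignores context, so "turned on the fan" would also change,
# and plurals such as "fans" are left alone.
expand_shortenings <- function(x, key = shortenings) {
    for (i in seq_len(nrow(key))) {
        pattern <- paste0("\\b", key$short[i], "\\b")
        x <- gsub(pattern, key$long[i], x, ignore.case = TRUE, perl = TRUE)
    }
    x
}

expand_shortenings(c("I am a huge fan of his.", "He is so emo."))
```

Because the lookup is context-free, it would only be safe for shortenings with a single dominant sense; ambiguous ones such as fan still need something like the pronoun check in fix_fan().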
Note that these are called shortenings:
https://en.oxforddictionaries.com/spelling/shortenings
or, more formally, clippings: https://en.wikipedia.org/wiki/Clipping_(morphology)
y <- c('tazer', 'emo', 'typo', 'quake', 'scram')
lexicon::hash_sentiment_jockers_rinker[y]
Consider adding these to the polarity table directly.