add step_label_emoji

Open EmilHvitfeldt opened this issue 3 years ago • 0 comments

This step would take a string or tokenlist and replace each emoji with a natural-language label, making the text easier to use in downstream steps.

It should have pre and post arguments, giving the characters to paste before and after each label.

library(emoji)
library(tokenizers)
library(textrecipes)

token_x <- tokenize_words(emoji_samples$text, strip_punct = FALSE)

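# Replace every emoji token in x with its label, wrapped in pre and post.
emoji_swap <- function(x, emoji, label, pre = "_", post = "_") {
  # Logical index of the tokens that contain an emoji
  emoji_ind <- emoji_detect(x)

  # Look each emoji token up in `emoji` and paste pre/post around its label
  x[emoji_ind] <- paste0(pre, label[match(x[emoji_ind], emoji)], post)
  x
}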

emoji_swap(token_x[[1]], emoji_name, names(emoji_name))
#> [1] "my"            "_alarm_clock_" "didn’t"        "work"         
#> [5] "."
emoji_swap(token_x[[1]], emojis$emoji, emojis$group)
#> [1] "my"                "_Travel & Places_" "didn’t"           
#> [4] "work"              "."
emoji_swap(token_x[[1]], emojis$emoji, emojis$subgroup)
#> [1] "my"     "_time_" "didn’t" "work"   "."

Created on 2021-07-29 by the reprex package (v2.0.0)
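
The reprex above covers the tokenlist case. For plain strings, a rough base-R sketch could look like the following. The function name label_emoji_string and the two-entry lookup table are hypothetical and only for illustration; a real step would presumably draw its emoji and labels from the emoji package, as in the reprex.

```r
# Hypothetical stand-in for the emoji package's lookup data.
emoji <- c("\u23F0", "\U0001F600")
label <- c("alarm_clock", "grinning_face")

# Replace each emoji found inside the strings in x with pre + label + post.
label_emoji_string <- function(x, emoji, label, pre = "_", post = "_") {
  # Replace longer emoji sequences first so that a multi-codepoint emoji
  # is not partially rewritten by a shorter emoji it happens to contain.
  ord <- order(nchar(emoji), decreasing = TRUE)
  for (i in ord) {
    # fixed = TRUE treats both the emoji and the replacement literally
    x <- gsub(emoji[i], paste0(pre, label[i], post), x, fixed = TRUE)
  }
  x
}

label_emoji_string("my \u23F0 didn't work.", emoji, label)
#> [1] "my _alarm_clock_ didn't work."
label_emoji_string("my \u23F0 didn't work.", emoji, label, pre = "<", post = ">")
#> [1] "my <alarm_clock> didn't work."
```

Unlike emoji_swap(), this version leaves any emoji missing from the lookup table untouched instead of producing an NA label.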

EmilHvitfeldt, Jul 30 '21 06:07