textclean
Tools for cleaning and normalizing text data
I see that `replace_non_ascii()` uses `stringi::stri_trans_general(x, "latin-ascii")`. This doesn't seem to work for logographic, Cyrillic, or Devanagari characters: ```r library(stringi) x...
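The underlying ICU transform behaves as the report describes. A minimal `stringi` sketch (illustrative only, not textclean's internals): `"latin-ascii"` strips diacritics from Latin-script text but leaves other scripts untouched, whereas chaining `"any-latin"` first transliterates scripts such as Cyrillic.

```r
library(stringi)

stri_trans_general("naïve café", "latin-ascii")        # diacritics removed: "naive cafe"
stri_trans_general("Привет", "latin-ascii")            # Cyrillic passes through unchanged
stri_trans_general("Привет", "any-latin; latin-ascii") # transliterated: "Privet"
```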
`replace_tag()` only works for handles like @johnnyexample, not for @JohnnyExample. This is a problem, as I do not want to lowercase my entire text. Any solution?
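A case-preserving workaround, sketched here with base R rather than textclean's own `replace_tag()`, is to match handles with a character class that covers both cases, so the text never needs lowercasing:

```r
# Hypothetical sketch: strip the "@" from mixed-case handles in place,
# leaving the rest of the text (and its casing) untouched.
x <- "ping @JohnnyExample and @johnnyexample"
gsub("@([A-Za-z0-9_]+)", "\\1", x, perl = TRUE)
# -> "ping JohnnyExample and johnnyexample"
```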
I was doing some text cleanup with `replace_word_elongations` and ran into a case where the phrase "AAA battery" caused all word elongation matches to become NA. I modified the...
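One way to sidestep the acronym case, sketched here in base R rather than the package's implementation, is to restrict the elongation pattern to lowercase letters, so an uppercase acronym like "AAA" passes through untouched:

```r
# Minimal sketch: collapse runs of three or more repeated lowercase letters
# to a single letter; uppercase runs such as "AAA" are deliberately ignored.
collapse_elongation <- function(x) gsub("([a-z])\\1{2,}", "\\1", x, perl = TRUE)

collapse_elongation("heyyyy")       # -> "hey"
collapse_elongation("AAA battery")  # unchanged: "AAA battery"
```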
I see the function in the R code on GitHub, but it's not available once the library is loaded. E.g., `?replace_missing` doesn't return anything, even when hunspell is running. I'd...
The following tasks need to be done to improve the package quality:

- [ ] Add vignette
- [ ] Add unit tests for >= 85% of code
- [...
I don’t know if it is possible to include, in your fantastic package, replacements for percentage numbers (e.g. +10% -> positive percentage increase, +20% -> very positive percentage increase, etc.). Ideally,...
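The requested mapping could be sketched roughly as follows; the function name and replacement phrasings here are invented for illustration, and the thresholded "very positive" tier from the request is left out for brevity:

```r
# Hypothetical sketch: replace signed percentages with a descriptive phrase.
replace_percent_sketch <- function(x) {
  x <- gsub("\\+\\d+(\\.\\d+)?%", "positive percentage increase", x, perl = TRUE)
  gsub("-\\d+(\\.\\d+)?%", "negative percentage decrease", x, perl = TRUE)
}

replace_percent_sketch("revenue +10% but margin -3%")
# -> "revenue positive percentage increase but margin negative percentage decrease"
```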
I notice this library has quite a few `grepl()` calls. Not sure if you're aware, but using `perl = TRUE` is often considerably faster ... example ```R > vec vec system.time(grepl("^\\s*$", vec))...
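The excerpt's benchmark code is truncated, so here is a reproducible sketch of the same comparison; the vector contents are invented, and only result equivalence (not the timings themselves) is asserted, since relative speed varies by platform:

```r
# Compare the default TRE engine against PCRE (perl = TRUE) on a large vector.
vec <- rep(c("  ", "word", ""), 1e5)

t_tre  <- system.time(r1 <- grepl("^\\s*$", vec))["elapsed"]
t_perl <- system.time(r2 <- grepl("^\\s*$", vec, perl = TRUE))["elapsed"]

identical(r1, r2)  # both engines agree on the matches
```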
Hi, I am using `replace_contraction()` and find that it does not replace the contraction when handling "hadn't".

```
> "hadn't" %>%
+   replace_contraction()
[1] "hadn't"
```
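As a hypothetical stopgap while the issue stands, a small lookup table applied with fixed-string `gsub()` catches forms such as "hadn't"; the table below is illustrative, not the package's dictionary:

```r
# Minimal sketch: expand a few contractions via a named lookup table.
fix_contractions <- function(x) {
  map <- c("hadn't" = "had not", "won't" = "will not", "can't" = "cannot")
  for (k in names(map)) x <- gsub(k, map[[k]], x, fixed = TRUE)
  x
}

fix_contractions("She hadn't left")  # -> "She had not left"
```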
Hi! We are at the R Dev Day working on https://github.com/r-devel/r-dev-day/issues/110. Your package is affected by the "Lost braces" NOTE (see https://cran.r-project.org/web/checks/check_results_textclean.html). This PR fixes the issue. Cheers!
The `replace_time` function currently also changes ratios. For example,

```
x = "We use a training-validation-test split of 60:20:20 for both datasets."
replace_time(x)
[1] "We use a training-validation-test split...
```
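A stricter time pattern, sketched here independently of textclean's internals, uses lookarounds to reject a digit or colon on either side of the match, so a "60:20:20" ratio is left alone while a genuine clock time like "9:30" still matches:

```r
# Match HH:MM (0:00-23:59) only when not embedded in a longer digit:colon run.
time_pat <- "(?<![\\d:])([01]?\\d|2[0-3]):[0-5]\\d(?![\\d:])"

grepl(time_pat, "Meet at 9:30 sharp", perl = TRUE)                  # TRUE
grepl(time_pat, "a 60:20:20 split for both datasets", perl = TRUE)  # FALSE
```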