textclean
Tools for cleaning and normalizing text data
I see that `replace_non_ascii()` uses `stringi::stri_trans_general(x, "latin-ascii")`. This doesn't seem to work for logographic, Cyrillic, or Devanagari characters: ```r library(stringi) x...
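The underlying ICU transform behaves as the report describes. A minimal `stringi` sketch (illustrative only, not textclean's internals): `"latin-ascii"` strips diacritics from Latin-script text but leaves other scripts untouched, whereas chaining `"any-latin"` first transliterates scripts such as Cyrillic.

```r
library(stringi)

stri_trans_general("naïve café", "latin-ascii")        # diacritics removed: "naive cafe"
stri_trans_general("Привет", "latin-ascii")            # Cyrillic passes through unchanged
stri_trans_general("Привет", "any-latin; latin-ascii") # transliterated: "Privet"
```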
`replace_tag()` only works for handles like @johnnyexample, not for @JohnnyExample. This is a problem, as I do not want to lowercase my entire text. Any solution?
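A case-preserving workaround, sketched here with base R rather than textclean's own `replace_tag()`, is to match handles with a character class that covers both cases, so the text never needs lowercasing:

```r
# Hypothetical sketch: strip the "@" from mixed-case handles in place,
# leaving the rest of the text (and its casing) untouched.
x <- "ping @JohnnyExample and @johnnyexample"
gsub("@([A-Za-z0-9_]+)", "\\1", x, perl = TRUE)
# -> "ping JohnnyExample and johnnyexample"
```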
I was doing some text cleanup with `replace_word_elongations` and ran into a case where the phrase "AAA battery" caused all word elongation matches to become NA. I modified the...
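One way to sidestep the acronym case, sketched here in base R rather than the package's implementation, is to restrict the elongation pattern to lowercase letters, so an uppercase acronym like "AAA" passes through untouched:

```r
# Minimal sketch: collapse runs of three or more repeated lowercase letters
# to a single letter; uppercase runs such as "AAA" are deliberately ignored.
collapse_elongation <- function(x) gsub("([a-z])\\1{2,}", "\\1", x, perl = TRUE)

collapse_elongation("heyyyy")       # -> "hey"
collapse_elongation("AAA battery")  # unchanged: "AAA battery"
```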
I see the function in the R code on GitHub, but it's not available once the library is loaded. E.g., `?replace_missing` doesn't return anything, even when hunspell is running. I'd...
The following tasks need to be done to improve the package quality:

- [ ] Add vignette
- [ ] Add unit tests for >= 85% of code
- [...
I don’t know if it is possible to include, in your fantastic package, replacements for percentage numbers (e.g. +10% -> positive percentage increase, +20% -> very positive percentage increase, etc.). Ideally,...
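The requested mapping could be sketched roughly as follows; the function name and replacement phrasings here are invented for illustration, and the thresholded "very positive" tier from the request is left out for brevity:

```r
# Hypothetical sketch: replace signed percentages with a descriptive phrase.
replace_percent_sketch <- function(x) {
  x <- gsub("\\+\\d+(\\.\\d+)?%", "positive percentage increase", x, perl = TRUE)
  gsub("-\\d+(\\.\\d+)?%", "negative percentage decrease", x, perl = TRUE)
}

replace_percent_sketch("revenue +10% but margin -3%")
# -> "revenue positive percentage increase but margin negative percentage decrease"
```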
I notice this library has quite a few `grepl()` calls. Not sure if you're aware, but using `perl = TRUE` is often considerably faster ... example ```R > vec vec system.time(grepl("^\\s*$", vec))...
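The excerpt's benchmark code is truncated, so here is a reproducible sketch of the same comparison; the vector contents are invented, and only result equivalence (not the timings themselves) is asserted, since relative speed varies by platform:

```r
# Compare the default TRE engine against PCRE (perl = TRUE) on a large vector.
vec <- rep(c("  ", "word", ""), 1e5)

t_tre  <- system.time(r1 <- grepl("^\\s*$", vec))["elapsed"]
t_perl <- system.time(r2 <- grepl("^\\s*$", vec, perl = TRUE))["elapsed"]

identical(r1, r2)  # both engines agree on the matches
```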
Hi, I am using `replace_contraction()` and find that it does not replace the contraction when handling "hadn't".

```
> "hadn't" %>%
+   replace_contraction()
[1] "hadn't"
```
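As a hypothetical stopgap while the issue stands, a small lookup table applied with fixed-string `gsub()` catches forms such as "hadn't"; the table below is illustrative, not the package's dictionary:

```r
# Minimal sketch: expand a few contractions via a named lookup table.
fix_contractions <- function(x) {
  map <- c("hadn't" = "had not", "won't" = "will not", "can't" = "cannot")
  for (k in names(map)) x <- gsub(k, map[[k]], x, fixed = TRUE)
  x
}

fix_contractions("She hadn't left")  # -> "She had not left"
```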
Hi! We are at the R Dev Day working on https://github.com/r-devel/r-dev-day/issues/110. Your package is affected by the "Lost braces" NOTE (see https://cran.r-project.org/web/checks/check_results_textclean.html). This PR fixes the issue. Cheers!
The `replace_time` function currently also changes ratios. For example,

```
x = "We use a training-validation-test split of 60:20:20 for both datasets."
replace_time(x)
[1] "We use a training-validation-test split...
```
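A stricter time pattern, sketched here independently of textclean's internals, uses lookarounds to reject a digit or colon on either side of the match, so a "60:20:20" ratio is left alone while a genuine clock time like "9:30" still matches:

```r
# Match HH:MM (0:00-23:59) only when not embedded in a longer digit:colon run.
time_pat <- "(?<![\\d:])([01]?\\d|2[0-3]):[0-5]\\d(?![\\d:])"

grepl(time_pat, "Meet at 9:30 sharp", perl = TRUE)                  # TRUE
grepl(time_pat, "a 60:20:20 split for both datasets", perl = TRUE)  # FALSE
```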