checkhelper As a dev, I want to add a way to deal with \uxxxx issue

Detect non-ASCII characters and make explicit the correction to apply.

Jun 28 '21 12:06 VincentGuyader

Maybe this issue belongs in {checkhelper} after all: https://github.com/ThinkR-open/thinkr/issues/13

Jun 28 '21 12:06 statnmap

Find a proper way to add this as a function.
Add a parameter to choose whether non-ASCII characters are converted to hex escapes, so that accents still appear in the documentation and in function output, or to unaccented letters, so that the text stays readable directly in the code.
Usually, text in #' (roxygen) comments should become hex escapes, and text in plain comments should be converted to unaccented letters.

Maybe stringi::stri_trans_general(char, "Latin-ASCII") can help detect special characters by comparing the text before and after the transformation.
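The before / after comparison could look something like the sketch below. It assumes the "Latin-ASCII" transform (the one used in the code further down), and `has_special_chars` is a hypothetical helper name, not a {stringi} function. Characters outside the Latin script may pass through the transform unchanged, so this check would probably miss them.

```r
# Sketch: flag strings that change under Latin-ASCII transliteration.
# `has_special_chars` is a hypothetical name used for illustration only.
has_special_chars <- function(x) {
  x != stringi::stri_trans_general(x, "Latin-ASCII")
}

# Test strings are written with \u escapes so this snippet is itself ASCII-only
samples <- c("plain text", "caf\u00e9", "gar\u00e7on")
flags <- has_special_chars(samples)
print(flags)
```

Only the strings containing an accented letter should be flagged, since plain ASCII text is left untouched by the transform.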

#' Clean non-ASCII characters
#' TODO Add to {thinkr}
#'
#' Add an option to replace them with ? if the user wants
chars <- c(
  "à", "â",
  "é", "è", "ê",
  "î", "ï",
  "ô", "ö", "ø",
  "æ", "œ",
  "ù",
  "ç",
  "’", "²"
)


tempfile1 <- tempfile(fileext = ".txt")
file.copy(system.file("test_files/test_file.txt", package = "thinkr"), tempfile1)

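clean_ascii_dir <- function(path, pattern = ".") {

  # Clean every file in `path` whose name matches `pattern`
  # (no pipe, so {magrittr} does not need to be attached)
  files <- list.files(path, full.names = TRUE, pattern = pattern)
  purrr::walk(files, clean_ascii_file)

}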

# clean_ascii_file(tempfile1)

clean_ascii_file <- function(path) {

  # Read the file line by line ({readr} returns UTF-8 strings)
  lines <- readr::read_lines(path)

  # Test for non-ASCII characters: iconv() returns NA for lines it cannot convert
  asc <- iconv(lines, "UTF-8", "ASCII")
  ind_rox <- which((is.na(asc) | asc != lines) & grepl("^#'", lines))
  ind_no_rox <- which((is.na(asc) | asc != lines) & !grepl("^#'", lines))

  if (length(ind_rox) != 0) {

    for (char in chars) {
      lines[ind_rox] <- stringi::stri_replace_all_coll(
        lines[ind_rox],
        char,
        # Double escape for roxygen2 comments, e.g. \\u00E9
        paste0("\\", stringi::stri_trans_general(char, "hex"))
      )
    }

  }
  if  (length(ind_no_rox) != 0) {

    for (char in chars) {
      lines[ind_no_rox] <- stringi::stri_replace_all_coll(
        lines[ind_no_rox],
        char,
        stringi::stri_trans_general(char, "hex")
      )
    }
  }

  if (length(c(ind_rox, ind_no_rox)) != 0) {
    readr::write_lines(lines, path)
  }

  # Check again: anything still non-ASCII was not covered by `chars`
  asc <- iconv(lines, "UTF-8", "ASCII")
  ind_rox <- which((is.na(asc) | asc != lines) & grepl("^#'", lines))
  ind_no_rox <- which((is.na(asc) | asc != lines) & !grepl("^#'", lines))

  if (length(ind_rox) != 0 || length(ind_no_rox) != 0) {
    warning(
      "Some characters of file '", path, "' have not been converted in lines: ",
      paste(sort(c(ind_rox, ind_no_rox)), collapse = ", ")
    )
  } else {
    cat(crayon::green(path, "should be clean"), "\n")
  }

}
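The detection step of clean_ascii_file() can be tried on its own. Below is a minimal base R sketch, assuming the lines are UTF-8 (which is what readr::read_lines() returns). `find_non_ascii` is a hypothetical helper name, and the sample lines are made up for illustration.

```r
# Sketch: split the indices of non-ASCII lines into roxygen and other lines.
# `find_non_ascii` is a hypothetical helper, not part of {thinkr} or {checkhelper}.
find_non_ascii <- function(lines) {
  # iconv() returns NA for any line that cannot be represented in ASCII
  non_ascii <- is.na(iconv(lines, from = "UTF-8", to = "ASCII"))
  is_rox <- grepl("^#'", lines)
  list(
    roxygen = which(non_ascii & is_rox),
    other = which(non_ascii & !is_rox)
  )
}

# Sample lines written with \u escapes so this snippet is itself ASCII-only
sample_lines <- c(
  "#' Caract\u00e8res sp\u00e9ciaux",
  "x <- 1",
  "# d\u00e9j\u00e0 vu"
)
idx <- find_non_ascii(sample_lines)
print(idx)
```

Keeping the two index vectors separate is what lets the roxygen lines and the other lines be converted with different rules afterwards.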

Jun 28 '21 12:06 statnmap

With these test files:

#' Random file with non-ascii characters
#' Des caratères spéciaux aussi dans le roxygen

Ce texte peut-être considéré comme un texte qui ne passe pas les tests du CRAN.
En il contient des caractères de type non-ascii avec des accents tels que :

- "à", "â"
- "é", "è", "ê"
- "î", "ï"
- "ô", "ö", "ø"
#' "à", "â" # for roxygen, it is different
#' "é", "è", "ê" # for roxygen, it is different

And

#' A second random file with non-ascii characters

Ce texte peut-être considéré comme un texte qui ne passe pas les tests du CRAN.
En il contient des caractères de type non-ascii avec des accents tels que :

- "æ", "œ"
- "ù"
- "ç"
- "’", "²"

Jul 13 '21 12:07 statnmap

For text in regular R comments, we can remove the accents with stringi::stri_trans_general(char, "Latin-ASCII").

So that:

- for roxygen2 comments, use a double escape for LaTeX: é => \\u00E9
- for regular R comments, use stri_trans_general: é => e
- for characters in character vectors in R code, use a simple escape: é => \u00E9
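The two escape rules above can be sketched in base R as follows. `escape_non_ascii` and its `prefix` argument are hypothetical names chosen for this illustration. The comment rule (é => e) needs a transliterator such as the one in {stringi} and is left out here. The sketch only covers code points up to U+FFFF, since anything above that would need the \U form instead.

```r
# Sketch: replace every non-ASCII character by a \uXXXX style escape.
# `escape_non_ascii` and `prefix` are hypothetical names for illustration.
escape_non_ascii <- function(x, prefix = "\\u") {
  vapply(x, function(s) {
    cps <- utf8ToInt(s)                           # code points of the string
    chars <- intToUtf8(cps, multiple = TRUE)      # one element per character
    esc <- paste0(prefix, sprintf("%04X", cps))   # escaped form of each character
    paste(ifelse(cps > 127L, esc, chars), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

# Simple escape, for character vectors in R code
code_string <- escape_non_ascii("caf\u00e9")

# Double escape, for roxygen2 comments
roxygen_text <- escape_non_ascii("caf\u00e9", prefix = "\\\\u")

cat(code_string, "\n")
cat(roxygen_text, "\n")
```

The only difference between the two rules is the number of backslashes in front of the code point, which is why a single prefix argument is enough in this sketch.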

Oct 29 '21 06:10 statnmap

See also stringi::stri_escape_unicode()
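A quick way to see what it does is a round trip with its counterpart stringi::stri_unescape_unicode(). This is only a sketch on a made-up string. The function may also escape some ASCII characters such as backslashes and quotes, so the result is worth checking on real files before writing anything back.

```r
# Sketch: escape a string to ASCII-only text, then restore it.
x <- "d\u00e9j\u00e0 vu"

escaped <- stringi::stri_escape_unicode(x)
restored <- stringi::stri_unescape_unicode(escaped)

cat(escaped, "\n")
print(stringi::stri_enc_isascii(escaped))
print(restored == x)
```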

Nov 16 '21 13:11 VincentGuyader
