As a dev, I want to add a way to deal with \uxxxx issue

Open VincentGuyader opened this issue 3 years ago • 5 comments

Detect non-ASCII characters and make the required correction explicit.

VincentGuyader avatar Jun 28 '21 12:06 VincentGuyader

Maybe this issue belongs in {checkhelper} after all: https://github.com/ThinkR-open/thinkr/issues/13

statnmap avatar Jun 28 '21 12:06 statnmap

Find a proper way to turn this into a function.
Add a parameter to choose whether non-ASCII characters are converted to hex escapes, so that accents still appear in the documentation and function output, or to unaccented letters, so that the text stays directly readable in the code.
Usually, text in #' (roxygen) comments should be converted to hex, and text in simple comments should be converted to unaccented letters.

Maybe stringi::stri_trans_general(char, "Latin-ASCII") can help detect special characters by comparing the text before and after the transformation.
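
A minimal sketch of that before/after comparison, assuming the "Latin-ASCII" transform id of {stringi}. The helper name `has_special_chars()` is hypothetical and only used for illustration; the accented strings are written with `\u` escapes so the example itself stays ASCII.

```r
# Hypothetical helper: TRUE when a string changes under the
# "Latin-ASCII" transliteration, i.e. it contains accented Latin characters.
has_special_chars <- function(x) {
  x != stringi::stri_trans_general(x, "Latin-ASCII")
}

words <- c("cafe", "caf\u00e9", "coeur", "c\u0153ur")
flags <- has_special_chars(words)
print(flags)
```

This comparison only catches characters that "Latin-ASCII" knows how to transliterate. Non-ASCII characters from other scripts may pass through unchanged, so `stringi::stri_enc_isascii()` is probably the safer test when the goal is simply "is this line pure ASCII?".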

#' Clean non-ASCII characters
#' TODO Add to {thinkr}
#'
#' Add an option to transform as ? if they want
chars <- c(
  "à", "â",
  "é", "è", "ê",
  "î", "ï",
  "ô", "ö", "ø",
  "æ", "œ",
  "ù",
  "ç",
  "’", "²"
)


tempfile1 <- tempfile(fileext = ".txt")
file.copy(system.file("test_files/test_file.txt", package = "thinkr"), tempfile1)

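# Clean every file of a directory whose name matches `pattern`
clean_ascii_dir <- function(path, pattern = ".") {

  # Direct call instead of %>%, so {magrittr} does not need to be attached
  files <- list.files(path, full.names = TRUE, pattern = pattern)
  purrr::walk(files, clean_ascii_file)

}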

# clean_ascii_file(tempfile1)

clean_ascii_file <- function(path) {

  # Read the file given as argument (the debugging override of `path` is removed)
  lines <- readr::read_lines(path)

  # Test if non-ascii characters
  asc <- iconv(lines, "latin1", "ASCII")
  ind_rox <- which((is.na(asc) | asc != lines) & grepl("^#'", lines))
  ind_no_rox <- which((is.na(asc) | asc != lines) & !grepl("^#'", lines))

  if (length(ind_rox) != 0) {

    for (char in chars) {
      lines[ind_rox] <- stringi::stri_replace_all_coll(
        lines[ind_rox],
        char,
        # Roxygen lines need the double-escaped hex form, e.g. \\u00E9
        paste0("\\", stringi::stri_trans_general(char, "hex"))
      )
    }

  }
  if  (length(ind_no_rox) != 0) {

    for (char in chars) {
      lines[ind_no_rox] <- stringi::stri_replace_all_coll(
        lines[ind_no_rox],
        char,
        stringi::stri_trans_general(char, "hex")
      )
    }
  }

  if (length(c(ind_rox, ind_no_rox)) != 0) {
    readr::write_lines(lines, path)
  }

  asc <- iconv(lines, "latin1", "ASCII")
  ind_rox <- which((is.na(asc) | asc != lines) & grepl("^#'", lines))
  ind_no_rox <- which((is.na(asc) | asc != lines) & !grepl("^#'", lines))

  if (length(ind_rox) != 0 || length(ind_no_rox) != 0) {
    warning(
      "Some characters of file '", path, "' have not been converted in lines: ",
      paste(sort(c(ind_rox, ind_no_rox)), collapse = ", ")
    )
  } else {
    # Only report success when no non-ASCII character remains
    cat(crayon::green(path, "should be clean\n"))
  }

}

statnmap avatar Jun 28 '21 12:06 statnmap

With these test files:

#' Random file with non-ascii characters
#' Des caratères spéciaux aussi dans le roxygen

Ce texte peut-être considéré comme un texte qui ne passe pas les tests du CRAN.
En il contient des caractères de type non-ascii avec des accents tels que :

- "à", "â"
- "é", "è", "ê"
- "î", "ï"
- "ô", "ö", "ø"
#' "à", "â" # for roxygen, it is different
#' "é", "è", "ê" # for roxygen, it is different

And

#' A second random file with non-ascii characters

Ce texte peut-être considéré comme un texte qui ne passe pas les tests du CRAN.
En il contient des caractères de type non-ascii avec des accents tels que :

- "æ", "œ"
- "ù"
- "ç"
- "’", "²"
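
To check which lines of such a test file would be flagged, and whether they are roxygen lines or not, something like the following could be used. This is only a sketch: the file content is a made-up three-line example (written with `\u` escapes), not the actual test files above, and it relies on `stringi::stri_enc_isascii()` rather than on `iconv()`.

```r
# Made-up example content; accented characters are written as \u escapes
content <- c(
  "#' Caract\u00e8res sp\u00e9ciaux dans le roxygen",
  "x <- 1",
  "# d\u00e9j\u00e0 vu"
)

tmp <- tempfile(fileext = ".R")
writeLines(enc2utf8(content), tmp, useBytes = TRUE)

lines <- readLines(tmp, encoding = "UTF-8", warn = FALSE)

# Lines that contain at least one non-ASCII character
non_ascii <- which(!stringi::stri_enc_isascii(lines))

# Split the flagged lines into roxygen and non-roxygen lines
is_rox <- grepl("^#'", lines)
ind_rox <- non_ascii[is_rox[non_ascii]]
ind_no_rox <- non_ascii[!is_rox[non_ascii]]

print(list(roxygen = ind_rox, other = ind_no_rox))
```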

statnmap avatar Jul 13 '21 12:07 statnmap

And for text in classical R comments, we can replace accented characters with their unaccented equivalents using stringi::stri_trans_general(char, "Latin-ASCII")

So that:

  • for roxygen2 comments, use a double escape (needed for LaTeX): é => \\u00E9
  • for classical R comments, use stri_trans_general(): é => e
  • for characters inside character vectors in R code, use a simple escape: é => \u00E9
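
The three rules could be sketched as a single helper. The name `convert_char()` and its `context` argument are hypothetical, and the sketch assumes the "Any-Hex" and "Latin-ASCII" transform ids of {stringi}. It is meant to be called on one character at a time, as in the loop over `chars` above, because "Any-Hex" escapes every character it receives, ASCII included.

```r
# Hypothetical helper: convert ONE non-ASCII character according to
# the place where it appears in the file.
convert_char <- function(char, context = c("roxygen", "comment", "string")) {
  context <- match.arg(context)
  # Hex escape of the character, e.g. a backslash followed by u00E9
  hex <- stringi::stri_trans_general(char, "Any-Hex")
  switch(context,
    roxygen = paste0("\\", hex),  # double escape for roxygen2 comments
    comment = stringi::stri_trans_general(char, "Latin-ASCII"),  # drop the accent
    string  = hex                 # simple escape for character vectors
  )
}

e_acute <- "\u00e9"
cat(convert_char(e_acute, "roxygen"), "\n")
cat(convert_char(e_acute, "comment"), "\n")
cat(convert_char(e_acute, "string"), "\n")
```

Applying the helper to a whole line would still require deciding, for each line, whether it is a roxygen comment, a plain comment, or code, which this sketch does not attempt.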

statnmap avatar Oct 29 '21 06:10 statnmap

See also stringi::stri_escape_unicode().
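
As a quick illustration, not a proposal for the final function: `stringi::stri_escape_unicode()` works on whole strings and leaves plain ASCII letters alone, and `stringi::stri_unescape_unicode()` reverses it. Note that it also escapes backslashes and quotes, so it may change more than the accented characters on a line of code.

```r
x <- "caf\u00e9 cr\u00e8me"

# Escape every non-ASCII character as a \u sequence
escaped <- stringi::stri_escape_unicode(x)
cat(escaped, "\n")

# The escaped string is pure ASCII and can be converted back
is_ascii <- stringi::stri_enc_isascii(escaped)
round_trip <- stringi::stri_unescape_unicode(escaped)
print(is_ascii)
print(identical(round_trip, x))
```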

VincentGuyader avatar Nov 16 '21 13:11 VincentGuyader