checkhelper
As a dev, I want a way to deal with the \uxxxx issue: detect non-ASCII characters and make the correction to apply explicit.
Maybe this issue belongs in {checkhelper} after all: https://github.com/ThinkR-open/thinkr/issues/13
Find a proper way to add this as a function.
Add a parameter that chooses whether characters are transformed into hex escapes, so that the accents still appear in the documentation and in functions, or into letters without accents, so that the code stays readable directly.
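A minimal base-R sketch of that parameter, assuming a hypothetical convert_non_ascii() whose to argument picks between hex escapes and unaccented letters. The unaccented mapping is a hand-written table limited to the characters listed in this issue (an assumption; stringi's "Latin-ASCII" transform covers many more). The characters are written as \u escapes so that the sketch itself stays ASCII.

```r
# Sketch only: convert_non_ascii() and its `to` argument are hypothetical.
# to = "hex"   : replace each non-ASCII character with a \uXXXX escape
# to = "ascii" : replace it with an unaccented letter from a small table
#                (only the characters listed in this issue are covered)
ascii_from <- c("\u00e0", "\u00e2", "\u00e9", "\u00e8", "\u00ea", "\u00ee",
                "\u00ef", "\u00f4", "\u00f6", "\u00f8", "\u00e6", "\u0153",
                "\u00f9", "\u00e7", "\u2019", "\u00b2")
ascii_to <- c("a", "a", "e", "e", "e", "i",
              "i", "o", "o", "o", "ae", "oe",
              "u", "c", "'", "2")

convert_non_ascii <- function(x, to = c("hex", "ascii")) {
  to <- match.arg(to)
  convert_one <- function(line) {
    if (is.na(line)) return(NA_character_)
    cp <- utf8ToInt(line)                    # Unicode code points
    chars <- intToUtf8(cp, multiple = TRUE)  # one element per character
    non_ascii <- !is.na(cp) & cp > 127
    if (to == "hex") {
      # Four hex digits: assumes characters from the Basic Multilingual Plane
      chars[non_ascii] <- sprintf("\\u%04X", cp[non_ascii])
    } else {
      # Characters missing from the table are left unchanged
      idx <- match(chars, ascii_from)
      known <- non_ascii & !is.na(idx)
      chars[known] <- ascii_to[idx[known]]
    }
    paste(chars, collapse = "")
  }
  vapply(enc2utf8(x), convert_one, character(1), USE.NAMES = FALSE)
}

convert_non_ascii("caf\u00e9", to = "hex")    # "caf\\u00E9"
convert_non_ascii("caf\u00e9", to = "ascii")  # "cafe"
```

A lookup table keeps the sketch dependency-free; in a package, stringi::stri_trans_general(x, "Latin-ASCII") would be the more complete choice for the "ascii" branch.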
Usually, lines starting with #' (roxygen) should use hex escapes, and simple comments should be transformed into letters without accents.
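To apply that rule, each line first needs a type. A sketch, assuming roxygen lines start with #' and simple comments with #, possibly after spaces; line_type() is a hypothetical helper, and a code line with a trailing comment is simply tagged "code".

```r
# Sketch only: line_type() is a hypothetical helper.
# Assumes roxygen lines start with #' and plain comments with #,
# possibly after spaces; everything else (including code followed by
# a trailing comment) is tagged "code".
line_type <- function(lines) {
  type <- rep("code", length(lines))
  type[grepl("^\\s*#", lines)] <- "comment"   # any comment line
  type[grepl("^\\s*#'", lines)] <- "roxygen"  # roxygen overrides comment
  type
}

src <- c("#' Caf\u00e9 documentation", "# caf\u00e9 note", "x <- \"caf\u00e9\"")
line_type(src)
# [1] "roxygen" "comment" "code"
```

Following the rule above, "roxygen" lines would then get hex escapes and "comment" lines letters without accents.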
Maybe stringi::stri_trans_general(char, "Latin-ASCII") can help detect special characters by comparing the text before and after the transformation.
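A base-R sketch of the detection step. Instead of comparing the text before and after stringi::stri_trans_general(), it checks Unicode code points directly: any code point above 127 is outside ASCII. find_non_ascii() is a hypothetical helper name.

```r
# Sketch only: find_non_ascii() is a hypothetical helper.
# It returns the line numbers that contain non-ASCII characters and,
# for each, the distinct offending characters.
find_non_ascii <- function(lines) {
  found <- lapply(enc2utf8(lines), function(line) {
    if (is.na(line)) return(character(0))
    cp <- utf8ToInt(line)                             # Unicode code points
    unique(intToUtf8(cp[cp > 127], multiple = TRUE))  # non-ASCII ones only
  })
  hit <- lengths(found) > 0
  data.frame(
    line = which(hit),
    chars = vapply(found[hit], paste, character(1), collapse = " "),
    stringsAsFactors = FALSE
  )
}

find_non_ascii(c("plain text", "caf\u00e9", "na\u00efve \u00e9t\u00e9"))
```

Listing the characters per line makes the correction explicit before anything is rewritten.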
#' Clean non-ASCII characters
#' TODO Add to {thinkr}
#'
#' Add an option to transform as ? if they want
chars <- c(
"à", "â",
"é", "è", "ê",
"î", "ï",
"ô", "ö", "ø",
"æ", "œ",
"ù",
"ç",
"’", "²"
)
tempfile1 <- tempfile(fileext = ".txt")
file.copy(system.file("test_files/test_file.txt", package = "thinkr"), tempfile1)
clean_ascii_dir <- function(path, pattern = ".") {
# Clean every file of `path` whose name matches `pattern`
files <- list.files(path, full.names = TRUE, pattern = pattern)
purrr::walk(files, clean_ascii_file)
}
# clean_ascii_file(tempfile1)
clean_ascii_file <- function(path) {
lines <- readr::read_lines(path)
# Flag lines containing non-ASCII characters: iconv() gives NA or a changed string for them
asc <- iconv(lines, "latin1", "ASCII")
ind_rox <- which((is.na(asc) | asc != lines) & grepl("^#'", lines))
ind_no_rox <- which((is.na(asc) | asc != lines) & !grepl("^#'", lines))
if (length(ind_rox) != 0) {
for (char in chars) {
lines[ind_rox] <- stringi::stri_replace_all_coll(
lines[ind_rox],
char,
# roxygen lines get a double escape, e.g. U+00E9 becomes \\u00E9
paste0("\\", stringi::stri_trans_general(char, "hex"))
)
}
}
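# Other lines get a simple escape, e.g. U+00E9 becomes \u00E9
# (plain comments are not yet transformed into letters without accents)
if (length(ind_no_rox) != 0) {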
for (char in chars) {
lines[ind_no_rox] <- stringi::stri_replace_all_coll(
lines[ind_no_rox],
char,
stringi::stri_trans_general(char, "hex")
)
}
}
if (length(c(ind_rox, ind_no_rox)) != 0) {
readr::write_lines(lines, path)
}
# Check again: report lines that still contain non-ASCII characters
asc <- iconv(lines, "latin1", "ASCII")
ind_rox <- which((is.na(asc) | asc != lines) & grepl("^#'", lines))
ind_no_rox <- which((is.na(asc) | asc != lines) & !grepl("^#'", lines))
if (length(ind_rox) != 0 || length(ind_no_rox) != 0) {
warning("Some characters of file '", path, "' have not been converted in lines: ", paste(sort(c(ind_rox, ind_no_rox)), collapse = ", "))
}
cat(crayon::green(path, "should be clean"), "\n")
}
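With the following test files (French sample text saying it would not pass CRAN checks because it contains accented, non-ASCII characters):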
#' Random file with non-ascii characters
#' Des caratères spéciaux aussi dans le roxygen
Ce texte peut-être considéré comme un texte qui ne passe pas les tests du CRAN.
En il contient des caractères de type non-ascii avec des accents tels que :
- "à", "â"
- "é", "è", "ê"
- "î", "ï"
- "ô", "ö", "ø"
#' "à", "â" # for roxygen, it is different
#' "é", "è", "ê" # for roxygen, it is different
And
#' A second random file with non-ascii characters
Ce texte peut-être considéré comme un texte qui ne passe pas les tests du CRAN.
En il contient des caractères de type non-ascii avec des accents tels que :
- "æ", "œ"
- "ù"
- "ç"
- "’", "²"
In classical R comments, the text can instead be transformed into letters without accents with stringi::stri_trans_general(char, "Latin-ASCII").
In summary:
- in roxygen2 comments, use a double escape (for LaTeX): é => \\u00E9
- in classical R comments, use stri_trans_general(): é => e
- in character strings of R code, use a simple escape: é => \u00E9
see also stringi::stri_escape_unicode(
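The difference between the simple and the double escape can be checked with R's own parser. A base-R sketch, with escape_non_ascii() and its backslashes argument as hypothetical names: placed between quotes in an R script, the simple escape parses back to the accented string, while the double escape parses to the literal text caf\u00E9. It only checks R parsing, not how roxygen2 or LaTeX handle the double-escaped form, and it assumes characters from the Basic Multilingual Plane (four hex digits).

```r
# Sketch only: escape_non_ascii() and its `backslashes` argument are
# hypothetical. Characters are assumed to be in the Basic Multilingual Plane.
escape_non_ascii <- function(x, backslashes = 1) {
  cp <- utf8ToInt(enc2utf8(x))             # Unicode code points of one string
  chars <- intToUtf8(cp, multiple = TRUE)
  non_ascii <- cp > 127
  prefix <- paste0(strrep("\\", backslashes), "u")
  chars[non_ascii] <- sprintf("%s%04X", prefix, cp[non_ascii])
  paste(chars, collapse = "")
}

single_esc <- escape_non_ascii("caf\u00e9", backslashes = 1)  # text: caf\u00E9
double_esc <- escape_non_ascii("caf\u00e9", backslashes = 2)  # text: caf\\u00E9

# Between quotes in an R script, the simple escape parses back to the
# accented string...
identical(eval(parse(text = paste0('"', single_esc, '"'))), "caf\u00e9")  # TRUE
# ...whereas the double escape parses to the literal text caf\u00E9
identical(eval(parse(text = paste0('"', double_esc, '"'))), single_esc)   # TRUE
```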