dataReporter
Issue with UTF-8: weird characters
Great package!
I used the development version. My data has a variable containing a variety of cities from around the globe, encoded in UTF-8 (Windows, RStudio). I can't share it, sorry.
It runs through despite errors, but produces weird characters; e.g. the HTML output garbles “Wiener Neustädt”.
Thanks again for submitting the issue.
I think I'll need a minimal example that produces the error in order to find a solution. I'm having trouble reproducing the error myself.
Does the following code produce the problem for you?
a <- data.frame(cities = c(rep("Copenhagen", 2), "Budapest", "Wiener Neustädt"),
                num = c(1, 2, 1, 3))
library(dataReporter)
makeDataReport(a, file = "deleteme.rmd", output = "html", replace = TRUE)
And if so, would you mind sharing the output of devtools::session_info()?
Note for my future self: we discussed the issue further via email, and it seems the problem was local. I was not able to reproduce it, even with the original data.
I am getting the same issue. Your European city example gives the following error:
Data report generation is finished. Please wait while your output file is being rendered.
Error in sub(re, "", x, perl = TRUE) : input string 2 is invalid UTF-8
In addition: Warning messages:
1: In readLines(con, warn = FALSE) :
invalid input found on input connection 'deleteme.rmd'
2: In xfun::read_utf8(input) :
The file deleteme.rmd is not encoded in UTF-8. These lines contain invalid UTF-8 characters: 94, 102
If I open deleteme.rmd in RStudio (which defaults to opening .Rmd files as UTF-8), it shows Neust?dt in place of Neustädt. If I reopen it with ISO-8859-1 (the system default encoding), Neustädt displays correctly. This is a stupid Windows problem I've run up against before.
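To check whether a generated .Rmd has this problem, and to repair a file that was already written in the native encoding, something like the following sketch may help. It assumes the invalid lines are ISO-8859-1 (latin1), as in the case above; `reencode_latin1_to_utf8()` is a hypothetical name, not part of dataReporter. It relies on base R's `validUTF8()`, which performs a similar check to the one behind the `xfun::read_utf8()` warning.

```r
# Hypothetical helper (not part of dataReporter): find lines of a text file that
# are not valid UTF-8 and, assuming those lines are ISO-8859-1 (latin1),
# rewrite the file as UTF-8. Returns the indices of the re-encoded lines.
reencode_latin1_to_utf8 <- function(path) {
  # With the default getOption("encoding") of "native.enc", readLines() does
  # not re-encode, so the raw bytes of each line are preserved.
  lines <- readLines(path, warn = FALSE)
  bad <- which(!validUTF8(lines))
  if (length(bad) > 0) {
    lines[bad] <- iconv(lines[bad], from = "latin1", to = "UTF-8")
    # useBytes = TRUE writes the UTF-8 bytes unchanged instead of translating
    # them back to the native encoding.
    writeLines(lines, path, useBytes = TRUE)
  }
  invisible(bad)
}
```

The function rewrites the file in place, so it is worth keeping a copy of deleteme.rmd before running it.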
Looking at the docs for rmarkdown::render(), the document is always treated as UTF-8 (there is an encoding argument, but it is ignored). However, the default for file() is encoding = getOption("encoding"), which is "native.enc" by default. Thus, when the native encoding isn't UTF-8, the generated .Rmd is written in the native encoding (e.g. ISO-8859-1), and any non-ASCII characters end up mis-rendered. Because rmarkdown::render() always reads its input as UTF-8, the encoding passed to file() should explicitly be "UTF-8". So in makeDataReport.R, the calls to file() should be:
fileConn <- file(file, "w", encoding = "UTF-8") #for main document
vListConn <- file(vListFileName, "w", encoding = "UTF-8")
I haven't tested whether the change actually works, but I think it should.
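The proposed change can be sketched as a small wrapper that always writes UTF-8, whatever getOption("encoding") is set to. `write_utf8_lines()` is a hypothetical name; makeDataReport.R writes to its connections incrementally, so this only illustrates the `file(..., encoding = "UTF-8")` idea. `xfun::write_utf8()` (xfun is already a dependency of rmarkdown) is an existing alternative for writing UTF-8 files.

```r
# Hypothetical helper (not dataReporter's API): write a character vector to
# `path` as UTF-8, regardless of getOption("encoding").
write_utf8_lines <- function(lines, path) {
  # An explicit encoding makes the connection re-encode its output to UTF-8,
  # instead of using the default getOption("encoding") ("native.enc").
  con <- file(path, "w", encoding = "UTF-8")
  on.exit(close(con))  # close the connection even if writing fails
  writeLines(lines, con)
}
```

Compared with calling file() directly, the only addition is on.exit(), which guarantees the connection is closed if an error occurs mid-write.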
Output of devtools::session_info():
- Session info -----------------------------------------------------------------------------------------------------------
setting value
version R version 4.1.1 (2021-08-10)
os Windows 10 x64 (build 15063)
system x86_64, mingw32
ui RStudio
language (EN)
collate English_Australia.1252
ctype English_Australia.1252
tz Australia/Sydney
date 2022-02-01
rstudio 2021.09.0+351 Ghost Orchid (desktop)
pandoc 2.14.0.3 @ C:/Users/XXX/scoop/apps/rstudio/current/bin/pandoc/ (via rmarkdown)
- Packages ---------------------------------------------------------------------------------------------------------------
! package * version date (UTC) lib source
askpass 1.1 2019-01-13 [1] CRAN (R 4.1.1)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.1)
backports 1.3.0 2021-10-27 [1] CRAN (R 4.1.1)
base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.1.1)
broom * 0.7.10 2021-10-31 [1] CRAN (R 4.1.1)
bslib 0.3.1 2021-10-06 [1] CRAN (R 4.1.2)
cachem 1.0.6 2021-08-19 [1] CRAN (R 4.1.1)
callr 3.7.0 2021-04-20 [1] CRAN (R 4.1.1)
cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.1.1)
checkmate 2.0.0 2020-02-06 [1] CRAN (R 4.1.1)
CHNOSZ 1.4.1 2021-04-09 [1] CRAN (R 4.1.2)
class 7.3-19 2021-05-03 [1] CRAN (R 4.1.1)
classInt 0.4-3 2020-04-07 [1] CRAN (R 4.1.1)
cli 3.1.0 2021-10-27 [1] CRAN (R 4.1.1)
cluster 2.1.2 2021-04-17 [1] CRAN (R 4.1.1)
codetools 0.2-18 2020-11-04 [1] CRAN (R 4.1.1)
colorspace 2.0-2 2021-06-24 [1] CRAN (R 4.1.1)
crayon * 1.4.2 2021-10-29 [1] CRAN (R 4.1.2)
crosstalk 1.1.1 2021-01-12 [1] CRAN (R 4.1.1)
curl 4.3.2 2021-06-23 [1] CRAN (R 4.1.1)
cusumcharter 0.1.0 2021-11-15 [1] CRAN (R 4.1.2)
data.table 1.14.2 2021-09-27 [1] CRAN (R 4.1.1)
data.tree 1.0.0 2020-08-03 [1] CRAN (R 4.1.1)
dataReporter * 1.0.2 2021-11-11 [1] CRAN (R 4.1.2)
DBI 1.1.1 2021-01-15 [1] CRAN (R 4.1.1)
dbplyr 2.1.1 2021-04-06 [1] CRAN (R 4.1.1)
DEoptimR 1.0-10 2022-01-03 [1] CRAN (R 4.1.2)
desc 1.4.0 2021-09-28 [1] CRAN (R 4.1.2)
devtools 2.4.3 2021-11-30 [1] CRAN (R 4.1.2)
diffdf * 1.0.4 2020-03-17 [1] CRAN (R 4.1.1)
digest 0.6.28 2021-09-23 [1] CRAN (R 4.1.1)
dplyover * 0.0.8.9002 2021-11-01 [1] Github (TimTeaFan/dplyover@f0cd984)
dplyr * 1.0.7 2021-06-18 [1] CRAN (R 4.1.1)
DT 0.19 2021-09-02 [1] CRAN (R 4.1.1)
e1071 1.7-9 2021-09-16 [1] CRAN (R 4.1.1)
editData * 0.1.8 2021-04-02 [1] CRAN (R 4.1.1)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.1)
evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.1)
exifr * 0.3.2 2021-03-20 [1] CRAN (R 4.1.1)
fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.1)
farver 2.1.0 2021-02-28 [1] CRAN (R 4.1.1)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.1)
flextable * 0.6.9 2021-10-07 [1] CRAN (R 4.1.1)
forcats * 0.5.1 2021-01-27 [1] CRAN (R 4.1.1)
foreign 0.8-81 2020-12-22 [1] CRAN (R 4.1.1)
Formula 1.2-4 2020-10-16 [1] CRAN (R 4.1.1)
fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.1)
gdtools 0.2.3 2021-01-06 [1] CRAN (R 4.1.1)
generics 0.1.1 2021-10-25 [1] CRAN (R 4.1.1)
ggforce * 0.3.3 2021-03-05 [1] CRAN (R 4.1.1)
ggh4x * 0.2.0 2021-08-21 [1] CRAN (R 4.1.1)
ggiraph * 0.7.10 2021-05-19 [1] CRAN (R 4.1.2)
ggplot2 * 3.3.5 2021-06-25 [1] CRAN (R 4.1.1)
ggrepel 0.9.1 2021-01-15 [1] CRAN (R 4.1.1)
glue * 1.4.2 2020-08-27 [1] CRAN (R 4.1.1)
GQAnalyzer * 0.1.0 2021-11-01 [1] Github (khaors/GQAnalyzer@d51540c)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.1.1)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.1.1)
hablar * 0.3.0 2020-03-19 [1] CRAN (R 4.1.1)
haven 2.4.3 2021-08-04 [1] CRAN (R 4.1.1)
here * 1.0.1 2020-12-13 [1] CRAN (R 4.1.1)
highr 0.9 2021-04-16 [1] CRAN (R 4.1.1)
Hmisc 4.6-0 2021-10-07 [1] CRAN (R 4.1.1)
hms 1.1.1 2021-09-26 [1] CRAN (R 4.1.1)
htmlTable 2.3.0 2021-10-12 [1] CRAN (R 4.1.1)
htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.1)
htmlwidgets 1.5.4 2021-09-08 [1] CRAN (R 4.1.1)
httpuv 1.6.3 2021-09-09 [1] CRAN (R 4.1.1)
httr 1.4.2 2020-07-20 [1] CRAN (R 4.1.1)
janitor * 2.1.0 2021-01-05 [1] CRAN (R 4.1.1)
jpeg 0.1-9 2021-07-24 [1] CRAN (R 4.1.1)
jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.1.1)
jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.1.1)
KernSmooth 2.23-20 2021-05-03 [1] CRAN (R 4.1.1)
knitr 1.36 2021-09-29 [1] CRAN (R 4.1.1)
labeling 0.4.2 2020-10-20 [1] CRAN (R 4.1.1)
labelled * 2.9.0 2021-10-29 [1] CRAN (R 4.1.2)
later 1.3.0 2021-08-18 [1] CRAN (R 4.1.1)
lattice 0.20-44 2021-05-02 [1] CRAN (R 4.1.1)
latticeExtra 0.6-29 2019-12-19 [1] CRAN (R 4.1.1)
leafem 0.1.6 2021-05-24 [1] CRAN (R 4.1.1)
leaflet * 2.0.4.1 2021-01-07 [1] CRAN (R 4.1.1)
leafpm * 0.1.0 2019-03-13 [1] CRAN (R 4.1.1)
librarian 1.8.1 2021-07-12 [1] CRAN (R 4.1.1)
lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.1)
lubridate * 1.8.0 2021-10-07 [1] CRAN (R 4.1.1)
magrittr * 2.0.1 2020-11-17 [1] CRAN (R 4.1.1)
mapedit * 0.6.0 2020-02-02 [1] CRAN (R 4.1.1)
mapview * 2.10.0 2021-06-05 [1] CRAN (R 4.1.1)
MASS 7.3-54 2021-05-03 [1] CRAN (R 4.1.1)
Matrix 1.3-4 2021-06-01 [1] CRAN (R 4.1.1)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.1.2)
mgcv 1.8-36 2021-06-01 [1] CRAN (R 4.1.1)
mime 0.12 2021-09-28 [1] CRAN (R 4.1.1)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.1.1)
modelr 0.1.8 2020-05-19 [1] CRAN (R 4.1.1)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.1.1)
nlme 3.1-152 2021-02-04 [1] CRAN (R 4.1.1)
nnet 7.3-16 2021-05-03 [1] CRAN (R 4.1.1)
officer * 0.4.1 2021-11-14 [1] CRAN (R 4.1.2)
openxlsx * 4.2.4 2021-06-16 [1] CRAN (R 4.1.1)
pander * 0.6.4 2021-06-13 [1] CRAN (R 4.1.2)
pdftools 3.0.1 2021-05-06 [1] CRAN (R 4.1.1)
pillar 1.6.4 2021-10-18 [1] CRAN (R 4.1.1)
pkgbuild 1.3.1 2021-12-20 [1] CRAN (R 4.1.2)
pkgcond * 0.1.1 2021-04-28 [1] CRAN (R 4.1.1)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.1)
pkgload 1.2.4 2021-11-30 [1] CRAN (R 4.1.2)
plyr 1.8.6 2020-03-03 [1] CRAN (R 4.1.1)
png 0.1-7 2013-12-03 [1] CRAN (R 4.1.1)
polyclip 1.10-0 2019-03-14 [1] CRAN (R 4.1.1)
pracma 2.3.3 2021-01-23 [1] CRAN (R 4.1.1)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.1.1)
processx 3.5.2 2021-04-30 [1] CRAN (R 4.1.1)
promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.1.1)
proxy 0.4-26 2021-06-07 [1] CRAN (R 4.1.1)
ps 1.6.0 2021-02-28 [1] CRAN (R 4.1.1)
purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.1.1)
qpdf * 1.1 2019-03-07 [1] CRAN (R 4.1.1)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.1)
rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.1.1)
raster 3.5-11 2021-12-23 [1] CRAN (R 4.1.2)
RColorBrewer * 1.1-2 2014-12-07 [1] CRAN (R 4.1.1)
Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.1.1)
readr * 2.0.2 2021-09-27 [1] CRAN (R 4.1.1)
readxl * 1.3.1 2019-03-13 [1] CRAN (R 4.1.1)
remotes 2.4.1 2021-09-29 [1] CRAN (R 4.1.1)
repr 1.1.3 2021-01-21 [1] CRAN (R 4.1.1)
reprex 2.0.1 2021-08-05 [1] CRAN (R 4.1.1)
rhandsontable * 0.3.8 2021-05-27 [1] CRAN (R 4.1.1)
rio 0.5.27 2021-06-21 [1] CRAN (R 4.1.1)
D rJava 1.0-5 2021-09-24 [1] CRAN (R 4.1.1)
rlang 0.4.12 2021-10-18 [1] CRAN (R 4.1.1)
rlist * 0.4.6.2 2021-09-03 [1] CRAN (R 4.1.1)
rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.1.1)
robustbase 0.93-9 2021-09-27 [1] CRAN (R 4.1.2)
rpart 4.1-15 2019-04-12 [1] CRAN (R 4.1.1)
rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.1.1)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.1)
rvest 1.0.2 2021-10-16 [1] CRAN (R 4.1.1)
sass 0.4.0 2021-05-12 [1] CRAN (R 4.1.1)
satellite 1.0.4 2021-10-12 [1] CRAN (R 4.1.1)
scales 1.1.1 2020-05-11 [1] CRAN (R 4.1.1)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.2)
sf * 1.0-3 2021-10-07 [1] CRAN (R 4.1.1)
shiny * 1.7.1 2021-10-02 [1] CRAN (R 4.1.1)
shinyjs * 2.0.0 2020-09-09 [1] CRAN (R 4.1.1)
shinyWidgets * 0.6.2 2021-09-17 [1] CRAN (R 4.1.1)
skimr * 2.1.3 2021-03-07 [1] CRAN (R 4.1.1)
snakecase 0.11.0 2019-05-25 [1] CRAN (R 4.1.1)
SOfun * 1.76 2021-11-01 [1] Github (mrdwab/SOfun@e41fa62)
sp 1.4-6 2021-11-14 [1] CRAN (R 4.1.2)
splitstackshape * 1.4.8 2019-04-21 [1] CRAN (R 4.1.1)
staplr * 3.1.1 2021-01-11 [1] CRAN (R 4.1.1)
stringdist * 0.9.8 2021-09-09 [1] CRAN (R 4.1.1)
stringi * 1.7.5 2021-10-04 [1] CRAN (R 4.1.1)
stringr * 1.4.0 2019-02-10 [1] CRAN (R 4.1.1)
survival 3.2-11 2021-04-26 [1] CRAN (R 4.1.1)
systemfonts 1.0.3 2021-10-13 [1] CRAN (R 4.1.1)
terra 1.5-12 2022-01-13 [1] CRAN (R 4.1.1)
tesseract * 4.1.2 2021-09-18 [1] CRAN (R 4.1.1)
testthat 3.1.1 2021-12-03 [1] CRAN (R 4.1.2)
tibble * 3.1.5 2021-09-30 [1] CRAN (R 4.1.1)
tidyr * 1.1.4 2021-09-27 [1] CRAN (R 4.1.1)
tidyselect * 1.1.1 2021-04-30 [1] CRAN (R 4.1.1)
tidyverse * 1.3.1 2021-04-15 [1] CRAN (R 4.1.1)
tweenr 1.0.2 2021-03-23 [1] CRAN (R 4.1.1)
tzdb 0.1.2 2021-07-20 [1] CRAN (R 4.1.1)
units * 0.7-2 2021-06-08 [1] CRAN (R 4.1.1)
usethis 2.1.5 2021-12-09 [1] CRAN (R 4.1.2)
utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.1)
uuid * 0.1-4 2020-02-26 [1] CRAN (R 4.1.1)
vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.1)
webchem * 1.1.1 2021-02-07 [1] CRAN (R 4.1.1)
webshot 0.5.2 2019-11-22 [1] CRAN (R 4.1.1)
whoami 1.3.0 2019-03-19 [1] CRAN (R 4.1.2)
withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.1)
xfun 0.27 2021-10-18 [1] CRAN (R 4.1.1)
xml2 1.3.2 2020-04-23 [1] CRAN (R 4.1.1)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.1.1)
yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.1)
zeallot * 0.1.0 2018-01-28 [1] CRAN (R 4.1.1)
zip 2.2.0 2021-05-31 [1] CRAN (R 4.1.1)
zoo * 1.8-9 2021-03-09 [1] CRAN (R 4.1.1)
[1] C:/Users/XXX/scoop/apps/r/4.1.1/library
D -- DLL MD5 mismatch, broken installation.
In the meantime, until the fix is published, anyone else encountering this issue can resolve it locally by setting:
options(encoding = "UTF-8")
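The workaround works because file() takes its default `encoding` from getOption("encoding") when the connection is opened. Since the option is global and affects other connections too, it may be safer to set it only around the report call and restore it afterwards. A minimal sketch, where `with_utf8_encoding()` is a hypothetical helper (withr::with_options() does the same job):

```r
# Hypothetical helper: evaluate `expr` with options(encoding = "UTF-8"),
# then restore the previous value (options() returns the old settings).
with_utf8_encoding <- function(expr) {
  old <- options(encoding = "UTF-8")
  on.exit(options(old))
  expr  # lazy evaluation: `expr` is only evaluated here, with the option set
}
```

Usage would then be, e.g., with_utf8_encoding(makeDataReport(a, file = "deleteme.rmd", output = "html", replace = TRUE)).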
Thank you for this thorough and excellent suggestion. We will definitely look into making these changes the next time we work on updates!