testthat
UTF-8 encoding of test files
Take the following two files:
The some_test.r file:
testthat::test_that('UTF-8 encoding test', {
  expected <- '€²-symbols'
  cat('\n')
  print(utf8word)
  print(expected)
  testthat::expect_equal(object = utf8word, expected = expected)
})
The test.r file:
options(encoding = "UTF-8")
utf8word <- "€²-symbols"
testthat::test_file("some_test.r")
Note that ² is encoded as 2 bytes, and € as 3 bytes in UTF-8. I verified that this is indeed the case.
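For anyone who wants to double-check those byte counts, here is a minimal base R sketch. The variable names (euro, superscript2, byte_counts, char_counts) are only illustrative, and the characters are written as Unicode escapes so the result should not depend on how the script file itself is saved.

```r
# Build both characters from Unicode escapes, so the example does not
# depend on the encoding of this source file.
euro <- "\u20ac"          # EURO SIGN
superscript2 <- "\u00b2"  # SUPERSCRIPT TWO

# Raw UTF-8 bytes of each character.
print(charToRaw(euro))          # e2 82 ac
print(charToRaw(superscript2))  # c2 b2

# Bytes versus characters: each symbol is one character but several bytes.
byte_counts <- nchar(c(euro, superscript2), type = "bytes")
char_counts <- nchar(c(euro, superscript2), type = "chars")
print(byte_counts)  # 3 2
print(char_counts)  # 1 1
```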
Executing test.r results in the following:
== Testing some_test.r ======================================================
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 0 ]
[1] "€²-symbols"
[1] "\u0080�-symbols"
[ FAIL 1 | WARN 0 | SKIP 0 | PASS 0 ]
Executing source('some_test.r') thereafter returns:
[1] "€²-symbols"
[1] "€²-symbols"
Test passed
Hence it seems testthat_3.1.0 doesn't read the test cases correctly. (Aren't test files assumed to be in UTF-8 since v3?) Is this a real issue, or am I doing something wrong?
Environment
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_Belgium.1252 LC_CTYPE=English_Belgium.1252 LC_MONETARY=English_Belgium.1252 LC_NUMERIC=C
[5] LC_TIME=English_Belgium.1252
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] testthat_3.1.0
loaded via a namespace (and not attached):
[1] compiler_3.6.3 magrittr_2.0.1 R6_2.5.1 cli_3.1.0 rprojroot_2.0.2 tools_3.6.3 withr_2.4.2 rstudioapi_0.13 crayon_1.4.2 desc_1.4.0
[11] pkgload_1.2.3 rlang_0.4.12 renv_0.13.2
Does this simpler reprex illustrate the same problem?
library(testthat)
test_that('UTF-8 encoding test', {
  expect_equal("€²", "\u20ac\u00b2")
})
(I think it should, since using Unicode escapes should force the string to be encoded as UTF-8.)
If it doesn't, could you please show me Encoding(expected) and Encoding(utf8word)?
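As a rough sketch of what that check could look like, the snippet below reports both the declared encoding and the raw bytes of a string. describe_encoding is a hypothetical helper written for this illustration, not part of testthat or base R, and the sample string is built from escapes rather than read from the test file, so it only shows what a correctly decoded value should look like.

```r
# Hypothetical helper: report the encoding mark and the raw bytes of one string.
describe_encoding <- function(x) {
  list(
    marked = Encoding(x),
    bytes  = paste(as.character(charToRaw(x)), collapse = " ")
  )
}

# A string built from Unicode escapes is marked as UTF-8.
expected <- "\u20ac\u00b2-symbols"
print(describe_encoding(expected)$marked)  # "UTF-8"

# Pure ASCII strings carry no encoding mark.
print(Encoding("symbols"))                 # "unknown"

# The two non-ASCII symbols on their own: 3 bytes for the euro sign, 2 for the superscript.
print(describe_encoding("\u20ac\u00b2")$bytes)  # "e2 82 ac c2 b2"
```

Running describe_encoding(utf8word) and describe_encoding(expected) inside the failing test would show whether the two values differ in their bytes, in their encoding mark, or in both.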
And please try with the dev version of testthat, which has been overhauled to use brio everywhere. This has resolved a couple of other encoding-related issues, so it might solve yours too.