
UTF-8 encoding of test files

Open DavorJ opened this issue 3 years ago • 2 comments

Take the following two files:

some_test.r file:

testthat::test_that('UTF-8 encoding test', {
  expected <- '€²-symbols'
  cat('\n')
  print(utf8word)
  print(expected)
  testthat::expect_equal(object = utf8word, expected = expected)
})

test.r file:

options(encoding = "UTF-8")
utf8word <- "€²-symbols"
testthat::test_file("some_test.r")

Note that ² is encoded as 2 bytes, and € as 3 bytes in UTF-8. I verified that this is indeed the case.
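For anyone who wants to repeat that byte-length check, here is a minimal sketch. It uses Unicode escapes so the result does not depend on how this snippet's own file is encoded; the variable names are only illustrative.

```r
# Build the two characters from Unicode escapes, so the source file's
# encoding cannot affect the result.
euro <- "\u20ac"             # the euro sign
superscript_two <- "\u00b2"  # the superscript two

# charToRaw() shows the bytes R actually stores for each string.
print(charToRaw(euro))             # e2 82 ac
print(charToRaw(superscript_two))  # c2 b2

# nchar(type = "bytes") gives the byte counts directly.
print(nchar(euro, type = "bytes"))             # 3
print(nchar(superscript_two, type = "bytes"))  # 2
```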

Executing test.r results in the following:

== Testing some_test.r ======================================================
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 0 ]
[1] "€²-symbols"
[1] "\u0080�-symbols"
[ FAIL 1 | WARN 0 | SKIP 0 | PASS 0 ]

Executing source('some_test.r') thereafter returns:

[1] "€²-symbols"
[1] "€²-symbols"
Test passed 

Hence it seems that testthat 3.1.0 doesn't read the test file correctly. (Aren't test files assumed to be UTF-8 since v3?) Is this a real issue, or am I doing something wrong?

Environment

R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_Belgium.1252  LC_CTYPE=English_Belgium.1252    LC_MONETARY=English_Belgium.1252 LC_NUMERIC=C                    
[5] LC_TIME=English_Belgium.1252    

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] testthat_3.1.0

loaded via a namespace (and not attached):
 [1] compiler_3.6.3  magrittr_2.0.1  R6_2.5.1        cli_3.1.0       rprojroot_2.0.2 tools_3.6.3     withr_2.4.2     rstudioapi_0.13 crayon_1.4.2    desc_1.4.0     
[11] pkgload_1.2.3   rlang_0.4.12    renv_0.13.2   

DavorJ, Nov 24 '21 10:11

Does this simpler reprex illustrate the same problem?

library(testthat)
test_that('UTF-8 encoding test', {
  expect_equal("€²", "\u20ac\u00b2")
})

(I think it should, since using Unicode escapes should force the string to be encoded as UTF-8.)

If it doesn't, could you please show me Encoding(expected) and Encoding(utf8word)?
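A rough sketch of how that information could be collected, in case it helps. `describe_string()` is a hypothetical helper, not part of testthat or base R; it only wraps the base functions `Encoding()`, `charToRaw()` and `validUTF8()`. In the original reprex you would call it on `expected` and `utf8word` inside the test.

```r
# Hypothetical helper: summarise how R has stored a single string.
describe_string <- function(x) {
  list(
    encoding   = Encoding(x),   # declared encoding mark: "unknown", "UTF-8", "latin1" or "bytes"
    bytes      = charToRaw(x),  # the raw bytes actually stored
    valid_utf8 = validUTF8(x)   # TRUE if those bytes form valid UTF-8
  )
}

# Stand-in for the value from the issue, written with escapes so that this
# snippet is independent of its own file encoding.
utf8word <- "\u20ac\u00b2-symbols"
str(describe_string(utf8word))
```

If the two strings show different bytes, the file was decoded differently from how it was saved. If the bytes match but the encoding marks differ, the problem is in how the strings are marked rather than in how they were read.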

hadley, Jan 04 '22 01:01

And please try with the dev version of testthat, which has been overhauled to use brio everywhere. This has resolved a couple of other encoding-related issues, so it might solve yours too.
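To illustrate what reading a file as UTF-8 regardless of locale looks like, here is a small sketch. It is not testthat's internal code. The temporary file and variable names are made up for the example, and the brio part only runs if that package is installed.

```r
# Write a one-line R file whose bytes are UTF-8, independent of the locale.
path <- tempfile(fileext = ".R")
writeBin(charToRaw("expected <- '\u20ac\u00b2-symbols'\n"), path)

# Base R: declare the encoding so the non-ASCII line is marked as UTF-8
# instead of being interpreted in the native encoding (e.g. CP1252).
base_lines <- readLines(path, encoding = "UTF-8")
print(Encoding(base_lines))

# brio::read_lines() always treats the file as UTF-8; skipped if brio is missing.
if (requireNamespace("brio", quietly = TRUE)) {
  brio_lines <- brio::read_lines(path)
  print(Encoding(brio_lines))
}

unlink(path)
```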

hadley, Jan 05 '22 04:01