Snapshot error on Windows (encoding problem)
In this reprex package https://github.com/maelle/encoding.problem I have a test
test_that("multiplication works", {
  bla <- list(someone = "Maëlle")
  expect_snapshot_output(print(bla))
})
I created the snapshot on Ubuntu. On Windows the test fails: https://github.com/maelle/encoding.problem/actions/runs/1845617963
The error is:
> test_check("encoding.problem")
Error in gsub("\n", paste0("\n", prefix), x, fixed = TRUE) :
Error: Error: R CMD check found ERRORs
input string 1 is invalid UTF-8
Calls: test_check ... vapply -> FUN -> paste0 -> indent_add -> paste0 -> gsub
Execution halted
Which comes from indent_add() and functions it calls.
I have also encountered this error after the last testthat release.
At the time, I wondered whether it was related to using brio for reading and writing, which assumes UTF-8. (https://github.com/r-lib/testthat/commit/6666662844274e8fa1988c8e0cfecf0b13399ee1)
There is a place in eval_with_output() where a file is read assuming UTF-8, but I am not sure the file is actually UTF-8 on Windows:
https://github.com/r-lib/testthat/blob/6666662844274e8fa1988c8e0cfecf0b13399ee1/R/capture-output.R#L44-L60
testthat:::eval_with_output(print("maëlle"))
#> $val
#> [1] "maëlle"
#>
#> $vis
#> [1] FALSE
#>
#> $out
#> [1] "[1] \"ma<eb>lle\""
With the parent commit, before the switch to brio:
testthat:::eval_with_output(print("maëlle"))
#> $val
#> [1] "maëlle"
#>
#> $vis
#> [1] FALSE
#>
#> $out
#> [1] "[1] \"maëlle\""
withr::with_output_sink() lets sink() write to a file. If the connection is not already open, sink() probably opens it in the native encoding, so the resulting file cannot be read as UTF-8 with brio::read_lines(). This is the step that creates the "wrong" value.
I know sink() is a source of encoding issues (https://github.com/r-lib/evaluate/issues/59), and file connections and encodings are not really easy to master. So I don't know whether this is something that can be improved for Windows users of snapshot tests.
Here is what I observe:
tmp <- tempfile()
sink(tmp)
print('ë')
sink()
brio::read_lines(tmp)
#> [1] "[1] \"\xeb\""
unlink(tmp)
tmp <- tempfile()
con <- file(tmp, encoding = "UTF-8")
sink(con)
print('ë')
sink()
brio::read_lines(tmp)
#> [1] "[1] \"ë\""
unlink(tmp)
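To make the first result above concrete, here is a small base R sketch. It assumes the native encoding is latin1-like (for example Windows-1252), where "ë" is the single byte 0xEB; the variable names are only illustrative.

```r
# Bytes as sink() would write them in a latin1-like native encoding (assumption)
x <- "[1] \"\xeb\""

# A lone 0xEB byte is not a valid UTF-8 sequence
validUTF8(x)
#> [1] FALSE

# Declaring the real source encoding lets iconv() produce valid UTF-8
y <- iconv(x, from = "latin1", to = "UTF-8")
validUTF8(y)
#> [1] TRUE
```

So the bytes on disk are fine for the native encoding; the problem only appears once they are read back as if they were UTF-8.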
Anyway, I am just sharing what I found when I stumbled into this issue, as @maelle shared her Windows issue with me.
If possible, ensuring the output is written as UTF-8 could help, or maybe not using brio::read_lines() in this specific place?
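A minimal sketch of the first option, using only base R. The helper name capture_output_utf8() is hypothetical and this is not testthat's actual code; it only shows the idea of opening the sink connection with an explicit UTF-8 encoding and reading the file back as UTF-8.

```r
# Hypothetical helper, not part of testthat or withr
capture_output_utf8 <- function(expr) {
  path <- tempfile()
  on.exit(unlink(path), add = TRUE)

  # Open the connection ourselves so text is re-encoded to UTF-8 on write
  con <- file(path, open = "w", encoding = "UTF-8")
  sink(con)
  tryCatch(
    force(expr),
    finally = {
      # Always restore the previous sink; close the connection we opened
      sink()
      close(con)
    }
  )

  # Read the file back, declaring its lines as UTF-8
  readLines(path, encoding = "UTF-8", warn = FALSE)
}

capture_output_utf8(print("ë"))
```

Because the connection is opened before sink() is called, sink() does not close it, hence the explicit close(con).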
Hope it helps.
Is this fixed in R 4.2? If so, do you need it to work in R 4.1 too? (It's likely to be a couple of hours' work for me, so I'd prefer to use that time on other things if your motivating issue is resolved by updating R.)
Indeed, it now works: https://github.com/maelle/encoding.problem/actions/runs/3103119734. We can definitely work without a fix :slightly_smiling_face:
I fixed this anyway, thanks to @cderv's analysis and a hint from Kurt Hornik.