
Snapshot error on Windows (encoding problem)

Open maelle opened this issue 3 years ago • 1 comments

In this reprex package, https://github.com/maelle/encoding.problem, I have a test:

test_that("multiplication works", {
  bla <- list(someone = "Maëlle")
  expect_snapshot_output(print(bla))
})

I created the snapshot on Ubuntu. On Windows the test fails: https://github.com/maelle/encoding.problem/actions/runs/1845617963

With this error:

> test_check("encoding.problem")
Error in gsub("\n", paste0("\n", prefix), x, fixed = TRUE) : 
  input string 1 is invalid UTF-8
Calls: test_check ... vapply -> FUN -> paste0 -> indent_add -> paste0 -> gsub
Execution halted
Error: Error: R CMD check found ERRORs

The error comes from indent_add() and the functions it calls.

maelle avatar Feb 15 '22 07:02 maelle

I have also encountered this error after the last testthat release.

At the time, I wondered if it was related to the switch to brio for reading and writing, which assumes UTF-8 (https://github.com/r-lib/testthat/commit/6666662844274e8fa1988c8e0cfecf0b13399ee1).

There is a place in eval_with_output() where a file is read assuming UTF-8, but I am not sure the file is actually UTF-8 on Windows: https://github.com/r-lib/testthat/blob/6666662844274e8fa1988c8e0cfecf0b13399ee1/R/capture-output.R#L44-L60

testthat:::eval_with_output(print("maëlle"))
#> $val
#> [1] "maëlle"
#> 
#> $vis
#> [1] FALSE
#> 
#> $out
#> [1] "[1] \"ma<eb>lle\""

With the parent commit, before the switch to brio:

testthat:::eval_with_output(print("maëlle"))
#> $val
#> [1] "maëlle"
#> 
#> $vis
#> [1] FALSE
#> 
#> $out
#> [1] "[1] \"maëlle\""

withr::with_output_sink() lets sink() write to a file. If the connection is not already open, sink() will probably open one in the native encoding, and the resulting file cannot be read as UTF-8 using brio::read_lines(). This is the part that creates the "wrong" value.

I know sink() is a source of encoding issues (https://github.com/r-lib/evaluate/issues/59), and file connections and encodings are not really easy to master. So I don't know if this is something that can be improved for Windows users of snapshot tests.

Here is what I observe:

tmp <- tempfile()
sink(tmp)
print('ë')
sink()
brio::read_lines(tmp)
#> [1] "[1] \"\xeb\""
unlink(tmp)

tmp <- tempfile()
con <- file(tmp, encoding = "UTF-8")
sink(con)
print('ë')
sink()
brio::read_lines(tmp)
#> [1] "[1] \"ë\""
unlink(tmp)

Anyway, I am just sharing what I found when stumbling into this issue, as @maelle shared her Windows issue with me. If possible, ensuring that the output is written as UTF-8 could help, or maybe not using brio::read_lines() in this specific place?

Hope it helps.

cderv avatar Feb 15 '22 09:02 cderv

Is this fixed in R 4.2? If so, do you need it to work in R 4.1 too? (It's likely to be a couple of hours' work for me, so I'd prefer to use that time on other things if your motivating issue is resolved by updating R.)

hadley avatar Sep 21 '22 20:09 hadley

Indeed, it now works: https://github.com/maelle/encoding.problem/actions/runs/3103119734. We can definitely work without a fix 🙂

maelle avatar Sep 22 '22 05:09 maelle

I fixed this anyway, thanks to @cderv's analysis and a hint from Kurt Hornik.

hadley avatar Sep 24 '22 13:09 hadley