tinytest
Alphabetical order in testing environment different than in regular R environment?
I'm developing a package for project-specific data processing. One step is checking whether a number of names are really distinct, or whether similar names refer to the same person. For this, I first generate from a database a data.table of pairs that are similar based on string similarity, and compare it to a data.table of pairs for which I have manually checked whether they refer to the same person. If all similar-sounding names are covered by my manually compiled list, the test passes.
I do this via a negative join with data.table:
dt_redux <- dt_pairs_from_db[!dt_manually_checked_pairs, on = .(name1, name2)]
expect_true(nrow(dt_redux)==0)
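A minimal, self-contained sketch of the negative (anti-)join above, with hypothetical toy names standing in for the real database tables (which the question does not show). It assumes the data.table package is installed and that both tables have character columns `name1` and `name2`. A couple stored in the opposite order in the checked list has no exact `(name1, name2)` match, so it survives the anti-join.

```r
library(data.table)

# Hypothetical toy data: pairs produced by the similarity step
dt_pairs_from_db <- data.table(
  name1 = c("Jon Smith", "\u0130nan K\u0131ra\u00e7"),
  name2 = c("John Smith", "Suna K\u0131ra\u00e7")
)

# Hypothetical manually checked pairs: the second couple is stored
# with name1 and name2 swapped
dt_manually_checked_pairs <- data.table(
  name1 = c("Jon Smith", "Suna K\u0131ra\u00e7"),
  name2 = c("John Smith", "\u0130nan K\u0131ra\u00e7")
)

# Anti-join: keep generated pairs that have no exact (name1, name2) match
dt_redux <- dt_pairs_from_db[!dt_manually_checked_pairs, on = .(name1, name2)]

# The swapped couple is not matched, so one row remains and the test would fail
print(dt_redux)
```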
This test passed when calling test_all or build_install_test, but failed in R CMD check.
After some searching I tracked it down to the name order in dt_pairs_from_db. There the pairs are generated by a string similarity function, which creates two entries for each couple (name1, name2 and name2, name1). To avoid having to check each couple twice, I only cover the cases where name1 > name2. However, for one couple, "İnan Kıraç" and "Suna Kıraç", the alphabetical order differs between the normal R environment and the testing environment: in the normal R environment, expect_true("İnan Kıraç" > "Suna Kıraç") fails, but in the testing environment (in my test_package.R file), the same expect_true("İnan Kıraç" > "Suna Kıraç") passes.
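A small sketch for reproducing the difference in one session, assuming a UTF-8 session (which `l10n_info()` reports). It uses base R's `Sys.getlocale()` and `Sys.setlocale()` to switch only the `LC_COLLATE` category temporarily; the `\u` escapes keep the names intact regardless of the source file's encoding. Under `"C"` collation strings are compared bytewise: in UTF-8, "İ" is the bytes `C4 B0`, which sort after the ASCII "S" (`53`). In a locale such as en_US.UTF-8, "İ" is typically collated like "I", i.e. before "S".

```r
x <- "\u0130nan K\u0131ra\u00e7"  # "İnan Kıraç"
y <- "Suna K\u0131ra\u00e7"       # "Suna Kıraç"

# Comparison under the session's current collation
# (typically FALSE in a locale such as en_US.UTF-8)
cat("Current LC_COLLATE:", Sys.getlocale("LC_COLLATE"), "\n")
print(x > y)

# Same comparison under bytewise ("C") collation, then restore the old setting
old_collate <- Sys.getlocale("LC_COLLATE")
invisible(Sys.setlocale("LC_COLLATE", "C"))
res_c <- x > y
invisible(Sys.setlocale("LC_COLLATE", old_collate))
print(res_c)  # TRUE in a UTF-8 session: bytes C4 B0 ("İ") sort after 53 ("S")
```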
This difference in alphabetical order led to a dt_pairs_from_db whose pair order didn't match the order of the pairs in my dt_manually_checked_pairs, which caused the test to fail.
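One hedged way to make the pair generation itself independent of the session's collation is to decide which name goes first with `order(method = "radix")`, which base R documents as sorting character vectors in C-locale order rather than by the locale's collation. The helper `canonical_pairs()` is hypothetical and not part of the question's package; it keeps the question's convention that name1 sorts after name2, but under that locale-independent order.

```r
# Hypothetical helper: orient every pair so that name1 sorts after name2
# in C-locale order, whatever the session's LC_COLLATE setting is.
canonical_pairs <- function(a, b) {
  stopifnot(is.character(a), is.character(b), length(a) == length(b))
  # order(method = "radix") does not use the locale's collation; if the
  # second element comes first, b sorts strictly before a (ties keep a first).
  a_is_larger <- mapply(
    function(x, y) order(c(x, y), method = "radix")[1L] == 2L,
    a, b, USE.NAMES = FALSE
  )
  data.frame(
    name1 = ifelse(a_is_larger, a, b),
    name2 = ifelse(a_is_larger, b, a),
    stringsAsFactors = FALSE
  )
}

# Both argument orders give the same oriented pair
p1 <- canonical_pairs("\u0130nan K\u0131ra\u00e7", "Suna K\u0131ra\u00e7")
p2 <- canonical_pairs("Suna K\u0131ra\u00e7", "\u0130nan K\u0131ra\u00e7")
identical(p1, p2)  # TRUE
```

Applying the same helper to both the generated pairs and the manually checked pairs would make the anti-join compare like with like; the result is a plain data.frame, which can be converted with `data.table::as.data.table()` if needed.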
I've now fixed it by simply adding this particular couple in both orders to my dt_manually_checked_pairs, but I'm curious what caused this. Any ideas?
I think (I vaguely remember) that R CMD check uses the 'C' collation order, i.e. bytewise sorting. You could try setting the collation to C in your test file, e.g. with Sys.setlocale("LC_COLLATE", "C"), so that it is always used.
See also the 'details' section in ?run_test_dir.
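The suggestion to use C collation in the test file can be sketched with base R only. This version uses `Sys.setlocale()` rather than changing the `LC_COLLATE` environment variable with `Sys.setenv()`, because changing that variable inside an already running session may not change R's collation. `with_c_collation()` and `make_similar_pairs()` are hypothetical names. Whether tinytest restores the locale after each test file is not assumed here, so the helper restores it itself via `on.exit()`.

```r
# Hypothetical helper: evaluate `code` with LC_COLLATE temporarily set to "C"
# (bytewise string comparison), then restore the previous setting.
with_c_collation <- function(code) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  # Sys.setlocale() returns "" if the request could not be honoured
  if (identical(Sys.setlocale("LC_COLLATE", "C"), "")) {
    warning("could not set LC_COLLATE to \"C\"")
  }
  force(code)  # `code` is a lazy promise, so it is evaluated here, under "C"
}

# Possible use in a test file, with a hypothetical pair-generating function:
# dt_pairs_from_db <- with_c_collation(make_similar_pairs(db))
```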