lintr icon indicating copy to clipboard operation
lintr copied to clipboard

Check if tests are coupled

Open IndrajeetPatil opened this issue 2 years ago • 7 comments

cf. https://github.com/r-lib/lintr/issues/1937

IndrajeetPatil avatar Mar 26 '23 08:03 IndrajeetPatil

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 99.24%. Comparing base (132a6dd) to head (e6b2dce).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1938   +/-   ##
=======================================
  Coverage   99.24%   99.24%           
=======================================
  Files         129      129           
  Lines        7317     7317           
=======================================
  Hits         7262     7262           
  Misses         55       55           

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov-commenter avatar Mar 26 '23 08:03 codecov-commenter

Looks like there is some order dependence between our tests.

IndrajeetPatil avatar Mar 26 '23 08:03 IndrajeetPatil

Could you describe how this works? I'm not reproducing. To check, I tried running our test suite with each file in an individual subprocess:

tests = list.files('tests/testthat', pattern = '^test', full.names = TRUE)
for (test in tests) {
  system2("Rscript", c("-e", shQuote(paste(
    c(
      'attach(asNamespace("lintr"), warn.conflicts=FALSE)',
      "library(testthat)",
      sprintf('test_file("%s")', test)
    ),
    collapse = ";"
  ))))
}

I see the errors in test-knitr_formats.R and test-object_name_linter.R (trivial fix incoming), but not any others.

MichaelChirico avatar Apr 08 '23 19:04 MichaelChirico

I guess this test also expects test_that() units within each file to run independently. Testing:

library(withr)
library(xml2)
library(xmlparsedata)

tests = list.files('tests/testthat', pattern = '^test', full.names = TRUE)
non_tests = setdiff(list.files('tests/testthat', full.names = TRUE), tests)
# required for assumed directory structure in tests to work in tempdir()
file.copy(non_tests, tempdir(), recursive = TRUE)

unit_xp = "expr/SYMBOL_FUNCTION_CALL[text() = 'test_that' or text() = 'with_parameters_test_that']"
lines_by_xpath <- function(xml, xpath) {
  matches <- xml_find_all(xml, xpath)
  line1 <- as.integer(xml_attr(matches, "line1"))
  line2 <- as.integer(xml_attr(matches, "line2"))
  mapply(`:`, line1, line2, SIMPLIFY = FALSE)
}
# NB: "testthat::CompactProgressReporter" is more informative
run_units_individually <- function(test_file, reporter = "testthat::FailReporter") {
  message(basename(test_file))
  test_xml <- read_xml(xml_parse_data(parse(test_file)))
  test_lines <- readLines(test_file)

  boilerplate_lines <- lines_by_xpath(test_xml, sprintf("expr[not(%s)]", unit_xp))
  test_boilerplate <- test_lines[unlist(boilerplate_lines)]

  test_that_units <- lines_by_xpath(test_xml, sprintf("expr[%s]", unit_xp))
  for (unit in test_that_units) {
    if (reporter == "testthat::FailReporter") message(".", appendLF = FALSE)
    unit_name <- gsub(" ", "_", gsub('^[^"]*"|"[^"]*$', '', test_lines[unit[1L]]))
    with_tempfile("tmp", pattern = paste0(basename(test_file), "_", unit_name), {
      cat(test_boilerplate, file = tmp, sep = "\n")
      cat(test_lines[unit], file = tmp, sep = "\n", append = TRUE)
      system2(
        "Rscript",
        env = "TESTTHAT_EDITION=3",
        c("-e", shQuote(sprintf(
          'testthat::test_file("%s", package = "lintr", reporter = %s)',
          tmp, reporter
        )))
      )
    })
  }
  if (reporter == "testthat::FailReporter")   cat("\n")
}
for (test in tests) run_units_individually(test)

I see some failures, but not the same as this CI test is giving (omitting passing files):

test-get_source_expressions.R
.............Error: Failures detected.
Execution halted
..
test-linter_tags.R
.........Error: Failures detected.
Execution halted
test-parse_exclusions.R
....Error: Failures detected.
Execution halted
.Error: Failures detected.
Execution halted
...

Investigating more.

MichaelChirico avatar Apr 09 '23 23:04 MichaelChirico

The ones in test-get_source_expressions.R and test-linter_tags.R look spurious to me. Here's the failure output for test-parse_exclusions.R:

══ Testing test-parse_exclusions.R454a948b83fca ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ] Done!

══ Testing test-parse_exclusions.R454a974dd3abe ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 2 ] Done!

══ Testing test-parse_exclusions.R454a930e47343 ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 3 ] Done!

══ Testing test-parse_exclusions.R454a973607214 ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
[ FAIL 1 | WARN 0 | SKIP 0 | PASS 0 ]

── Failure (test-parse_exclusions.R454a973607214:21:3): it supports multiple linter exclusions ──
lintr:::parse_exclusions(t1) (`actual`) not identical to list(...) (`expected`).

`actual` is length 0
`expected` is length 4

`names(actual)` is absent
`names(expected)` is a character vector ('a', 'b', 'c', '')

`actual[[1]]` is absent
`expected[[1]]` is an integer vector (2, 4, 5, 6)

`actual[[2]]` is absent
`expected[[2]]` is an integer vector (2, 4, 5, 6)

`actual[[3]]` is absent
`expected[[3]]` is an integer vector (4, 5, 6)

`actual[[4]]` is absent
`expected[[4]]` is an integer vector (1, 5, 8, 9, 10, ...)

[ FAIL 1 | WARN 0 | SKIP 0 | PASS 0 ]

══ Testing test-parse_exclusions.R454a91f8e2630 ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
[ FAIL 1 | WARN 0 | SKIP 0 | PASS 0 ]

── Failure (test-parse_exclusions.R454a91f8e2630:16:3): it supports overlapping exclusion ranges ──
lintr:::parse_exclusions(t1) (`actual`) not identical to list(a = 1L:5L, b = 3L:6L) (`expected`).

`actual` is length 0
`expected` is length 2

`names(actual)` is absent
`names(expected)` is a character vector ('a', 'b')

`actual$a` is absent
`expected$a` is an integer vector (1, 2, 3, 4, 5)

`actual$b` is absent
`expected$b` is an integer vector (3, 4, 5, 6)

[ FAIL 1 | WARN 0 | SKIP 0 | PASS 0 ]

══ Testing test-parse_exclusions.R454a910304a6e ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 2 ] Done!

══ Testing test-parse_exclusions.R454a940e45510 ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ] Done!

══ Testing test-parse_exclusions.R454a92dbfdaf1 ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 2 ] Done!

MichaelChirico avatar Apr 09 '23 23:04 MichaelChirico

This is the exact script I am using in the GHA workflow:

options(crayon.enabled = TRUE)
withr::local_envvar(TESTTHAT_PARALLEL = "FALSE")
library(cli)
library(glue)
pkgload::load_all(".")

test_script_paths <- testthat::find_test_scripts("tests/testthat")

seed <- sample.int(1e6, 1L)
cli_inform("Chosen seed for the current test run: {seed}")
set.seed(seed)

randomized_test_script_paths <- sample(test_script_paths)
any_test_failures <- FALSE
any_test_errors <- FALSE

test_path <- function(path) {
  report <- as.data.frame(testthat::test_file(path, reporter = "silent"))
  has_test_failures <- any(report$failed == 1L)
  has_test_errors <- any(report$error == 1L)

  if (has_test_failures) {
    cli_alert_danger(glue("Tests in `{path}` are failing."))
    any_test_failures <<- TRUE
    failed_tests <- tibble::as_tibble(subset(report, failed == 1L)[, c("test", "result")])
    print(glue::glue_data(failed_tests, "Test `{test}` is failing:\n{purrr::pluck(result, 1L, 1L)}"))
  }

  if (has_test_errors) {
    cli_alert_danger(glue("There was error while running tests in `{path}`."))
    any_test_errors <<- TRUE
    errored_tests <- tibble::as_tibble(subset(report, error == 1L)[, c("test", "result")])
    print(glue::glue_data(errored_tests, "Test `{test}` has error:\n{purrr::pluck(result, 1L, 1L)}"))
  }

  if (!has_test_failures && !has_test_errors) {
    cli_alert_success(glue("All tests passing in `{path}`."))
  }
}

cli_rule()
cli_inform("Running tests in random order:")
cli_rule()

purrr::walk(randomized_test_script_paths, test_path)

cli_rule()

if (any_test_failures) {
  cli_abort("Tests in some files are failing.")
}

if (any_test_errors) {
  cli_abort("There was error while running tests in some files.")
}

if (!any_test_failures && !any_test_errors) {
  cli_alert_success("Tests from all files are passing!")
}

cli_rule()

The approach is very simple: just randomize the order in which each test file is run. Tests within each file are always run in the same order, though. Maybe that's also something that can be included as an additional check.

IndrajeetPatil avatar Apr 15 '23 08:04 IndrajeetPatil

I'd suggest putting the script into .dev instead of inline. This would also allow adding convenience features such as a command line argument for a static seed when trying to debug a failure.

AshesITR avatar Apr 19 '23 06:04 AshesITR

it looks like shuffle=TRUE is designed to make this trivial to test:

https://tidyverse.org/blog/2025/11/testthat-3-3-0/

MichaelChirico avatar Nov 15 '25 15:11 MichaelChirico

Yes :)

https://github.com/r-lib/testthat/issues/1942

IndrajeetPatil avatar Nov 15 '25 15:11 IndrajeetPatil

The failing tests with testthat 3.3.0 are expected to fail (cf. https://github.com/r-lib/lintr/pull/2937). The incompatibility needs to be resolved in a separate PR.

IndrajeetPatil avatar Nov 15 '25 18:11 IndrajeetPatil

@MichaelChirico Took only 3 years but the builds are now passing 😆

IndrajeetPatil avatar Nov 16 '25 06:11 IndrajeetPatil

could you run this locally, say, 10 times to guard against flakiness before we do the initial merge?

I'm also wondering if this should be run on every commit or just during release wdyt?

MichaelChirico avatar Nov 16 '25 15:11 MichaelChirico

I'm also wondering if this should be run on every commit or just during release wdyt?

SGTM. Do we have a standardized release process? E.g. a branch name that always starts with rc-*?

IndrajeetPatil avatar Nov 16 '25 16:11 IndrajeetPatil

As of now the "standardization" only goes so far as this checklist:

https://github.com/r-lib/lintr/issues/2903

So I'd add another bullet there.

MichaelChirico avatar Nov 18 '25 19:11 MichaelChirico

OK @IndrajeetPatil, I added the bullet point. I think that means we can close here, IIUC, is that right?

MichaelChirico avatar Nov 20 '25 21:11 MichaelChirico

@MichaelChirico Sorry, I still haven't been able to run the test suite 10 times in a random order. I will try to do so today.

IndrajeetPatil avatar Nov 21 '25 03:11 IndrajeetPatil