rsample icon indicating copy to clipboard operation
rsample copied to clipboard

Deal with snapshot differences between R CMD check "regular" and "hard" more elegantly

Open hfrick opened this issue 1 year ago • 4 comments

In #538 a snapshot changed between R CMD check "regular" and "hard" without the suggested packages.

https://github.com/tidymodels/rsample/actions/runs/10927479579/job/30333727728

══ Failed tests ════════════════════════════════════════════════════════════════
── Failure ('test-bootci.R:169:5'): Sufficient replications needed to sufficiently reduce Monte Carlo sampling Error for BCa method ──
Snapshot of code has changed:
old[3:8] vs new[3:16]
    int_bca(bt_small, stats, .fn = get_stats)
  Condition
    Warning:
    Recommend at least 1000 non-missing bootstrap resamples for term `mean`.
+ Message
+   
+   Attaching package: 'purrr'
+   
+   The following object is masked from 'package:testthat':
+   
+       is_null
+   
  Output
    # A tibble: 1 x 6
      term  .lower .estimate .upper .alpha .method

* Run `testthat::snapshot_accept('bootci')` to accept the change.
* Run `testthat::snapshot_review('bootci')` to interactively review the change.

hfrick avatar Sep 18 '24 18:09 hfrick

🤨 This is a weird one. Nothing artful from me, though I think your thought to migrate setup code inside of test_that() blocks is a good starting point.

simonpcouch avatar Sep 18 '24 19:09 simonpcouch

Attempt 1 failed: #540 - but we have cleaner tests nonetheless 🤷‍♀️

hfrick avatar Sep 19 '24 10:09 hfrick

happening in furrr::future_map() in bca_calc(). Reprex below

library(rsample)
library(magrittr)

set.seed(888)
rand_nums <- rnorm(1000, 10, 1)
dat <- data.frame(x = rand_nums)

get_stats <- function(split, ...) {
  dat <- analysis(split)
  x <- dat[[1]]
  tibble::tibble(
    term = "mean",
    estimate = mean(x, na.rm = TRUE),
    std.error = sqrt(var(x, na.rm = TRUE) / sum(!is.na(x)))
  )
}
  
set.seed(456765)
bt_small <- bootstraps(dat, times = 10, apparent = TRUE) %>%
  dplyr::mutate(
    stats = purrr::map(splits, ~ get_stats(.x)),
    junk = 1:11
  )

tmp <- int_bca(bt_small, stats, .fn = get_stats)
#> Warning: Recommend at least 1000 non-missing bootstrap resamples for term
#> `mean`.
#> 
#> Attaching package: 'purrr'
#> The following object is masked from 'package:magrittr':
#> 
#>     set_names

EmilHvitfeldt avatar Sep 19 '24 15:09 EmilHvitfeldt

this is kinda weird, because it definitely depends on the loaded packages.

You no longer get the message if you remove library(magrittr) and switch to base pipe 😕

library(rsample)

set.seed(888)
rand_nums <- rnorm(1000, 10, 1)
dat <- data.frame(x = rand_nums)

get_stats <- function(split, ...) {
  dat <- analysis(split)
  x <- dat[[1]]
  tibble::tibble(
    term = "mean",
    estimate = mean(x, na.rm = TRUE),
    std.error = sqrt(var(x, na.rm = TRUE) / sum(!is.na(x)))
  )
}
  
set.seed(456765)
bt_small <- bootstraps(dat, times = 10, apparent = TRUE) |>
  dplyr::mutate(
    stats = purrr::map(splits, ~ get_stats(.x)),
    junk = 1:11
  )

tmp <- int_bca(bt_small, stats, .fn = get_stats)
#> Warning: Recommend at least 1000 non-missing bootstrap resamples for term
#> `mean`.

EmilHvitfeldt avatar Sep 19 '24 15:09 EmilHvitfeldt