
FR: pre-code, non-benchmarked

r2evans opened this issue 2 years ago • 5 comments

When benchmarking code that produces side effects, it is common to have to undo the side effect on every iteration. One example is benchmarking any data.table-native code, where the original frame is modified in place. Unfortunately, this clean-up step is also included in the benchmark timings, which is often undesirable.

Is there a way to add clean-up, pre-run, or some other kind of code that operates on each iteration without being counted against each expression's measurements?

(I'm assuming that for many reasons, we cannot rely on expressions to be executed in order or even the same number of times, ergo the dynamic n_itr column.)

One such use-case in a recent StackOverflow question/answer: https://stackoverflow.com/a/69770643/3358272 (sufficient code from this is below in the reprex).

The intent of the answer is to compare row-sorting (large-ish data) between different methods. The fastest (at the time of writing, and relative to that data) is data.table::setorder, but when benchmarking it, the first run is the only honest one: the second and subsequent iterations are all sorting already-sorted data, which will invariably be much faster. My mitigation was to include a copy(.) version of the expression, which introduces its own overhead. I'd very much like to be able to exclude copy from the time/memory measurements.

I'm not certain the best way to structure this:

  • a single pre (or post/cleanup) expression run before all of the benchmarked expressions; this has the potential to add the most overhead, and is likely unnecessary for most of them
  • per-expression pre (or post/cleanup) expressions

I don't know how the second would look, frankly, just starting the conversation. Or, if it can already be done, asking how :-)

reprex
library(data.table)
iris_big <- merge(x = iris, y = iris, by = NULL) 
iris_big_dt <- as.data.table(iris_big) #for data.table
iris_big_dt1 <- as.data.table(iris_big) #for data.table
iris_big_dt2 <- as.data.table(iris_big) #for data.table
bench::mark(
  "Base R"       = iris_big[base::order(iris_big$Petal.Width.y),],
  "dplyr"        = dplyr::arrange(iris_big, Petal.Width.y),
  "data.table 1" = data.table::setorder(iris_big_dt1, "Petal.Width.y"),
  "data.table 2" = data.table::setorder(copy(iris_big_dt2), "Petal.Width.y"),
  check = FALSE)

r2evans · Nov 01 '21 12:11

If it only makes sense for the expressions to run once, you can set iterations = 1 and they will each run only once. You will also have to set memory = FALSE, though, since memory tracking runs the code one extra time itself; that run happens outside the normal benchmark timings because tracking memory adds overhead.
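Applied to the reprex above, a minimal sketch of that suggestion would look like:

bench::mark(
  "Base R"       = iris_big[base::order(iris_big$Petal.Width.y),],
  "dplyr"        = dplyr::arrange(iris_big, Petal.Width.y),
  "data.table 1" = data.table::setorder(iris_big_dt1, "Petal.Width.y"),
  "data.table 2" = data.table::setorder(copy(iris_big_dt2), "Petal.Width.y"),
  iterations = 1,  # run each expression exactly once
  memory = FALSE,  # skip the extra, untimed memory-tracking run
  check = FALSE)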

bench::press() can do preprocessing steps, though this example is somewhat awkward for it. Maybe a solution would be to do something like this?

library(data.table)

times <- bench::press(
  method = c("Base R", "dplyr", "data.table"),
  {
  iris_big <- merge(x = iris, y = iris, by = NULL) 
  iris_big_dt <- as.data.table(iris_big) #for data.table
  methods <- c(
    "Base R"       = function(x) x[base::order(x$Petal.Width.y),],
    "dplyr"        = function(x) dplyr::arrange(x, Petal.Width.y),
    "data.table" = function(x) data.table::setorder(x, "Petal.Width.y")
  )
  bench::mark(
    methods[[method]](iris_big_dt),
    memory = FALSE,
    iterations = 1,
    check = function(x, y) all.equal(as.data.frame(x), as.data.frame(y))
  )
})
#> Running with:
#>   method
#> 1 Base R
#> 2 dplyr
#> 3 data.table


times[c("method", "min")]
#> # A tibble: 3 × 2
#>   method          min
#>   <chr>      <bch:tm>
#> 1 Base R        2.2ms
#> 2 dplyr        3.02ms
#> 3 data.table 879.72µs

Created on 2021-11-01 by the reprex package (v2.0.1)

jimhester · Nov 01 '21 14:11

Thanks @jimhester.

I was not suggesting setting iterations=1 as the intended behavior.

For your bench::press recommendation, is the whole expression timed each pass, or do the merge(.) and as.data.table(.) not get timed somehow? I'm thinking "no", since the results more closely resemble the "sorted" row below (where the benchmark is effectively sorting pre-sorted data). I was unable to adapt your bench::press to effect what I'm envisioning.

For instance, I can demonstrate that the pre-sorted data sorts faster than the unsorted data (hear me out):

iris_big <- merge(x = iris, y = iris, by = NULL) 
iris_big_dt <- as.data.table(iris_big) #for data.table
iris_big_dt_sorted <- setorderv(as.data.table(iris_big), "Petal.Width.y")
iris_big_dt_unsorted <- as.data.table(iris_big)
bench::mark(
  sorted   = setorder(copy(iris_big_dt_sorted),"Petal.Width.y"),
  unsorted = setorder(copy(iris_big_dt_unsorted),"Petal.Width.y"),
  min_iterations = 1000)
# # A tibble: 2 x 13
#   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result                          memory                  time             gc                  
#   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>                          <list>                  <list>           <list>              
# 1 sorted      914.8us   1.39ms      744.    1.65MB     46.6   941    59      1.26s <data.table[,10] [22,500 x 10]> <Rprofmem[,3] [13 x 3]> <bch:tm [1,000]> <tibble [1,000 x 3]>
# 2 unsorted     1.79ms   1.93ms      479.    1.84MB     33.3   935    65      1.95s <data.table[,10] [22,500 x 10]> <Rprofmem[,3] [26 x 3]> <bch:tm [1,000]> <tibble [1,000 x 3]>

My point there is not to demonstrate what we already know (pre-sorted data sorts faster); it's to highlight the problem use-case that motivates a feature I think might be useful. I cannot do an honest benchmark of data.table::setorder in isolation, since the second and subsequent iterations will run around 30% faster than the first iteration (according to the sorted/unsorted benchmark above). When we're comparing data.table to data.table, adding copy(.) to each call levels the playing field, but it still biases the results ... and depending on the cleanup or pre-code that needs to be done, it might dwarf (in scale or noise) what we're trying to measure. When we have non-data.table models to measure, introducing copy into the data.table models misrepresents the performance comparison with the others.

The only way to get an honest benchmark is to call setorder on unsorted data each time, something we cannot do easily given data.table's referential semantics. Ultimately, we need some pre-conditioning step whose overhead is not included in the measurements.

In an attempt to better demonstrate, consider this:

bench::mark(
  pre = list(
    "data.table" = ~ X <- as.data.table(iris_big)
  ),
  "dplyr"      = dplyr::arrange(iris_big, Petal.Width.y),
  "data.table" = data.table::setorder(X, "Petal.Width.y"),
  min_iterations = 1000,
  check = FALSE)

In horrible pseudo-code, execution might look like this. (I'll order them sequentially for readability, not certain if the order is randomized in reality.)

## running the "dplyr" model
measure( methods[["dplyr"]] )
measure( methods[["dplyr"]] )
...
measure( methods[["dplyr"]] )
## running the "data.table" model
nomeasure( pre[["data.table"]] )
measure( methods[["data.table"]] )
nomeasure( pre[["data.table"]] )
measure( methods[["data.table"]] )
...
nomeasure( pre[["data.table"]] )
measure( methods[["data.table"]] )

So that for each of the iterations of the "data.table" model, it runs the pre code without metrics (or with separate metrics?). The effect is that in the output, the "data.table" measurements are purely on the call to setorder.

I apologize if I'm missing the point on bench::press or elsewhere. Thanks.

r2evans · Nov 01 '21 18:11

The only thing timed is the final bench::mark() expressions.
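For instance, in a toy sketch like the following, the Sys.sleep() and runif() calls in the press body run once per parameter combination but never show up in the reported times; only the expression passed to bench::mark() is measured:

bench::press(
  n = c(1e3, 1e5),
  {
    Sys.sleep(0.5)        # setup: runs once per combination, not timed
    x <- runif(n)         # also setup, not timed
    bench::mark(sort(x))  # only this expression contributes to the timings
  }
)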

jimhester · Nov 01 '21 19:11

Thanks. It still doesn't quite fit the bill since the data is not restored.

Thanks for the discussion. I know this isn't a light-lift request.

r2evans · Nov 01 '21 19:11

I have the exact same use-case. I wanted to share the code I ended up using (based on @jimhester’s suggestion above) since it might be useful for others:

library(dplyr)  # for group_by(), summarize(), rename(), n(), as_tibble(), and %>%

methods = list(a = function () …, b = function () …, …)

times_use = bench::press(
    method = names(methods),
    replicate = seq(1L, 100L),
    {
        setup()
        m = methods[[method]]

        res = bench::mark(
            m(),
            memory = FALSE,
            iterations = 1L,
            check = function (x, y) TRUE
        )
        cleanup()
        res
    }
) |>
    group_by(method) |>
    summarize(
        time = list(median),
        min = min(median),
        median = median(median),
        `itr/sec` = mean(`itr/sec`),
        n_itr = n(),
        n_gc = 0,
        total_time = bench::as_bench_time(sum(unlist(time))),
        result = list(NULL),
        memory = list(NULL),
        gc = list(as_tibble(setNames(rep(list(rep(0L, n())), 3L), paste0('level', 0 : 2)))),
        .groups = 'drop'
    ) |>
    rename(expression = method) %>%
    `class<-`(c('bench_mark', class(.)))

This code runs a fixed number of replicates and transforms the result data to make it amenable to be plotted with ggplot2::autoplot. It actually works fairly well; I’m not getting memory info but that’s fine in my case.
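For example, assuming ggplot2 and the packages bench's autoplot method depends on are installed, the reshaped object can then be plotted directly:

ggplot2::autoplot(times_use)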

klmr · Jan 23 '22 18:01

I looked into this a little and it seems like it would be rather challenging to add in per-expression, per-iteration setup code that won't also interfere with how things like min_time and gcinfo are computed.

However, I did come up with an alternative approach which seems sufficiently general that it could be useful for these kinds of side-effect-related examples:

library(data.table)

iterations <- 100L
# Because `memory` checking is 1 extra iteration
copies <- iterations + 1L

iris_big <- merge(x = iris, y = iris, by = NULL) 
iris_big_dt <- as.data.table(iris_big) #for data.table
iris_big_dt1 <- as.data.table(iris_big) #for data.table
iris_big_dt2 <- as.data.table(iris_big) #for data.table

iris_big_dt1 <- lapply(seq_len(copies), function(i) copy(iris_big_dt1))
iris_big_dt2 <- lapply(seq_len(copies), function(i) copy(iris_big_dt2))

i <- 1L
j <- 1L

bench::mark(
  "Base R"       = iris_big[base::order(iris_big$Petal.Width.y),],
  "dplyr"        = dplyr::arrange(iris_big, Petal.Width.y),
  "data.table 1" = { data.table::setorder(iris_big_dt1[[i]], "Petal.Width.y"); i <<- i + 1L},
  "data.table 2" = { data.table::setorder(iris_big_dt2[[j]], "Petal.Width.y"); j <<- j + 1L},
  iterations = iterations,
  check = FALSE
)
#> # A tibble: 4 × 6
#>   expression        min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 Base R         2.94ms   3.36ms      289.    1.97MB     2.92
#> 2 dplyr          1.87ms   2.46ms      400.   11.34MB     8.17
#> 3 data.table 1   1.12ms   1.27ms      774.  375.97KB     0   
#> 4 data.table 2   1.19ms   1.38ms      717.   285.8KB     0

Since most R code doesn't modify in place, and because of where bench is in its package lifecycle, I don't see a setup or teardown argument as high priority at this time, so I'm going to close this one to allow us to keep the Issues page fairly focused.

DavisVaughan · May 03 '23 15:05