FR: pre-code, non-benchmarked
When benchmarking code that produces side effects, it is common to have to undo the side effect on every iteration. One example is benchmarking any data.table-native code, where the original frame is modified in place. Unfortunately, this clean-up step is also included in the benchmark timings, which is often undesirable.
Is there a way to add clean-up, pre-run, or some other kind of code that runs on each iteration without being counted in that expression's measurements?
(I'm assuming that, for many reasons, we cannot rely on expressions being executed in order or even the same number of times, ergo the dynamic n_itr column.)
One such use-case comes from a recent StackOverflow question/answer: https://stackoverflow.com/a/69770643/3358272 (sufficient code from it is included in the reprex below).
The intent of the answer is to compare row-sorting of large-ish data between different methods. The fastest (at the time of writing, and relative to that data) is data.table::setorder, but when benchmarking it, the first run is the only honest run: the second and subsequent iterations all sort already-sorted data, which is invariably much faster. My mitigation was to include a copy(.) version of the expression, which involves its own overhead. I'd very much like to be able to exclude copy from the time/memory measurements.
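(As a rough illustration of that overhead, and not part of the original question: timing the copy alone on the same 22,500-row cross-joined iris data shows what gets folded into any copy(.)-wrapped expression; numbers will vary by machine.)

library(data.table)

iris_big <- merge(x = iris, y = iris, by = NULL)  # 150 x 150 cross join, 22,500 rows
iris_big_dt <- as.data.table(iris_big)

# the copy alone: this is the overhead that a copy(.)-wrapped setorder()
# expression carries into its time/memory measurements
bench::mark(copy_only = copy(iris_big_dt))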
I'm not certain the best way to structure this:
- a single pre (or post/cleanup) expression run for all other expressions; this has the potential to add the most overhead, and is likely unnecessary for most of the expressions
- per-expression pre (or post/cleanup) expressions
Frankly, I don't know how the second option would look; I'm just starting the conversation. Or, if this can already be done, I'm asking how :-)
reprex
library(data.table)
iris_big <- merge(x = iris, y = iris, by = NULL)
iris_big_dt <- as.data.table(iris_big) #for data.table
iris_big_dt1 <- as.data.table(iris_big) #for data.table
iris_big_dt2 <- as.data.table(iris_big) #for data.table
bench::mark(
  "Base R" = iris_big[base::order(iris_big$Petal.Width.y),],
  "dplyr" = dplyr::arrange(iris_big, Petal.Width.y),
  "data.table 1" = data.table::setorder(iris_big_dt1, "Petal.Width.y"),
  "data.table 2" = data.table::setorder(copy(iris_big_dt2), "Petal.Width.y"),
  check = FALSE
)
If it only makes sense for the expressions to run once, you can set iterations = 1 and they will be run only once. You will also have to set memory = FALSE, though, since the memory tracking runs the code once itself; that run happens outside the normal benchmarking timings because tracking memory adds overhead.
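(For reference, and not part of the original reply: applying that suggestion to the reprex above would simply add those two arguments, reusing the objects defined there.)

bench::mark(
  "Base R" = iris_big[base::order(iris_big$Petal.Width.y),],
  "dplyr" = dplyr::arrange(iris_big, Petal.Width.y),
  "data.table 1" = data.table::setorder(iris_big_dt1, "Petal.Width.y"),
  "data.table 2" = data.table::setorder(copy(iris_big_dt2), "Petal.Width.y"),
  iterations = 1,   # each expression runs exactly once
  memory = FALSE,   # skip the extra memory-profiling run
  check = FALSE
)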
bench::press() can do preprocessing steps, though this example is somewhat awkward for it. Maybe a solution would be to do something like this?
library(data.table)
times <- bench::press(
  method = c("Base R", "dplyr", "data.table"),
  {
    iris_big <- merge(x = iris, y = iris, by = NULL)
    iris_big_dt <- as.data.table(iris_big) #for data.table
    methods <- c(
      "Base R" = function(x) x[base::order(x$Petal.Width.y),],
      "dplyr" = function(x) dplyr::arrange(x, Petal.Width.y),
      "data.table" = function(x) data.table::setorder(x, "Petal.Width.y")
    )
    bench::mark(
      methods[[method]](iris_big_dt),
      memory = FALSE,
      iterations = 1,
      check = function(x, y) all.equal(as.data.frame(x), as.data.frame(y))
    )
  }
)
#> Running with:
#> method
#> 1 Base R
#> 2 dplyr
#> 3 data.table
times[c("method", "min")]
#> # A tibble: 3 × 2
#>   method          min
#>   <chr>      <bch:tm>
#> 1 Base R        2.2ms
#> 2 dplyr        3.02ms
#> 3 data.table 879.72µs
Created on 2021-11-01 by the reprex package (v2.0.1)
Thanks @jimhester.
I was not suggesting setting iterations = 1 as the intended behavior.
For your bench::press recommendation, is the whole expression timed on each pass, or do the merge(.) and as.data.table(.) calls somehow not get timed? I'm thinking "no", since the results more closely resemble the "sorted" row below (where the benchmark is effectively sorting pre-sorted data). I was unable to adapt your bench::press example to effect what I'm envisioning.
For instance, I can demonstrate that the pre-sorted data sorts faster than the unsorted data (hear me out):
iris_big <- merge(x = iris, y = iris, by = NULL)
iris_big_dt <- as.data.table(iris_big) #for data.table
iris_big_dt_sorted <- setorderv(as.data.table(iris_big), "Petal.Width.y")
iris_big_dt_unsorted <- as.data.table(iris_big)
bench::mark(
  sorted = setorder(copy(iris_big_dt_sorted), "Petal.Width.y"),
  unsorted = setorder(copy(iris_big_dt_unsorted), "Petal.Width.y"),
  min_iterations = 1000
)
# # A tibble: 2 x 13
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
# 1 sorted 914.8us 1.39ms 744. 1.65MB 46.6 941 59 1.26s <data.table[,10] [22,500 x 10]> <Rprofmem[,3] [13 x 3]> <bch:tm [1,000]> <tibble [1,000 x 3]>
# 2 unsorted 1.79ms 1.93ms 479. 1.84MB 33.3 935 65 1.95s <data.table[,10] [22,500 x 10]> <Rprofmem[,3] [26 x 3]> <bch:tm [1,000]> <tibble [1,000 x 3]>
My point there is not to demonstrate what we already know (pre-sorted data sorts faster); it is to highlight the problem use-case that motivates the feature I think might be useful. I cannot do an honest benchmark of data.table::setorder in isolation, since the second and subsequent iterations run around 30% faster than the first iteration (per the sorted/unsorted benchmark above). When we're comparing data.table to data.table, adding copy(.) to each call levels the playing field, but it still biases the results ... and depending on the cleanup or pre-code that needs to be done, it might dwarf (in scale or in noise) what we're trying to measure. When we have non-data.table models to measure as well, introducing copy into the data.table models misrepresents the performance comparison with the others.
The only way to get an honest benchmark is to call setorder on unsorted data each time, something we cannot easily do given data.table's referential semantics. Ultimately, we need some pre-conditioning of the data, but we don't want to include the overhead of that conditioning in the measurements.
In an attempt to better demonstrate, consider this:
bench::mark(
  pre = list(
    "data.table" = ~ X <- as.data.table(iris_big)
  ),
  "dplyr" = dplyr::arrange(iris_big, Petal.Width.y),
  "data.table" = data.table::setorder(X, "Petal.Width.y"),
  min_iterations = 1000,
  check = FALSE
)
In horrible pseudo-code, execution might look like this. (I'll order them sequentially for readability; I'm not certain whether the order is randomized in reality.)
## running the "dplyr" model
measure( methods[["dplyr"]] )
measure( methods[["dplyr"]] )
...
measure( methods[["dplyr"]] )
## running the "data.table" model
nomeasure( pre[["data.table"]] )
measure( methods[["data.table"]] )
nomeasure( pre[["data.table"]] )
measure( methods[["data.table"]] )
...
nomeasure( pre[["data.table"]] )
measure( methods[["data.table"]] )
So, for each iteration of the "data.table" model, it runs the pre code without metrics (or with separate metrics?). The effect is that, in the output, the "data.table" measurements are purely of the call to setorder.
I apologize if I'm missing the point on bench::press or elsewhere. Thanks.
Only the final bench::mark() expressions are timed.
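(A quick way to see this, not from the thread itself: put something obviously slow in the press body outside of bench::mark() and note that it never shows up in the reported timings.)

res <- bench::press(
  n = c(1e3, 1e4),
  {
    Sys.sleep(0.2)        # stand-in for expensive, untimed setup
    x <- runif(n)
    bench::mark(sort(x), iterations = 5)
  }
)
res[c("n", "min", "median")]  # timings stay far below 0.2s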
Thanks. It still doesn't quite fit the bill, since the data is not restored between iterations.
Thanks for the discussion. I know this isn't a light-lift request.
I have the exact same use-case. I wanted to share the code I ended up using (based on @jimhester’s suggestion above) since it might be useful for others:
# dplyr and tibble are assumed to be attached for group_by()/summarize()/as_tibble();
# setup() and cleanup() are placeholders for whatever side-effect handling is needed
library(dplyr)
library(tibble)

methods = list(a = function () …, b = function () …, …)

times_use = bench::press(
  method = names(methods),
  replicate = seq(1L, 100L),
  {
    setup()
    m = methods[[method]]
    res = bench::mark(
      m(),
      memory = FALSE,
      iterations = 1L,
      check = function (x, y) TRUE
    )
    cleanup()
    res
  }
) |>
  group_by(method) |>
  summarize(
    time = list(median),
    min = min(median),
    median = median(median),
    `itr/sec` = mean(`itr/sec`),
    n_itr = n(),
    n_gc = 0,
    total_time = bench::as_bench_time(sum(unlist(time))),
    result = list(NULL),
    memory = list(NULL),
    gc = list(as_tibble(setNames(rep(list(rep(0L, n())), 3L), paste0('level', 0 : 2)))),
    .groups = 'drop'
  ) |>
  rename(expression = method) %>%
  `class<-`(c('bench_mark', class(.)))
This code runs a fixed number of replicates and transforms the result data to make it amenable to plotting with ggplot2::autoplot. It actually works fairly well; I'm not getting memory info, but that's fine in my case.
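(For completeness, the plotting call that the class assignment at the end enables would look like this, assuming ggplot2 is installed.)

library(ggplot2)
autoplot(times_use)  # dispatches on the bench_mark class set above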
I looked into this a little, and it seems it would be rather challenging to add per-expression, per-iteration setup code that won't also interfere with how things like min_time and gcinfo are computed.
However, I did come up with an alternative approach that seems sufficiently general that it could be useful for these kinds of side-effect-related examples:
library(data.table)
iterations <- 100L
# Because `memory` checking is 1 extra iteration
copies <- iterations + 1L
iris_big <- merge(x = iris, y = iris, by = NULL)
iris_big_dt <- as.data.table(iris_big) #for data.table
iris_big_dt1 <- as.data.table(iris_big) #for data.table
iris_big_dt2 <- as.data.table(iris_big) #for data.table
iris_big_dt1 <- lapply(seq_len(copies), function(i) copy(iris_big_dt1))
iris_big_dt2 <- lapply(seq_len(copies), function(i) copy(iris_big_dt2))
i <- 1L
j <- 1L
bench::mark(
  "Base R" = iris_big[base::order(iris_big$Petal.Width.y),],
  "dplyr" = dplyr::arrange(iris_big, Petal.Width.y),
  "data.table 1" = { data.table::setorder(iris_big_dt1[[i]], "Petal.Width.y"); i <<- i + 1L },
  "data.table 2" = { data.table::setorder(iris_big_dt2[[j]], "Petal.Width.y"); j <<- j + 1L },
  iterations = iterations,
  check = FALSE
)
#> # A tibble: 4 × 6
#>   expression        min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 Base R         2.94ms   3.36ms      289.    1.97MB     2.92
#> 2 dplyr          1.87ms   2.46ms      400.   11.34MB     8.17
#> 3 data.table 1   1.12ms   1.27ms      774.  375.97KB     0
#> 4 data.table 2   1.19ms   1.38ms      717.   285.8KB     0
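(One possible way to package that counter-and-pool trick as a reusable helper, sketched here and not part of bench; it reuses iris_big_dt and iterations from the block above, and the pool is one element larger than iterations to cover the extra memory-profiling run.)

# sketch: pre-build one fresh copy per iteration and hand out the next copy on
# each call, so the timed expression never sees already-sorted data
fresh_input <- function(make, n) {
  pool <- lapply(seq_len(n), function(i) make())
  i <- 0L
  function() {
    i <<- i + 1L
    pool[[i]]
  }
}

next_dt <- fresh_input(function() copy(iris_big_dt), iterations + 1L)

bench::mark(
  "data.table" = setorder(next_dt(), "Petal.Width.y"),
  iterations = iterations,
  check = FALSE
)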
Since most R code doesn't modify in place, and because of where bench is in its package lifecycle, I don't see a setup or teardown argument as a high priority at this time, so I'm going to close this one to allow us to keep the Issues page fairly focused.