future.apply Calling `.traceback()` within `future_Map` function call can slow things down terribly depending on data size

Calling `.traceback()` within `future_Map` function call can slow things down terribly depending on data size

Open arunsrinivasan opened this issue 3 years ago • 0 comments

I recently implemented a new feature to my logger that requires the use of .traceback to fetch the function name and line number that generated that logger function call. A while later, I realised this was slowing things down drastically when run sequentially or in parallel with future_Map function. I am not sure if this has the same effect on other future_* functions (e.g., lapply).

Took me a while to figure out and come up with a small example, but here it is:

require(future)
require(future.apply)

myDF <- function(...) {
    t <- system.time(x <- .traceback(x=1))[["elapsed"]]
    cat("time: ", t, ", length: ", length(x), ", size_mb: ", object.size(x)/1024/1024, "\n", sep="")
    # print(x) # prints the entire call stack where the data is completely populated when called from future_Map
    base::data.frame(...)
}
 
ll <- replicate(3, sample(10, 1e6, TRUE), simplify=FALSE)
system.time(Map(function(x, y) myDF(x, y), ll, ll))
# time: 0, length: 6, size_mb: 0.002937317
# time: 0, length: 6, size_mb: 0.002937317
# time: 0, length: 6, size_mb: 0.002937317
#    user  system elapsed
#    0.15    0.00    0.16

system.time(future_Map(function(x, y) myDF(x, y), ll, ll))
# time: 4.24, length: 27, size_mb: 53.27061
# time: 4.15, length: 27, size_mb: 53.27061
# time: 4.18, length: 27, size_mb: 53.27061
#    user  system elapsed
#   13.01    0.03   13.05

If you uncomment the print(x) statement and run them, you'll see the difference in the way the call is populated. This is probably because of the way you use do.call() in generating the mapply function call: do.call(mapply, args=args). Here args, every element of args gets evaluated and are therefore not just calls/expressions, but materialised objects, I suspect.

Aug 27 '21 13:08 arunsrinivasan

future.apply future.apply copied to clipboard

Calling `.traceback()` within `future_Map` function call can slow things down terribly depending on data size

future.apply
future.apply copied to clipboard