future.apply
future.apply copied to clipboard
Calling `.traceback()` within `future_Map` function call can slow things down terribly depending on data size
I recently implemented a new feature to my logger that requires the use of .traceback
to fetch the function name and line number that generated that logger function call. A while later, I realised this was slowing things down drastically when run sequentially or in parallel with future_Map
function. I am not sure if this has the same effect on other future_*
functions (e.g., lapply
).
Took me a while to figure out and come up with a small example, but here it is:
require(future)
require(future.apply)
myDF <- function(...) {
t <- system.time(x <- .traceback(x=1))[["elapsed"]]
cat("time: ", t, ", length: ", length(x), ", size_mb: ", object.size(x)/1024/1024, "\n", sep="")
# print(x) # prints the entire call stack where the data is completely populated when called from future_Map
base::data.frame(...)
}
ll <- replicate(3, sample(10, 1e6, TRUE), simplify=FALSE)
system.time(Map(function(x, y) myDF(x, y), ll, ll))
# time: 0, length: 6, size_mb: 0.002937317
# time: 0, length: 6, size_mb: 0.002937317
# time: 0, length: 6, size_mb: 0.002937317
# user system elapsed
# 0.15 0.00 0.16
system.time(future_Map(function(x, y) myDF(x, y), ll, ll))
# time: 4.24, length: 27, size_mb: 53.27061
# time: 4.15, length: 27, size_mb: 53.27061
# time: 4.18, length: 27, size_mb: 53.27061
# user system elapsed
# 13.01 0.03 13.05
If you uncomment the print(x)
statement and run them, you'll see the difference in the way the call is populated. This is probably because of the way you use do.call()
in generating the mapply
function call: do.call(mapply, args=args)
. Here args
, every element of args
gets eval
uated and are therefore not just calls/expressions, but materialised objects, I suspect.