clustermq
Accidental data sending via closures
Users may inadvertently send hundreds of gigabytes of data via the network by using closures. Consider the following:
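do_my_stuff = function() {
    huge_object = runif(1e9)  # 1e9 doubles, lives in the frame of do_my_stuff()
    my_parallel = function(i) {
        # does not use huge_object, but its enclosing environment contains it
    }
    Q(my_parallel, i=1:1000, n_jobs=1000)
}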
This will send roughly 8 GB (1e9 doubles at 8 bytes each) to each of the 1000 workers, about 8 TB in total, even though none of the workers needs it.
It should not be that easy to make this mistake.
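A minimal sketch of the mechanism, using base R's serialize() as a stand-in for whatever is actually sent over the network (that equivalence is an assumption, not a statement about clustermq internals). The helper names make_closure and make_clean are hypothetical, and a 1e6-element vector replaces runif(1e9) so the example runs quickly:

```r
# Hypothetical helpers for illustration; not part of clustermq.
# A smaller vector (1e6 doubles, about 8 MB) stands in for runif(1e9).

make_closure = function() {
    huge_object = runif(1e6)
    function(i) i  # never touches huge_object
}

make_clean = function() {
    huge_object = runif(1e6)
    fun = function(i) i
    # Point the function at the global environment so the local
    # frame (and huge_object with it) is no longer reachable from it.
    environment(fun) = globalenv()
    fun
}

# Serialized size in bytes of each function object
size_closure = length(serialize(make_closure(), NULL))
size_clean = length(serialize(make_clean(), NULL))

print(c(closure = size_closure, clean = size_clean))
```

The first function drags its whole enclosing frame along when serialized; the second does not. Resetting the environment is only safe when the function does not rely on anything else defined in that frame.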
Related: https://github.com/mschubert/clustermq/issues/200, likely re-introduced with globals package changes close to https://github.com/mschubert/clustermq/commit/9f01bbcaa83832487397d8541673fcabc159e74b
This may also happen with formulas, because a formula keeps a reference to the environment it was created in. For example, design(DESeqDataSet) will include the whole environment in the serialized object if the design is set within a function.
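The formula case can be reproduced without DESeq2. The sketch below uses a hypothetical make_formula helper and a placeholder variable name (condition), with base R's serialize() again standing in for network transfer as an assumption:

```r
# Hypothetical helper for illustration; `condition` is a placeholder name.
make_formula = function() {
    huge_object = runif(1e6)  # about 8 MB, unrelated to the formula
    ~ condition               # the formula captures this function's frame
}

f = make_formula()

# The formula's environment still holds huge_object
captured = exists("huge_object", envir = environment(f), inherits = FALSE)

size_before = length(serialize(f, NULL))

# Re-point the formula at the global environment to drop the local frame
environment(f) = globalenv()
size_after = length(serialize(f, NULL))

print(c(before = size_before, after = size_after))
```

Resetting the environment changes where the formula's variables are looked up, so this is only appropriate when the modelling function resolves them from data passed explicitly rather than from the creating frame.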