
Accidental data sending via closures

Open • mschubert opened this issue 1 year ago • 1 comment

Users may inadvertently send hundreds of gigabytes of data via the network by using closures. Consider the following:

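library(clustermq)

do_my_stuff = function() {
    huge_object = runif(1e9)    # 1e9 doubles, never used by my_parallel
    my_parallel = function(i) {
        # body does not reference huge_object
    }
    # my_parallel's enclosing environment (including huge_object) is sent along
    Q(my_parallel, i=1:1000, n_jobs=1000)
}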

This will send about 8 GB (1e9 doubles at 8 bytes each, roughly 7.5 GiB) to each of 1000 workers, about 8 TB in total, without any of the workers requiring it.
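A scaled-down sketch of the effect, using base R's `serialize()` as a stand-in for however clustermq actually ships the function (that equivalence is an assumption, as are the names `make_worker_fun`, `f_closure` and `f_clean`). It also shows one possible workaround, re-parenting the function to the global environment, which only works if the function does not need anything from its defining frame:

```r
# Stand-in for do_my_stuff(): returns the inner function instead of calling Q()
make_worker_fun = function() {
    huge_object = runif(1e6)    # ~8 MB here instead of ~8 GB; unused by the worker
    function(i) i * 2
}

f_closure = make_worker_fun()

# serialize() follows the function's enclosing environment,
# so huge_object is written out together with the function
size_closure = length(serialize(f_closure, NULL))

# Possible workaround (assumption: the function needs nothing from its
# defining frame): point its environment at globalenv() before sending
f_clean = f_closure
environment(f_clean) = globalenv()
size_clean = length(serialize(f_clean, NULL))

cat("closure:", size_closure, "bytes; re-parented:", size_clean, "bytes\n")
```

Defining `my_parallel` at top level rather than inside `do_my_stuff` should avoid the capture in the same way.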

It should not be that easy to make this mistake.

Related: https://github.com/mschubert/clustermq/issues/200, likely re-introduced with globals package changes close to https://github.com/mschubert/clustermq/commit/9f01bbcaa83832487397d8541673fcabc159e74b

mschubert • Sep 26 '24 17:09

This can also happen with formulas: a formula keeps a reference to the environment in which it was created, so e.g. design(DESeqDataSet) will pull that whole environment into the serialized object if the design is set within a function.
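The formula case can be sketched the same way without DESeq2. Here a bare formula stands in for the design of a DESeqDataSet, and `make_design`, `design_formula` and `condition` are hypothetical names; base `serialize()` is again only an assumed proxy for what gets sent to workers:

```r
# A formula created inside a function remembers that function's frame
make_design = function() {
    huge_object = runif(1e6)    # ~8 MB, unrelated to the formula
    ~ condition
}

design_formula = make_design()

# The frame is attached as the formula's environment...
captured = ls(environment(design_formula))

# ...and is therefore included when the formula is serialized
size_with_env = length(serialize(design_formula, NULL))

# Resetting the environment drops the captured frame
# (only safe if the formula's variables are looked up in the data, not the frame)
environment(design_formula) = globalenv()
size_reset = length(serialize(design_formula, NULL))

cat("with frame:", size_with_env, "bytes; reset:", size_reset, "bytes\n")
```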

mschubert • Jan 09 '25 16:01