Investigate size of broadcast variable

Open arahuja opened this issue 10 years ago • 2 comments

Most jobs on a full genome print the following warning:

Not enough space to cache broadcast_2 in memory! (computed 488.3 MB so far)

Which broadcast variable is nearly 0.5 GB? Most likely the loci map, but why? Possibly it would be better to just pass each task its portion of the map rather than broadcasting the whole thing, since the tasks are now broadcast anyways?
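
To make the comparison concrete, here is a rough sketch in plain Python (no Spark involved) of measuring the serialized size of a whole map against the size of each task's slice. Everything in it is an assumption for illustration only: `build_loci_map`, `split_by_task`, the contig-to-task assignment, and the interval layout are hypothetical stand-ins, not guacamole's actual loci map or partitioning.

```python
import pickle
from collections import defaultdict


def build_loci_map(contigs, intervals_per_contig):
    """Hypothetical stand-in for a loci map: contig -> list of (start, end)."""
    return {
        contig: [(i * 1000, i * 1000 + 500) for i in range(intervals_per_contig)]
        for contig in contigs
    }


def serialized_size(obj):
    """Size in bytes of the pickled object, a proxy for what would be shipped."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def split_by_task(loci_map, contig_to_task):
    """Group the map's entries by the task assumed to process each contig."""
    per_task = defaultdict(dict)
    for contig, intervals in loci_map.items():
        per_task[contig_to_task[contig]][contig] = intervals
    return dict(per_task)


# Assumed toy setup: 22 contigs spread round-robin over 4 tasks.
contigs = [f"chr{i}" for i in range(1, 23)]
loci_map = build_loci_map(contigs, intervals_per_contig=2000)
contig_to_task = {contig: idx % 4 for idx, contig in enumerate(contigs)}

per_task = split_by_task(loci_map, contig_to_task)
full_size = serialized_size(loci_map)
task_sizes = {task: serialized_size(part) for task, part in per_task.items()}

print(f"whole map: {full_size} bytes")
for task, size in sorted(task_sizes.items()):
    print(f"task {task}: {size} bytes")
```

If the real map splits this cleanly by task, each task would only need a fraction of the bytes; whether that holds for the actual loci map is exactly what needs investigating.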

arahuja avatar Nov 25 '14 14:11 arahuja

What does "tasks are now broadcast anyways" mean?

timodonnell avatar Nov 25 '14 14:11 timodonnell

Spark recently made a change to broadcast serialized Tasks to executors, IIRC. I guess the significance of this is that it can happen much more quickly, e.g. using torrent-broadcast methods instead of the driver copying the closure to each individual executor?

ryan-williams avatar Nov 28 '14 14:11 ryan-williams