guacamole
Investigate size of broadcast variable
Most jobs on a full genome print the following warning:
Not enough space to cache broadcast_2 in memory! (computed 488.3 MB so far)
What is the broadcast variable that is nearly 0.5 GB? Most likely the loci map, but why? Possibly it is better to just pass each task its portion of the map rather than broadcasting the whole thing, since the tasks are now broadcast anyway?
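To make the "pass each task its portion" idea concrete, here is a minimal sketch. It assumes a toy loci map that is just a flat list of `(contig, start, end, task)` tuples with round-robin task assignment. This is a hypothetical stand-in, not guacamole's actual loci map class, and `pickle` is only a rough proxy for Spark's serializer. It compares the serialized size of the whole map with the largest per-task slice.

```python
import pickle
from collections import defaultdict


# Hypothetical toy loci map: each entry assigns the half-open interval
# [start, end) on a contig to a task id. NOT guacamole's real data structure.
def build_loci_map(contigs, contig_length, interval, num_tasks):
    loci_map = []
    task = 0
    for contig in contigs:
        for start in range(0, contig_length, interval):
            end = min(start + interval, contig_length)
            loci_map.append((contig, start, end, task))
            task = (task + 1) % num_tasks  # round-robin assignment
    return loci_map


def split_by_task(loci_map):
    """Group intervals so that each task receives only its own portion."""
    portions = defaultdict(list)
    for contig, start, end, task in loci_map:
        portions[task].append((contig, start, end))
    return dict(portions)


def serialized_size(obj):
    """Bytes needed to pickle obj (a rough proxy for shipping it to a task)."""
    return len(pickle.dumps(obj))


if __name__ == "__main__":
    # Toy sizes for illustration only, not a real genome.
    contigs = [f"chr{i}" for i in range(1, 23)]
    loci_map = build_loci_map(contigs, 1_000_000, 1_000, num_tasks=100)
    portions = split_by_task(loci_map)
    full = serialized_size(loci_map)
    largest = max(serialized_size(p) for p in portions.values())
    print(f"full map: {full} bytes; largest per-task portion: {largest} bytes")
```

The per-task payload is then bounded by the largest slice rather than by the whole map, at the cost of building a separate closure for every task.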
What does "tasks are now broadcast anyways" mean?
(Email reply to the original post by Arun Ahuja, Tue, Nov 25, 2014 at 9:28 AM, on https://github.com/hammerlab/guacamole/issues/254.)
Spark recently made a change to broadcast serialized Tasks to executors, IIRC. I guess the significance of this is that distribution can happen much more quickly, e.g. by using torrent-broadcast methods instead of having the driver copy the closure to each individual executor?
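A back-of-envelope sketch of why torrent-style distribution could help. This is an idealized model, not Spark's actual TorrentBroadcast implementation. It assumes that with direct copying the driver uploads the full payload once per executor. It also assumes that with torrent-style sharing each byte leaves the driver only once and every holder (driver included) serves exactly one new peer per round. The 488.3 MB figure comes from the warning above. Any executor count you plug in is hypothetical.

```python
import math


def driver_egress_direct(payload_mb, num_executors):
    """Driver uploads a full copy of the payload to every executor."""
    return payload_mb * num_executors


def driver_egress_torrent_ideal(payload_mb):
    """Idealized torrent-style broadcast: each block leaves the driver once,
    and executors fetch the remaining copies from one another."""
    return payload_mb


def doubling_rounds(num_executors):
    """Rounds until all executors hold the payload if every current holder
    (driver included) serves exactly one new peer per round. After r rounds
    there are 2**r holders, so we need the smallest r with
    2**r >= num_executors + 1, which is num_executors.bit_length()."""
    return num_executors.bit_length()


if __name__ == "__main__":
    payload = 488.3  # MB, from the "Not enough space to cache" warning
    executors = 100  # hypothetical cluster size
    print(f"direct: {driver_egress_direct(payload, executors):.1f} MB out of driver")
    print(f"torrent (ideal): {driver_egress_torrent_ideal(payload):.1f} MB out of driver")
    print(f"rounds with peer sharing: {doubling_rounds(executors)}")
```

Under these assumptions, driver upload grows linearly with the number of executors for direct copying but stays roughly constant for torrent-style sharing. Real behavior depends on block size, network topology, and how Spark schedules block fetches.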