Add MPI/ensemble support to distribute events across GPUs and compute nodes
To run large jobs on leadership-class machines, we'll need to parallelize across nodes by distributing events using MPI. A first attempt could simply partition events equally among tasks, but we should also investigate whether any MPI framework offers a queue-like model that dispatches events to waiting nodes (or "rebalances" events in case some processors get unlucky). A minimal sketch of the static partition is below the checklist.
- [ ] Split events among Runner to avoid replicating input
- [ ] Write JSON output to one file per process, rather than stdout
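As a rough illustration of the static approach (not an existing Celeritas API), here's a sketch of splitting events near-equally across MPI ranks and writing one JSON file per process; the event count, the `run_events` hook, and the output path scheme are all placeholders:

```cpp
// Sketch: static, near-equal partition of event IDs across MPI ranks,
// with per-rank JSON output. Names here are hypothetical placeholders,
// not Celeritas interfaces.
#include <mpi.h>

#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0;
    int size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Total event count would come from the problem input (placeholder value)
    int const num_events = 1024;

    // Block partition: every rank gets floor(N/P) events, and the first
    // (N mod P) ranks take one extra, so counts differ by at most one.
    int const base = num_events / size;
    int const rem = num_events % size;
    int const local_count = base + (rank < rem ? 1 : 0);
    int const first_event = rank * base + std::min(rank, rem);

    std::vector<int> local_events(local_count);
    for (int i = 0; i < local_count; ++i)
    {
        local_events[i] = first_event + i;
    }

    // Placeholder for transporting this rank's share of events:
    // run_events(local_events);

    // One JSON output file per process instead of writing to stdout
    std::string out_path = "result." + std::to_string(rank) + ".json";
    if (std::FILE* f = std::fopen(out_path.c_str(), "w"))
    {
        std::fprintf(f, "{\"rank\": %d, \"num_events\": %d}\n", rank, local_count);
        std::fclose(f);
    }

    MPI_Finalize();
    return 0;
}
```

A queue-based scheme would replace the fixed `first_event`/`local_count` calculation with a manager rank that hands out event batches as workers finish, which is what would let us rebalance when some ranks get unlucky.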
We're not going to have this in time to do any big runs on Summit for the SciDAC proposal, so let's defer to Q3.