
Add MPI/ensemble support to distribute events across GPUs and compute nodes

Open sethrj opened this issue 4 years ago • 1 comment

To run large jobs on leadership-class machines, we'll need to parallelize across nodes by distributing events using MPI. A first attempt could simply partition events equally among tasks (see the sketch after the checklist), but we should also investigate whether any MPI framework offers a queue-like model that can dispatch events to waiting nodes (or "rebalance" events in case some processors get unlucky).

  • [ ] Split events among Runner instances to avoid replicating input
  • [ ] Write JSON output to one file per process, rather than stdout
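
A minimal sketch of what the first-attempt static partition plus per-rank JSON output might look like. This assumes a known total event count, omits the actual transport and GPU setup, and uses placeholder names (`num_events`, the output filename scheme); none of these are existing Celeritas APIs.

```cpp
#include <mpi.h>
#include <algorithm>
#include <fstream>
#include <string>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0;
    int size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int const num_events = 1000;  // placeholder: total events in the problem

    // Equal static partition: rank r owns events [first, last), with the
    // remainder spread one-per-rank across the lowest ranks
    int const base = num_events / size;
    int const rem = num_events % size;
    int const first = rank * base + std::min(rank, rem);
    int const last = first + base + (rank < rem ? 1 : 0);

    // ... transport events [first, last) on this rank's GPU ...

    // Write JSON output to one file per process rather than stdout
    std::ofstream out("output." + std::to_string(rank) + ".json");
    out << "{\"rank\": " << rank
        << ", \"first_event\": " << first
        << ", \"last_event\": " << last << "}\n";

    MPI_Finalize();
    return 0;
}
```

A queue-like dispatch model would replace the fixed `[first, last)` range with a manager rank handing out event indices to workers as they finish, at the cost of extra communication.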

sethrj · Jul 19 '21

We're not going to have this in time to do any big runs on Summit for the SciDAC proposal, so let's defer to Q3.

sethrj · Jan 04 '22