MPI benchmark driver

Open thomasgibson opened this issue 3 years ago • 1 comment

This PR modifies the current driver, main.cpp, and adds MPI support for launching the benchmark across multiple devices. The main takeaways are:

  • Each MPI rank is assigned a specific GPU and launches the benchmark
  • There is no direct GPU-to-GPU communication happening
    • For the dot kernel, the per-rank sums are reduced across all MPI ranks (on the host) and the result is made available on every rank (via MPI_Allreduce).
    • Benchmark error checking is performed on all ranks.
  • Measured bandwidths are aggregated across all ranks
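
A minimal sketch of the first two points, not the code in this PR: `USE_MPI`, `device_for_rank` and `global_dot` are hypothetical names, and the round-robin rank-to-device mapping is an assumption (it only makes sense if ranks are placed contiguously on each node). Without `USE_MPI` defined, the sketch falls back to a single process so it can be compiled without an MPI installation.

```cpp
#include <cassert>

#ifdef USE_MPI      // hypothetical macro, not necessarily what the PR uses
#include <mpi.h>
#endif

// Assumed mapping: ranks are assigned to devices round-robin on each node.
int device_for_rank(int rank, int devices_per_node)
{
  assert(devices_per_node > 0);
  return rank % devices_per_node;
}

// Combine each rank's host-side partial dot product into one global sum
// that every rank receives. MPI must already be initialised when USE_MPI
// is defined.
double global_dot(double local_sum)
{
#ifdef USE_MPI
  double global_sum = 0.0;
  MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
  return global_sum;
#else
  return local_sum;  // single-process fallback: nothing to reduce
#endif
}
```

Since every rank receives the same global sum, each rank can run the benchmark's error check on its own, which matches the "error checking is performed on all ranks" point above.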

The only major question I have is how MPI should be treated by CMake. I am open to suggestions and happy to comply with whatever you all prefer.

thomasgibson, Aug 15 '22 13:08

We've got a large general refactor of the main driver coming in #186. We should also think some more about what bandwidth an MPI+X version should be measuring, given that there is no communication apart from the dot product. I think we discussed it, but it would be good to document the reasons for wanting MPI+X versions of BabelStream rather than running this benchmark on multiple nodes concurrently with pdsh, srun, etc. and post-processing the results.
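
To make the "what bandwidth" question concrete, here is a rough sketch of two candidate definitions of an aggregate figure. Neither is confirmed as what the PR reports; the function names, and the assumption that every rank moves the same number of bytes per kernel, are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Candidate A: add up each rank's own bandwidth (bytes / that rank's time).
// This is what post-processing independent runs would typically give.
double sum_of_rank_bandwidths(double bytes_per_rank, const std::vector<double>& times)
{
  assert(!times.empty());
  double total = 0.0;
  for (double t : times) total += bytes_per_rank / t;
  return total;
}

// Candidate B: total bytes moved by all ranks divided by the slowest
// rank's time, i.e. the rate at which the whole job finished the kernel.
double slowest_rank_bandwidth(double bytes_per_rank, const std::vector<double>& times)
{
  assert(!times.empty());
  const double slowest = *std::max_element(times.begin(), times.end());
  return bytes_per_rank * static_cast<double>(times.size()) / slowest;
}
```

The two agree when all ranks take the same time and diverge as soon as one rank is slower, so whichever the driver reports should be stated explicitly.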

tomdeakin, May 13 '24 17:05