MPI benchmark driver
This PR modifies the current driver (main.cpp) and adds MPI support for launching the benchmark across multiple devices. The main takeaways:
- Each MPI rank is assigned a specific GPU and launches the benchmark
- There is no direct GPU-to-GPU communication happening
- For the dot-kernel, the resulting sums are reduced across all MPI ranks (on the host) and broadcast back to each rank via MPI_Allreduce (sketched below).
- Benchmark error checking is performed on all ranks.
- Measured bandwidths are aggregated across all ranks.
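For reviewers, here is a rough sketch of the overall pattern, not the actual code in this PR: each rank derives a node-local rank to pick its device, the per-rank dot sums are combined with MPI_Allreduce so every rank can run the same error check, and per-rank bandwidths are summed onto rank 0 for reporting. All variable names and the device-selection step are placeholders.

```cpp
// Illustrative sketch only -- names like local_sum/local_bw are placeholders.
#include <mpi.h>
#include <cstdio>

int main(int argc, char *argv[])
{
  MPI_Init(&argc, &argv);

  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  // Derive a node-local rank so each rank can select its own device,
  // e.g. cudaSetDevice(local_rank) in the CUDA backend.
  MPI_Comm node_comm;
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, rank,
                      MPI_INFO_NULL, &node_comm);
  int local_rank;
  MPI_Comm_rank(node_comm, &local_rank);

  // ... each rank runs the benchmark kernels on its own device ...

  // Dot kernel: reduce the per-rank partial sums on the host and hand the
  // result back to every rank, so all ranks can run the same error check.
  double local_sum = 0.0;   // placeholder for this rank's partial dot product
  double global_sum = 0.0;
  MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                MPI_COMM_WORLD);

  // Aggregate the measured bandwidths across ranks for reporting.
  double local_bw = 0.0;    // placeholder for this rank's measured GB/s
  double total_bw = 0.0;
  MPI_Reduce(&local_bw, &total_bw, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
  if (rank == 0)
    std::printf("Aggregate bandwidth over %d ranks: %.2f GB/s\n", size, total_bw);

  MPI_Comm_free(&node_comm);
  MPI_Finalize();
  return 0;
}
```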
The only major question I have is how MPI should be treated by CMake. I am open to suggestions and happy to comply with whatever you all prefer.
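One option would be to gate MPI behind an opt-in flag; just a sketch, assuming CMake's built-in FindMPI module is acceptable (the USE_MPI option and the babelstream target name are placeholders):

```cmake
# Sketch only: make MPI opt-in rather than a hard requirement.
option(USE_MPI "Enable MPI support in the benchmark driver" OFF)

if (USE_MPI)
  find_package(MPI REQUIRED COMPONENTS CXX)
  # "babelstream" stands in for whatever the driver target is called.
  target_link_libraries(babelstream PRIVATE MPI::MPI_CXX)
  target_compile_definitions(babelstream PRIVATE USE_MPI)
endif ()
```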
We've got a large general refactor of the main driver coming in #186.
We should also think some more about what bandwidth an MPI+X version should be measuring, given there is no communication apart from the dot product. I think we discussed it, but it would be good to document the reasons for wanting MPI+X versions of BabelStream versus running this benchmark on multiple nodes concurrently with pdsh, srun, etc. and post-processing the results.