
Minimize code changes when converting MPI app to native Hermes app

ChristopherHogan opened this issue on Feb 24, 2021 · 0 comments

Current changes required to convert an MPI app to a native Hermes app

  1. Request MPI_THREAD_MULTIPLE support. This is only required if the app isn't already doing it, and it's possible that MPI_THREAD_FUNNELED would be sufficient (this will be determined once we're ready to release version 1.0).
  2. Initialize Hermes: `std::shared_ptr<hapi::Hermes> hermes = hapi::InitHermes(conf);`
  3. Acquire a special Hermes `MPI_Comm` and use it anywhere the app would have used `MPI_COMM_WORLD`. Example:

```cpp
MPI_Comm *app_comm = (MPI_Comm *)hermes->GetAppCommunicator();
MPI_Bcast(..., *app_comm);
```
  4. Put the application code inside an `if (hermes->IsApplicationCore()) {...}` block.
  5. Finalize Hermes (outside the `if` block of step 4): `hermes->Finalize()`
  6. Run `mpirun` with one more process per node than you would normally use. (Steps 1–5 are shown together in a sketch after the launch examples below.)
```sh
# Normal single node run
mpirun -n 4 a.out
# Hermes single node run
mpirun -n 5 a.out

# Normal multi-node run
mpirun -n 4 -ppn 2 -hosts node1,node2 a.out
# Hermes multi-node run
mpirun -n 6 -ppn 3 -hosts node1,node2 a.out
```
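
Putting steps 1–5 together, a converted app looks roughly like the following sketch. Error handling is omitted, the `hermes.h` include name is an assumption, and `conf` stands for whatever Hermes configuration the app passes to `InitHermes`:

```cpp
#include <mpi.h>
#include <memory>
#include "hermes.h"  // Hermes API header; exact path may vary

namespace hapi = hermes::api;

int main(int argc, char **argv) {
  // Step 1: request MPI_THREAD_MULTIPLE instead of plain MPI_Init.
  int provided;
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

  // Step 2: initialize Hermes (conf is the app's Hermes configuration).
  std::shared_ptr<hapi::Hermes> hermes = hapi::InitHermes(conf);

  // Step 4: only the application cores run the original app code.
  if (hermes->IsApplicationCore()) {
    // Step 3: the Hermes communicator replaces MPI_COMM_WORLD everywhere.
    MPI_Comm *app_comm = (MPI_Comm *)hermes->GetAppCommunicator();
    int rank;
    MPI_Comm_rank(*app_comm, &rank);
    // ... original application code ...
  }

  // Step 5: finalize Hermes outside the IsApplicationCore block.
  hermes->Finalize();
  MPI_Finalize();

  return 0;
}
```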

Steps to eliminate the code change requirements

  • Eliminate steps 1 and 2 by intercepting `MPI_Init`/`MPI_Init_thread` and requesting the appropriate thread support, then calling `InitHermes` and storing the result in a global (singleton).
  • Eliminate step 3 by intercepting every MPI function that takes an `MPI_Comm` argument and applying the following algorithm:

```
if communicator == MPI_COMM_WORLD:
  replace communicator with result of hermes->GetAppCommunicator()
```
  • Eliminate step 5 by intercepting `MPI_Finalize`. (A wrapper sketch covering these interception bullets appears after the launch examples below.)
  • Eliminate step 4 by running the app in MPMD style, where an instance of a special `hermes_core` program runs on each node and looks like this:
```cpp
#include <mpi.h>
#include <memory>
#include "hermes.h"  // Hermes API header; exact path may vary

namespace hapi = hermes::api;

int main(int argc, char **argv) {
  int provided;
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  // conf: the Hermes configuration passed to InitHermes (not shown here).
  std::shared_ptr<hapi::Hermes> hermes = hapi::InitHermes(conf);
  hermes->Finalize();
  MPI_Finalize();

  return 0;
}
```
The mpirun commands would now look like this:

```sh
# Normal single node run
mpirun -n 4 a.out
# Hermes single node run
mpirun -n 1 hermes_core : -n 4 a.out

# Normal multi-node run
mpirun -n 4 -ppn 2 -hosts node1,node2 a.out
# Hermes multi-node run
mpirun -n 2 -ppn 1 -hosts node1,node2 hermes_core : -n 4 -ppn 2 a.out
```
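
A minimal sketch of what the interception in the bullets above could look like, using PMPI-style wrappers. The global `g_hermes` singleton, the `"hermes.conf"` path, and the `hermes.h` include are assumptions; only `MPI_Init`, `MPI_Bcast`, and `MPI_Finalize` are shown as representatives:

```cpp
#include <mpi.h>
#include <memory>
#include "hermes.h"  // Hermes API header; exact path may vary

namespace hapi = hermes::api;

// Assumed global singleton shared by all the wrappers.
static std::shared_ptr<hapi::Hermes> g_hermes;

// Eliminates steps 1 and 2: upgrade the requested thread level and
// initialize Hermes when the app calls MPI_Init (an MPI_Init_thread
// wrapper, not shown, would do the same).
int MPI_Init(int *argc, char ***argv) {
  int provided;
  int rv = PMPI_Init_thread(argc, argv, MPI_THREAD_MULTIPLE, &provided);
  const char *conf = "hermes.conf";  // assumed: path to a Hermes config file
  g_hermes = hapi::InitHermes(conf);
  return rv;
}

// The substitution algorithm from the bullet above.
static MPI_Comm MaybeReplaceComm(MPI_Comm comm) {
  if (comm == MPI_COMM_WORLD) {
    comm = *(MPI_Comm *)g_hermes->GetAppCommunicator();
  }
  return comm;
}

// Every MPI function that takes a communicator gets the same treatment;
// MPI_Bcast is shown as a representative.
int MPI_Bcast(void *buf, int count, MPI_Datatype type, int root,
              MPI_Comm comm) {
  return PMPI_Bcast(buf, count, type, root, MaybeReplaceComm(comm));
}

// Eliminates step 5: finalize Hermes before MPI shuts down.
int MPI_Finalize() {
  g_hermes->Finalize();
  g_hermes.reset();
  return PMPI_Finalize();
}
```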

We could take this one step further and eliminate the need to intercept MPI calls that take a communicator (i.e., let the app use `MPI_COMM_WORLD` as normal) by running the `hermes_core` program as a daemon. Instead of `InitHermes()` it would call `InitHermesDaemon()`. In that case the launch commands would be as follows (a sketch of the daemon-flavor `hermes_core` appears after the commands):

```sh
# Single node
mpirun -n 1 hermes_core &
mpirun -n 4 a.out

# Multi-node
mpirun -n 2 -ppn 1 -hosts node1,node2 hermes_core &
mpirun -n 4 -ppn 2 -hosts node1,node2 a.out
```
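
The daemon flavor of `hermes_core` would differ from the MPMD version only in the initialization call. A sketch, assuming `InitHermesDaemon` accepts the same configuration argument as `InitHermes` and keeps the core alive until a shutdown request arrives:

```cpp
#include <mpi.h>
#include <memory>
#include "hermes.h"  // Hermes API header; exact path may vary

namespace hapi = hermes::api;

int main(int argc, char **argv) {
  int provided;
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

  const char *conf = "hermes.conf";  // assumed: path to a Hermes config file
  // Assumption: runs the Hermes core as a long-lived daemon until it is
  // told to shut down (e.g., via RemoteFinalize or a CLI command).
  std::shared_ptr<hapi::Hermes> hermes = hapi::InitHermesDaemon(conf);
  hermes->Finalize();

  MPI_Finalize();
  return 0;
}
```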

Doing this would require 2 additional changes to the library:

  1. `Hermes::Finalize()` would have to call `RemoteFinalize()` on the app ranks to shut down the daemon (or we force the user to shut it down explicitly, perhaps with `hermes up` and `hermes down` CLI commands).
  2. The app ranks would need a way to make sure the Hermes core is initialized before they begin executing; presumably they would loop on an `IsHermesInitialized` RPC (see the sketch below).
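
For the second change, the app ranks could block at startup until the core responds. A hypothetical sketch of that polling loop; the `IsHermesInitialized` function here is a stand-in for the client side of the actual RPC:

```cpp
#include <chrono>
#include <thread>

// Placeholder for the client side of the IsHermesInitialized RPC; the real
// version would ask the hermes_core daemon on this node and return false if
// it is unreachable or not yet initialized.
static bool IsHermesInitialized() {
  return true;  // stub so the sketch compiles
}

// App ranks call this before touching Hermes to ensure the core is up.
static void WaitForHermesCore() {
  using namespace std::chrono_literals;
  while (!IsHermesInitialized()) {
    std::this_thread::sleep_for(100ms);  // back off between polls
  }
}

int main() {
  WaitForHermesCore();
  // ... normal application startup continues here ...
  return 0;
}
```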
