mpirun v4.1.7 prioritizing rank placement based on launch node

Open aw-lauria opened this issue 7 months ago • 4 comments

Background information

Hi all - we're seeing unexpected rank placement when mpirun is not launched from the first host in our hostfile. ORTE seems to prioritize the launch node when assigning ranks. For example, launching from computeA:

mpirun --hostfile ./hosts -N 4 ./echo.sh | grep computeA
computeA: 0
computeA: 1
computeA: 2
computeA: 3

where the hostfile looks like this:

computeB
computeA

and echo.sh is just:

#!/usr/bin/bash
echo $(hostname): $OMPI_COMM_WORLD_RANK

Basically, mpirun is giving the launch node (computeA here) priority in rank assignment. Based on the hostfile ordering, we would expect computeA to be assigned ranks 4 through 7.
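
To make the expectation concrete, here is a minimal sketch of the layout we'd expect from hostfile ordering alone. It assumes a plain hostfile with one hostname per line (no slots= entries) and a fixed number of ranks per node, as with -N 4 above. `expected_ranks` is a hypothetical helper written for illustration, not part of Open MPI:

```shell
#!/usr/bin/bash
# Hypothetical helper (not an Open MPI tool): print the rank layout we would
# expect if ranks were assigned strictly in hostfile order.
# Assumes one hostname per line; blank lines and '#' comments are skipped.
expected_ranks() {
    local hostfile="$1" per_node="$2"
    local rank=0 host i
    # '|| [ -n "$host" ]' keeps a final line that has no trailing newline
    while read -r host _ || [ -n "$host" ]; do
        case $host in
            ''|'#'*) continue ;;
        esac
        i=0
        while [ "$i" -lt "$per_node" ]; do
            echo "$host: $rank"
            rank=$((rank + 1))
            i=$((i + 1))
        done
    done < "$hostfile"
}
```

With the hostfile above, `expected_ranks ./hosts 4 | grep computeA` prints computeA: 4 through computeA: 7, which is what we expected mpirun to produce.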

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

v4.1.4 and v4.1.7

Is this expected behavior? What is the rationale? This is something we've run into occasionally, and it can have a performance impact on certain workloads. We can work around it, of course, by always launching from the first node in the hostfile. It just happens that in our testing we occasionally launch from the wrong node.
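
For reference, the work-around can be enforced with a small guard in front of mpirun. This is only a sketch, assuming the hostfile lists bare hostnames that match the output of `hostname` exactly (no slots= suffixes, no short-name vs. FQDN mismatch). `first_host` and `check_launch_node` are hypothetical helper names, not Open MPI features:

```shell
#!/usr/bin/bash
# Hypothetical guard (not part of Open MPI): refuse to launch unless the
# current node is the first entry in the hostfile.

# Print the first non-blank, non-comment hostname in the hostfile.
first_host() {
    local host
    while read -r host _ || [ -n "$host" ]; do
        case $host in
            ''|'#'*) continue ;;
        esac
        echo "$host"
        return 0
    done < "$1"
    return 1
}

# Succeed only if the current host is the first hostfile entry.
# The second argument overrides `hostname`, which is handy for testing.
check_launch_node() {
    local hostfile="$1"
    local current="${2:-$(hostname)}"
    local first
    first=$(first_host "$hostfile") || {
        echo "no hosts found in $hostfile" >&2
        return 2
    }
    if [ "$first" != "$current" ]; then
        echo "launch from $first, not $current" >&2
        return 1
    fi
}
```

Usage would look like `check_launch_node ./hosts && mpirun --hostfile ./hosts -N 4 ./echo.sh`, so the job only starts when we're on the first node.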

Thanks!

aw-lauria, Apr 18 '25 03:04