Simple program fails with mixed AMD/Intel machines

Open ghost opened this issue 11 months ago • 0 comments

The program below works when all nodes are Intel or all nodes are AMD. With a mix, it always fails the same way: both ranks complete loop 3, then get stuck in the MPI_Barrier for loop 4 until the job aborts with a connection timeout.

#include <iostream>
#include <mpi.h>

int main()
{
    int argc = 0;
    MPI_Init(&argc, nullptr);

    const int count = 100;
    for (int i = 0; i < count; ++i)
    {
        std::cout << " Attempting Barrier " << i + 1 << std::endl;
        MPI_Barrier(MPI_COMM_WORLD);
        std::cout << " Completed Barrier " << i + 1 << std::endl;
    }

    MPI_Finalize();
}

command line, from intel_machine: mpiexec -hosts 2 localhost amd_machine -wdir "\network\path" \path-to-exe

output:

[0] Attempting Barrier 1
[1] Attempting Barrier 1
[0] Completed Barrier 1
[0] Attempting Barrier 2
[1] Completed Barrier 1
[0] Completed Barrier 2
[1] Attempting Barrier 2
[0] Attempting Barrier 3
[0] Completed Barrier 3
[0] Attempting Barrier 4
[1] Completed Barrier 2
[1] Attempting Barrier 3
[1] Completed Barrier 3
[1] Attempting Barrier 4

job aborted:
[ranks] message

[0] terminated

[1] fatal error
Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(MPI_COMM_WORLD) failed
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.  (errno 10060)

ghost, Jan 10 '25 15:01