Microsoft-MPI
Simple program fails with mixed AMD/Intel machines
This works with only Intel nodes or only AMD nodes. With a mix of the two, it always fails in the same way: it completes loop 3 and then gets stuck in the MPI_Barrier call for loop 4.
#include <mpi.h>
#include <iostream>

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    const int count = 100;
    for (int i = 0; i < count; ++i)
    {
        // Every rank must enter barrier i + 1 before any rank can leave it.
        std::cout << " Attempting Barrier " << i + 1 << std::endl;
        MPI_Barrier(MPI_COMM_WORLD);
        std::cout << " Completed Barrier " << i + 1 << std::endl;
    }

    MPI_Finalize();
}
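As an MPI-free illustration of the contract this loop relies on (not of how MS-MPI actually implements MPI_Barrier), here is a minimal C++17 sketch that models each rank as a thread and uses a generation-counting barrier built from std::mutex and std::condition_variable. The names GenerationBarrier and run_barrier_loop are hypothetical. What it shows: no participant leaves round i until every participant has arrived at round i, so if one participant's arrival never reaches the others, everyone blocks. In the MPI job, arrivals have to be exchanged between processes, over the network when the ranks are on different hosts.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable barrier for a fixed number of threads. Each wait()
// blocks until all `count` threads have called it for the current generation.
class GenerationBarrier {
public:
    explicit GenerationBarrier(std::size_t count)
        : count_(count), waiting_(0), generation_(0) {
        assert(count > 0);
    }

    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t my_generation = generation_;
        if (++waiting_ == count_) {
            // Last arrival: reset the counter, advance the generation, wake everyone.
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            // Block until the generation advances; the predicate handles spurious wakeups.
            cv_.wait(lock, [&] { return generation_ != my_generation; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const std::size_t count_;
    std::size_t waiting_;
    std::size_t generation_;
};

// Hypothetical driver: runs `iterations` barrier rounds on `num_threads`
// threads (one thread standing in for each rank). Returns true if, after
// every barrier, all threads had already arrived at that round.
bool run_barrier_loop(std::size_t num_threads, std::size_t iterations) {
    GenerationBarrier barrier(num_threads);
    std::atomic<std::size_t> arrivals{0};
    std::atomic<bool> ok{true};
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (std::size_t i = 0; i < iterations; ++i) {
                arrivals.fetch_add(1);  // analogous to "Attempting Barrier i + 1"
                barrier.wait();
                // Analogous to "Completed Barrier i + 1": everyone must have arrived.
                if (arrivals.load() < (i + 1) * num_threads) ok = false;
            }
        });
    }
    for (auto& th : threads) th.join();
    return ok.load() && arrivals.load() == num_threads * iterations;
}
```

Calling run_barrier_loop(2, 100) mirrors the two-rank, 100-iteration job and should return true. Since C++20, std::barrier (arrive_and_wait) provides the same behavior from the standard library.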
Command line (run from intel_machine):
mpiexec -hosts 2 localhost amd_machine -wdir "\network\path" \path-to-exe
Output:
[0] Attempting Barrier 1
[1] Attempting Barrier 1
[0] Completed Barrier 1
[0] Attempting Barrier 2
[1] Completed Barrier 1
[0] Completed Barrier 2
[1] Attempting Barrier 2
[0] Attempting Barrier 3
[0] Completed Barrier 3
[0] Attempting Barrier 4
[1] Completed Barrier 2
[1] Attempting Barrier 3
[1] Completed Barrier 3
[1] Attempting Barrier 4
job aborted:
[ranks] message
[0] terminated
[1] fatal error
Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(MPI_COMM_WORLD) failed
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. (errno 10060)
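With count = 100, the rank-prefixed log can get long, so here is a small sketch for summarizing how far each rank got. It assumes every progress line has the form shown above ("[rank] Attempting Barrier N" or "[rank] Completed Barrier N"); the names summarize_progress and RankProgress are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <regex>
#include <sstream>  // for feeding captured output via std::istringstream
#include <string>

// Highest barrier number each rank attempted and completed, per the log.
struct RankProgress {
    int last_attempted = 0;
    int last_completed = 0;
};

// Hypothetical helper: scans mpiexec output whose lines carry a rank prefix,
// e.g. "[1] Completed Barrier 3". Non-matching lines (such as the
// "job aborted:" block) are ignored.
std::map<int, RankProgress> summarize_progress(std::istream& log) {
    static const std::regex pattern(
        R"(^\[(\d+)\]\s+(Attempting|Completed) Barrier (\d+))");
    std::map<int, RankProgress> progress;
    std::string line;
    std::smatch m;
    while (std::getline(log, line)) {
        if (!std::regex_search(line, m, pattern)) continue;
        const int rank = std::stoi(m[1].str());
        const int barrier = std::stoi(m[3].str());
        RankProgress& p = progress[rank];
        if (m[2].str() == "Attempting") {
            if (barrier > p.last_attempted) p.last_attempted = barrier;
        } else {
            if (barrier > p.last_completed) p.last_completed = barrier;
        }
    }
    return progress;
}
```

Fed the output above, it should report last_attempted == 4 and last_completed == 3 for both rank 0 and rank 1, which is the "completes loop 3, stuck in loop 4" pattern described at the top.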