mpich
Using malloc_shared with MPI_File_write_at_all on Intel GPUs
Hello,
This is to report an issue we are seeing with MPICH on Intel GPUs (related to an IOR issue from @pkcoff). A small reproducer is below. The code allocates a buffer with Intel SYCL's malloc_shared and passes it to MPI_File_write_at_all. The code works fine with regular malloc. With malloc_shared it also works fine on one node, but it crashes on 2 nodes with the error "Abort(15) on node 1 (rank 1 in comm 496): Fatal error in internal_Issend: Other MPI error". Is it expected that we can't pass memory allocated with SYCL's malloc_shared as buffers to MPI I/O functions like MPI_File_write_at_all for multi-node jobs?
Reproducer
> cat t.cpp
#include <mpi.h>
#include <math.h>
#include <stdio.h>
#include <sycl/sycl.hpp>

int main(){
    MPI_Init(NULL, NULL);
    sycl::queue syclQ{sycl::gpu_selector_v };
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int numProcs;
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
    MPI_File outFile;
    MPI_File_open(
        MPI_COMM_WORLD, "test", MPI_MODE_CREATE | MPI_MODE_WRONLY,
        MPI_INFO_NULL, &outFile);
    // regular malloc like below works, malloc_shared fails
    // char *bufToWrite = (char*)malloc(sizeof(char)*4);
    char *bufToWrite = (char*)sycl::malloc_shared<char>(4, syclQ);
    snprintf(bufToWrite, 4, "%3d", rank);
    printf("%s\n", bufToWrite);
    MPI_File_write_at_all(
        outFile, rank * 3,
        bufToWrite, 3, MPI_CHAR, MPI_STATUS_IGNORE);
    MPI_File_close(&outFile);
    MPI_Finalize();
}
> mpicc -fsycl t.cpp
# run on two nodes, one rank per node
> mpirun -n 2 -ppn 1 ./a.out
Expected output
It should run like:
> mpirun -n 2 -ppn 1 ./a.out
1
0
We expect it to run, since memory allocated with malloc_shared is accessible on the host. The malloc_shared version works fine with 2 MPI ranks on 1 node as well.
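Besides the printed ranks, a successful run can be checked by looking at the file "test" itself. The sketch below computes the bytes the reproducer should leave there, based only on the snprintf format ("%3d") and the write offset (rank * 3) used above. The function name expected_file_contents is hypothetical, and the comparison assumes "test" did not already exist with older, longer contents, since the reproducer opens it with MPI_MODE_CREATE and does not truncate it.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: builds the byte sequence the reproducer is expected
// to write. Rank r contributes the 3 characters produced by "%3d" at
// byte offset r * 3, so the file holds num_procs * 3 bytes in total.
std::string expected_file_contents(int num_procs) {
    std::string contents(static_cast<std::size_t>(num_procs) * 3, ' ');
    for (int rank = 0; rank < num_procs; ++rank) {
        char field[4];  // 3 characters plus the terminating NUL, as in the reproducer
        std::snprintf(field, sizeof field, "%3d", rank);
        contents.replace(static_cast<std::size_t>(rank) * 3, 3, field, 3);
    }
    return contents;
}
```

For the 2-rank run above this gives six bytes: two spaces, "0", two spaces, "1".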
Actual output
> mpirun -n 2 -ppn 1 ./a.out
1
cxil_map: write error
cxil_map: write error
cxil_map: write error
cxil_map: write error
cxil_map: write error
cxil_map: write error
cxil_map: write error
cxil_map: write error
cxil_map: write error
cxil_map: write error
cxil_map: write error
cxil_map: write error
Abort(15) on node 1 (rank 1 in comm 496): Fatal error in internal_Issend: Other MPI error
0
x1921c6s1b0n0.hostmgmt2000.cm.americas.sgi.com: rank 1 exited with code 15
Note that the above was with the default of ZE_FLAT_DEVICE_HIERARCHY=FLAT. If we use ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE, it also fails:
> mpirun -n 2 -ppn 1 ./a.out
free(): invalid pointer
x1921c6s1b0n0.hostmgmt2000.cm.americas.sgi.com: rank 1 died from signal 6
x1921c5s5b0n0.hostmgmt2000.cm.americas.sgi.com: rank 0 died from signal 15