MPIR_CVAR_CH4_OFI_ENABLE_GPU_PIPELINE=1 and MPI_File_write_at_all on Intel GPUs
Hello,
This is to report an issue we are seeing with MPICH on Intel GPUs (related to an IOR issue from @pkcoff).
If we run a program (reproducer below) that calls MPI_File_write_at_all with a large-ish GPU device buffer while the environment variable MPIR_CVAR_CH4_OFI_ENABLE_GPU_PIPELINE=1 is set, it hangs. In particular, a message size of 100000 does not hang, but a message size of 200000 does. The hang also occurs only on two or more nodes; a single-node run does not hang.
Thanks! Let us know if this is expected or we're doing something wrong.
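To narrow down where between 100000 and 200000 the behavior changes, one could bisect on the message size. The following is only a sketch: it assumes the hang is monotonic in the size (every size above some threshold hangs), and the `hangs` predicate is hypothetical. In practice it would have to launch the reproducer with the given MESSAGE_SIZE under a timeout and report whether the run timed out.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Returns the smallest size in (passing, hanging] for which `hangs` is true.
// Assumption: `hangs(passing)` is false, `hangs(hanging)` is true, and the
// behavior is monotonic in between. `hangs` is a hypothetical predicate that
// would run the reproducer with that MESSAGE_SIZE under a timeout.
std::size_t first_hanging_size(std::size_t passing, std::size_t hanging,
                               const std::function<bool(std::size_t)>& hangs) {
    assert(passing < hanging);
    while (hanging - passing > 1) {
        const std::size_t mid = passing + (hanging - passing) / 2;
        if (hangs(mid)) {
            hanging = mid;  // mid hangs: the threshold is at or below mid
        } else {
            passing = mid;  // mid completes: the threshold is above mid
        }
    }
    return hanging;
}
```

Starting from `first_hanging_size(100000, 200000, hangs)`, this needs about 17 runs of the reproducer to pin down the exact size, which may help relate the threshold to an internal buffer or chunk size.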
Reproducer
> cat t.cpp
#include <mpi.h>
#include <sycl/sycl.hpp>
#include <cstdio>
#include <cstdlib>

#define MESSAGE_SIZE 200000  // 100000 completes; 200000 hangs on two or more nodes

int main() {
    MPI_Init(NULL, NULL);
    sycl::queue syclQ{sycl::gpu_selector_v};

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int numProcs;
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

    MPI_File outFile;
    MPI_File_open(
        MPI_COMM_WORLD, "test", MPI_MODE_CREATE | MPI_MODE_WRONLY,
        MPI_INFO_NULL, &outFile);

    char *bufToWrite_host = (char*)malloc(sizeof(char) * MESSAGE_SIZE);
    char *bufToWrite_device = sycl::malloc_device<char>(MESSAGE_SIZE, syclQ);

    // Fill the host buffer with the first digit of this rank's number.
    char rank_string[16];  // large enough for any int rank
    snprintf(rank_string, sizeof(rank_string), "%d", rank);
    for (int i = 0; i < MESSAGE_SIZE; i++)
        bufToWrite_host[i] = rank_string[0];
    // printf("%s\n", bufToWrite_host);

    // Copy the data to the GPU so that the collective write uses a device buffer.
    syclQ.memcpy(bufToWrite_device, bufToWrite_host, sizeof(char) * MESSAGE_SIZE);
    syclQ.wait();

    // Each rank writes its block at a disjoint offset (computed as MPI_Offset).
    MPI_File_write_at_all(
        outFile, (MPI_Offset)rank * MESSAGE_SIZE,
        bufToWrite_device, MESSAGE_SIZE, MPI_CHAR, MPI_STATUS_IGNORE);

    MPI_File_close(&outFile);
    sycl::free(bufToWrite_device, syclQ);
    free(bufToWrite_host);
    MPI_Finalize();
    return 0;
}
> rm test # removing the output file
> mpicc -fsycl t.cpp
> MPIR_CVAR_CH4_OFI_ENABLE_GPU_PIPELINE=1 mpirun -n 2 -ppn 1 ./a.out
Expected Output
We don't expect it to hang.
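For reference, a completed run should leave a file `test` in which rank r's block of MESSAGE_SIZE bytes, starting at byte r * MESSAGE_SIZE, is filled with the first character of r's decimal representation. The sketch below checks that layout on a buffer already read from the file. The function names are hypothetical, and it assumes the file was removed before the run, as in the transcript above, so that no stale bytes remain.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Byte that a given rank writes in the reproducer: the first character of
// the rank's decimal representation (so rank 12 writes '1').
char fill_byte_for_rank(int rank) {
    assert(rank >= 0);
    return std::to_string(rank)[0];
}

// Returns true if `contents` has exactly num_procs blocks of message_size
// bytes and block r consists only of fill_byte_for_rank(r).
bool matches_expected_layout(const std::vector<char>& contents,
                             int num_procs, std::size_t message_size) {
    assert(num_procs > 0);
    if (contents.size() != static_cast<std::size_t>(num_procs) * message_size) {
        return false;  // truncated or oversized file
    }
    for (int r = 0; r < num_procs; ++r) {
        const char expected = fill_byte_for_rank(r);
        const std::size_t begin = static_cast<std::size_t>(r) * message_size;
        for (std::size_t i = 0; i < message_size; ++i) {
            if (contents[begin + i] != expected) {
                return false;  // wrong byte inside rank r's block
            }
        }
    }
    return true;
}
```

For the two-rank run in the transcript, this corresponds to 200000 bytes of '0' followed by 200000 bytes of '1'.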
Actual Output
It hangs.