Crash when loading a partitioned mesh with certain numbers of mesh partitions
While loading a Gmsh mesh file with a partitioned mesh:
- with some numbers of mesh partitions, the mesh loading crashes;
- with other numbers of mesh partitions, the mesh loading works fine;
- this is confirmed with the `Comet` code loading several different mesh files and partitions: some work fine, some crash;
- it is also confirmed when loading the same mesh and partition files using the `ptn_loading` unit test: the crash behavior is the same as above.
I will include a test mesh file separately.
Stack trace from the core dump file generated by `ptn_loading`; it is very similar to the stack trace generated by the Comet code:
```
#0 0x000014beee11f6c8 in PMPI_Irecv () from /opt/cray/pe/lib64/libmpi_gnu_123.so.12
#1 0x000014bef05c511a in MPI_Irecv (buf=0x4271e544, count=592923, datatype=-1946157051, source=<optimized out>, tag=<optimized out>, comm=<optimized out>, request=<optimized out>) at darshan-apmpi.c:842
#2 0x0000000000881fdc in pumipic::ParticleBalancer::ParticleBalancer(pumipic::Mesh&) ()
#3 0x000000000083a77e in pumipic::Mesh::constructPICPart(Omega_h::Mesh&, std::shared_ptr<Omega_h::Comm>, Omega_h::Read<int>, Omega_h::Write<int>, Omega_h::Write<int>, bool) ()
#4 0x000000000083cb1c in pumipic::Mesh::Mesh(Omega_h::Mesh&, Omega_h::Read<int>, int, int) ()
#5 0x00000000004279f5 in main ()
```
Job submission script on Polaris using 4 mesh partitions:

```shell
mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind depth \
  --env OMP_NUM_THREADS=${NTHREADS} --env OMP_PLACES=threads ./set_affinity_gpu_polaris.sh \
  ./ptn_loading 2d_cylinder.msh 2d_cylinder_4.ptn 1 3
```
The test mesh is in this repository, which also includes the Polaris run scripts:
- with 4 mesh partitions, the unit test runs fine;
- with 8 mesh partitions, the unit test crashes.
Hello, thank you for reporting this. Is this issue observed only on Polaris, or have you encountered it in other environments as well?
@Sichao25: Hi Sichao, this is only observed on Polaris because of the large mesh size. I tried to reproduce the issue on my CentOS workstation, but its GPU memory is too small to accommodate a mesh this large.
@Sichao25: Quick update: I rebuilt PUMIPic for CPU on a CentOS desktop computer and was able to run the test case above without error using either 4 or 8 mesh partitions. So this issue seems to happen only on GPU.
Update: I tried several different meshes and numbers of partitions; all the partition counts that previously crashed on the Polaris GPU work fine with the CPU build on CentOS.
I tried this on a SCOREC machine and haven't been able to reproduce the error with CUDA so far. This is likely specific to the environment.
@Sichao25:
- So the case ran fine with both 4 and 8 mesh partitions, correct?
- If yes, this may suggest the issue is specific to Polaris rather than to CPU vs. GPU, since Polaris uses a different MPI, `cray-mpich/8.1.32`, whereas on CentOS I am using MPICH 3.4.3.
Yes, both cases run without error on SCOREC with ptn_loading.
I reproduced the bug on Polaris. The issue appears to be related to `MPI_Irecv` failing when using a strided datatype created by `MPI_Type_vector(core_nents, 1, nbuffers, MPI_INT, &bufferStride)`. Using non-strided data avoids the problem, but this workaround may affect performance given the mesh size.
Since I did not encounter the same issue in other environments, it may be specific to cray-mpich on Polaris. Alternative MPI implementations may resolve the issue, but I have not tested them yet.
@Sichao25: thank you for the update Sichao!
- Could you elaborate on the issue a little, with reference to the specific code section?
- When you say that non-strided data avoids the problem, could you push this workaround to the PUMIPic repository so I can test its performance impact?
- And finally, do you think this issue is a bug in cray-mpich, or is it simply some missing environmental variables that we should set on Polaris when using cray-mpich? In either case, it would be beneficial to raise this issue with ALCF and ask for their help.
What do you guys think? @cwsmith @jacobmerson
For the record, @onkarsahni suggested looking at the cray-mpich docs and the variables controlling buffer sizes:
This seems old, but check Slide/Page 10: http://www.archer.ac.uk/training/courses/craytools/pdf/mpi-variables.pdf
@cwsmith, @Sichao25, @jacobmerson: copying the email response here for better record:
- I tried setting the environmental variable `MPICH_GNI_NUM_BUFS` (page 10) to several larger values, and the code still crashed.
- Regarding the strided data type in `MPI_Irecv` that Sichao mentioned, do you see anything we are still missing with the above environmental variable, or are there other environmental variables that need to be set?
@Sichao25: thank you for the workaround:
- with your new branch, the partitioned mesh creation works fine;
- with the workaround, there is some performance drop in mesh initialization (including PICparts creation, etc.); since this is done only once at the beginning of the simulation, it may be acceptable;
- the results below are for the 2D cylinder mesh with 16 mesh partitions and a total of ~4.74 million triangle elements;
- I think it makes sense to find out whether this is a cray-mpich issue or a missing-environmental-variable issue.
```
non-strided data type:
initialization mesh 8.94713 5 8.94604 0 8.94693 1
strided data type:
initialization mesh 8.15834 8 8.15709 0 8.15804 1
```
Thanks for the feedback; the performance drop is expected. Ideally, we should make `MPI_Irecv` work with `MPI_Type_vector` on Polaris. We could probably ask the ALCF team for suggestions, since no similar issues have been observed on other machines.