OpenCoarrays
Defect: src/tests/unit/simple/test1Caf.F90 AKA increment_my_neighbor fails
Defect/Bug Report
src/tests/unit/simple/test1Caf.F90 AKA increment_my_neighbor fails (at least when oversubscribed @ 32 cores)
I have spent some time looking at this and cannot rule out a logic error in the test itself. So it could be a bug in the test or a bug in the library; if it is the latter, a race condition is the likely cause.
- OpenCoarrays Version: 1.9.0-5-g232d234
- Fortran Compiler: GFortran 7.1
- C compiler used for building lib: GCC 7.1
- Installation method:
FC=gfortran-7 CC=gcc-7 cmake ..
- Output of uname -a: Darwin IBBs-MBP.local 14.5.0 Darwin Kernel Version 14.5.0: Tue Apr 11 16:12:42 PDT 2017; root:xnu-2782.50.9.2.3~1/RELEASE_X86_64 x86_64
- MPI library being used: MPICH 3.2
- Machine architecture and number of physical cores: Intel_64 @ 4 cores
- Version of CMake: 3.8.2
Observed Behavior
Test fails when oversubscribed at 32 images
Expected Behavior
Test passes
Steps to Reproduce
Uncomment the relevant line in CMakeLists.txt (currently line 568).
I have noticed several tests failing when oversubscribed. By "failing" I mean they ran forever until I terminated them manually. This could be because the tests make no sense at higher numbers of processes. I will post what I am seeing later.
Some tests require a specific number of images, such as a power of two.
I have been running some tests on this case. With -np 4 I get a failure about once every 100 runs. The failure rate goes up with increasing -np.
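A failure rate like the one quoted above can be estimated with a small loop. This is only a minimal sketch, not part of OpenCoarrays: `count_failures` is a hypothetical helper, and it assumes the test signals failure through a non-zero exit status.

```shell
#!/bin/sh
# count_failures RUNS CMD [ARGS...]
# Hypothetical helper: runs CMD RUNS times and prints how many of
# those runs exited with a non-zero status.
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the program output; only the exit status is counted.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}
```

For example, `count_failures 1000 bin/cafrun -np 4 path/to/increment_my_neighbor` would print the number of failing runs out of 1000, where the path to the test binary is a placeholder for the one in your build tree. A run that hangs will block the loop, so it may be worth wrapping the command in a timeout utility if one is installed.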
I can eliminate failures by inserting a 'sync all' here:
    me = this_image()
    np = num_images()
    sync all
    left = merge(np,me-1,me==1)
    right = merge(1,me+1,me==np)
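For readers less familiar with `merge`, the two expressions above pick each image's neighbors on a periodic ring: image 1 wraps around to image np on the left, and image np wraps around to image 1 on the right. The sketch below reproduces only that index arithmetic in shell so it can be checked by hand. `ring_neighbors` is a hypothetical name, and nothing here models the coarray synchronization itself.

```shell
#!/bin/sh
# ring_neighbors ME NP
# Prints "LEFT RIGHT" for image ME in a periodic ring of NP images,
# mirroring merge(np,me-1,me==1) and merge(1,me+1,me==np).
ring_neighbors() {
    me=$1
    np=$2
    # Left neighbor: wrap to the last image when ME is the first.
    if [ "$me" -eq 1 ]; then left=$np; else left=$((me - 1)); fi
    # Right neighbor: wrap to the first image when ME is the last.
    if [ "$me" -eq "$np" ]; then right=1; else right=$((me + 1)); fi
    echo "$left $right"
}
```

With 4 images, `ring_neighbors 1 4` prints `4 2` and `ring_neighbors 4 4` prints `3 1`. Every image therefore has a neighbor that writes to it, which is why the ordering between local setup and remote updates matters.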
Building with MPICH 3.3 and GCC/GFortran 8.3 (all from Homebrew on macOS):
export FC="$(which gfortran-8)"
export CC="$(which gcc-8)"
cmake -Wdev -DCMAKE_BUILD_TYPE:STRING=Debug -DCMAKE_Fortran_FLAGS:STRING="-g -fbacktrace -fcheck=bounds,pointer" -DCMAKE_C_FLAGS:STRING="-g -fstack-check" ..
make -j
and then testing with:
bin/cafrun -np 50 bin/OpenCoarrays-2.6.1-11-g84ea96a-tests/increment_my_neighbor
reliably causes failures on my work iMac (Intel Core i5-4690 @ 3.5 GHz, 4 cores, 4 threads).
I have attached a full debug log of the runtime failure. increment_my_neighbor.50img.fail.txt