flang
[regression] TeaLeaf_ref: intermittent crash when doing a call by a pointer to a procedure from an OpenMP parallel section
This regression was caused by some of the patches introduced to help with CP2K compilation crashes. These patches were merged more than a year ago:
- 82683130aeebdbe6d5c70044cd79ef22e54f4f22 "Fix issue with procedure pointers", Tue Nov 26 14:54:45 2019
- 9dd4ed032b4a2eaa3c6a31aacd4cae92695c22d5 "Fix a segfault with procedure dummy arguments", Thu Nov 7 08:15:06 2019
- 4498b2b61bd7363f032474e55800180d5a7c0134 "Fix issue identified in large application", Wed Nov 6 14:06:55 2019
- bfb6bb22e3713e642f08f12325fc031680f84f0b "Support default procedure pointer initialization", Fri Oct 25 17:08:08 2019
Although these patches helped to solve one of many CP2K compilation issues, they introduced a regression that manifests only in OpenMP parallel regions when more than one OpenMP thread is running. With the recent introduction of an OpenMP-only compilation variant in CP2K, these patches are potentially harmful to CP2K as well (so, by fixing a compile-time issue they introduced the potential for a runtime crash).
One of the affected Fortran applications is TeaLeaf_ref (see https://github.com/UK-MAC/TeaLeaf_ref). In its pack_kernel.f90 file it has a pack_all subroutine which calls another procedure through the pack_func pointer, which is set to point at a particular subroutine a few lines above. The trouble is, the pack_all subroutine itself is called from an OpenMP parallel region in the call_packing_functions subroutine defined in the pack.f90 file. After the introduction of the patches listed above, the following new element (a global static variable) started to appear in the generated LLVM IR:
%struct.STATICS4 = type <{ [88 x i8], i8* }>
@.STATICS4 = internal global %struct.STATICS4 <{ [88 x i8] [i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0,
i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0,
i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0,
i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0], i8* null }>, align 16, !dbg !125
Further on in this IR, read and write accesses to this variable are performed in the pack_all subroutine, without any locking. During execution this sooner or later results in an attempt either to call a wrong subroutine or to call through a NULL pointer, which produces a segfault like this one:
$ OMP_NUM_THREADS=10 mpirun -np 3 ./tea_leaf
Tea Version 1.403
MPI Version
OpenMP Version
Task Count 3
Thread Count: 10
Tea Version 1.403
MPI Version
OpenMP Version
Task Count 3
Thread Count: 10
Output file tea.out opened. All output will go there.
[vcn-man-comp:87373] *** Process received signal ***
[vcn-man-comp:87372] *** Process received signal ***
[vcn-man-comp:87373] Signal: Segmentation fault (11)
[vcn-man-comp:87373] Signal code: Address not mapped (1)
[vcn-man-comp:87373] Failing at address: (nil)
[vcn-man-comp:87372] Signal: Segmentation fault (11)
[vcn-man-comp:87372] Signal code: Address not mapped (1)
[vcn-man-comp:87372] Failing at address: (nil)
[vcn-man-comp:87373] [ 0] [0xffff8152066c]
[vcn-man-comp:87373] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 0 on node localhost exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
This crash can be easily reproduced on both AArch64 and x86_64 machines.
SOME WORKAROUND:
I've slightly modified the IR by adding the thread_local attribute to the .STATICS4 global variable. After compiling it and linking it into the final executable binary I observed that this indeed resulted in a successful execution of tea_leaf with many threads.
POTENTIAL DRAWBACKS OF THIS WORKAROUND (NEEDS INVESTIGATION):
It is not certain whether such an approach is correct in every case. What if pack_all has its own OpenMP parallel region in which pack_func is accessed? Should a change to the pack_func pointer made in this function before entering the parallel region be propagated to all of the other threads? What if pack_func is listed in a PRIVATE or SHARED clause? Will it be handled with the necessary care?
Note that TeaLeaf_ref is an MPI application. We've been testing it with OpenMPI version 4.0.5. The affected code is executed when more than two MPI processes are started, e.g. by invoking OMP_NUM_THREADS=2 mpirun -np 3 ./tea_leaf.
We were testing TeaLeaf_ref from the feature/armcompiler branch (see https://github.com/UK-MAC/TeaLeaf_ref/tree/feature/armcompiler). Both the AArch64 and x86_64 tests were built from this branch.
I've found some time to look into this. The following line in the pack_kernel.f90 file causes the problem:
PROCEDURE(pack_or_unpack), POINTER :: pack_func => NULL()
As I mentioned in my previous comments, after the introduction of the commits listed above, a static global variable is created for the pack_func variable. This is clearly not safe for multi-threaded OpenMP programs and it results in a race condition. One way to avoid the creation of the static variable is to rewrite this code as follows:
PROCEDURE(pack_or_unpack), POINTER :: pack_func
NULLIFY(pack_func)
Another way is to make one of the changes introduced by the listed commits ineffective in OpenMP compilation mode, namely:
--- a/tools/flang1/flang1exe/semant.c
+++ b/tools/flang1/flang1exe/semant.c
@@ -11357,10 +11357,15 @@ proc_dcl_init:
sptr = refsym(sptr, OC_OTHER);
SST_SYMP(RHS(3), sptr);
SST_IDP(RHS(3), S_IDENT);
- sem.dinit_data = TRUE;
- (void)mkvarref(RHS(3), ITEM_END);
- sem.dinit_data = FALSE;
- inited = TRUE;
+ if (flg.smp) {
+ sem.dinit_data = FALSE;
+ inited = FALSE;
+ } else {
+ sem.dinit_data = TRUE;
+ (void)mkvarref(RHS(3), ITEM_END);
+ sem.dinit_data = FALSE;
+ inited = TRUE;
+ }
proc_dcl_shared:
sptr = SST_SYMG(RHS(1));
Note that this fix isn't perfect, but it seems to be good enough.