Incorrrectly Master-workshare nesting
Found in code evaluation with CP2K. Code hung with latest ifx compiler. Caused by nested OpenMP regions that doesn't allow by standard. Code path contain multiple source files, so only short snippets here.
- Parallel section creation dbcsr_finalize in dbscr_work_operations.F
!$OMP PARALLEL DEFAULT (NONE) &
!$OMP SHARED (matrix, old_row_p, old_col_i, old_blk_p,&
!$OMP start_offset, sort_data)
CALL dbcsr_merge_all(matrix, &
old_row_p, old_col_i, old_blk_p, &
sort_data=sort_data)
!$OMP END PARALLEL
- Create Master section dbcsr_merge_all in dbscr_work_operations.F
!$OMP MASTER
CALL dbcsr_clearfrom_index_array(matrix, dbcsr_slot_row_p)
CALL dbcsr_clearfrom_index_array(matrix, dbcsr_slot_col_i)
CALL dbcsr_clearfrom_index_array(matrix, dbcsr_slot_blk_p)
CALL dbcsr_addto_index_array(matrix, dbcsr_slot_row_p, &
DATA=new_row_p(1:nrows + 1), extra=new_nblks*2)
CALL dbcsr_addto_index_array(matrix, dbcsr_slot_col_i, &
DATA=new_col_i(1:new_nblks))
IF (sort_data) THEN
CALL dbcsr_addto_index_array(matrix, dbcsr_slot_blk_p, &
DATA=new_blk_p_sorted)
ELSE
CALL dbcsr_addto_index_array(matrix, dbcsr_slot_blk_p, &
DATA=new_blk_p)
END IF
matrix%nblks = new_nblks
matrix%nze = new_nze
matrix%index(dbcsr_slot_nblks) = matrix%nblks
matrix%index(dbcsr_slot_nze) = matrix%nze
CALL dbcsr_repoint_index(matrix)
!$OMP END MASTER
- Transit dbcsr_addto_index_array in dbcsr_index_operations.F
CALL ensure_array_size(matrix%index, lb=1, ub=ub_new, &
factor=default_resize_factor, &
nocopy=.FALSE., &
memory_type=matrix%index_memory_type)
- Transit ensure_array_size in dbcsr_ptr_util.F
IF (ub_orig - lb_orig + 1 .gt. 0) THEN
!newarray(lb_orig:ub_orig) = array(lb_orig:ub_orig)
CALL mem_copy_${nametype1}$ (newarray(lb_orig:ub_orig), &
array(lb_orig:ub_orig), ub_orig - lb_orig + 1)
END IF
- Create workshare section mem_copy_${nametype1}$ in dbcsr_ptr_util.F
SUBROUTINE mem_copy_${nametype1}$ (dst, src, n)
!! Copies memory area
INTEGER, INTENT(IN) :: n
!! length of copy
${type1}$, DIMENSION(1:n), INTENT(OUT) :: dst
!! destination memory
${type1}$, DIMENSION(1:n), INTENT(IN) :: src
!! source memory
#if !defined(__DBCSR_DISABLE_WORKSHARE)
!$OMP PARALLEL WORKSHARE DEFAULT(none) SHARED(dst,src)
#endif
dst(:) = src(:)
#if !defined(__DBCSR_DISABLE_WORKSHARE)
!$OMP END PARALLEL WORKSHARE
#endif
END SUBROUTINE mem_copy_${nametype1}$
Situation when __DBCSR_DISABLE_WORKSHARE macros is not defined violated OpenMP standard that says:
" DO, SECTIONS, SINGLE, and WORKSHARE directives are not permitted in the dynamic extent of CRITICAL, ORDERED, and MASTER directives"
Also please note that MASTER construct is deprecated in latest spec.
Fortran pretty old spec: https://www.openmp.org/wp-content/uploads/fspec20.pdf
@dmishura Awesome - thank you, Dmitry!
I explored an approach to use CP2K's convention checker and essentially ban nested parallelism or more specifically a parallel region inside of a master section. However, the GFortran AST cannot be combined with a call graph query unless much additional work is invested. Alternative approach is to rely on an assertion.
Also please note that MASTER construct is deprecated in latest spec.
This questions the whole MPI-like programming model in OpenMP based on get_get_thread_num. This is about "OpenMP tasks everywhere" to enable composable parallelism. Nice dream.
( That's too much to ask for ;-)
The current thesis is, nested parallelism like PARALLEL WORKSHARE even inside inside of a MASTER section (which in turn is inside of another PARALLEL region) should never hang. GNU Fortran as well as IFORT seems to confirm this at runtime. However, GNU, IFORT, and IFX produce an error message at compile-time if the nest is not "firewalled" by a subroutine but literally nested (second code-path when writing #if 0):
PROGRAM NESTED
IMPLICIT NONE
INTEGER :: dst(24), src(24), i
DO i = 1, SIZE(src)
src(i) = i - 1
END DO
!$OMP PARALLEL
!$OMP MASTER
#if 1
CALL workshare(dst, src)
#else
!$OMP PARALLEL WORKSHARE
dst(:) = src(:)
!$OMP END PARALLEL WORKSHARE
#endif
!$OMP END MASTER
!$OMP END PARALLEL
DO i = 1, SIZE(dst)
WRITE(*,*) dst(i)
END DO
CONTAINS
SUBROUTINE workshare(dst, src)
INTEGER, INTENT(OUT) :: dst(:)
INTEGER, INTENT(IN) :: src(:)
!$OMP PARALLEL WORKSHARE
dst(:) = src(:)
!$OMP END PARALLEL WORKSHARE
END SUBROUTINE
END PROGRAM
The confusion in compiler's implementation seems to stem from PARALLEL WORKSHARE vs. just WORKSHARE, with the latter indeed incorrect inside of a master/single section.
Nice investigation @hfp !
I would say we can remove all WORKSHARE section, we have memory-boud anyway...
I will work on DBCSR starting this week...
Also please note that MASTER construct is deprecated in latest spec.
Fortran pretty old spec: https://www.openmp.org/wp-content/uploads/fspec20.pdf
Thanks for pointing this out.
I would not change MASTER to the new MASKED construct, at least not with the next release...
I will open a specific issue as a reminder.