OpenCoarrays icon indicating copy to clipboard operation
OpenCoarrays copied to clipboard

Defect: Spurious coarray bounds error with -fcheck=bounds

Open nncarlson opened this issue 3 years ago • 0 comments

  • [x] I am reporting a bug others will be able to reproduce and not asking a question or requesting a new feature.

System information including:

  • OpenCoarrays Version: caf version 2.9.2-13-g235167d

  • Fortran Compiler: GNU Fortran (GCC) 11.2.0

  • C compiler used for building lib: gcc (GCC) 11.2.0

  • Installation method: make; make install

  • All flags & options passed to the installer: cmake -DCMAKE_BUILD_TYPE=Release

  • Output of uname -a: Linux thelio.indiana 5.14.18-100.fc33.x86_64 #1 SMP Fri Nov 12 17:38:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

  • MPI library being used: MPICH 3.3.2

  • Machine architecture and number of physical cores: 12 core AMD Threadripper 2920X

  • Version of CMake: 3.19.7

To help us debug your issue please explain:

What you were trying to do (and why)

I am developing code to perform a halo exchange associated with domain decomposition methods for PDEs and have encountered a problem. To debug the problem I've extracted the problem loop into a subroutine and wrapped it with a main program that initializes some structure arrays with data from a debugging data set for 4 images. The subroutine loop is performing the equivalent of an MPI_Alltoallv or MPI_Neighbor_alltoall.After calling the subroutine, the program compares the result of the subroutine with reference data to check for a successful completion. Here is the problem reproducer:

program main

  type :: index_map
    integer, allocatable :: offP_index(:)
    integer, allocatable :: recv_displs(:), recv_counts(:)
    integer, allocatable :: send_displs(:), send_image(:), send_index(:)
    integer, allocatable :: send_index_ref(:)
  end type
  type(index_map) :: imap

  select case (this_image())
  case (1)
    imap%send_image  = [2, 3, 4]
    imap%recv_counts = [3, 5, 4]
    imap%recv_displs = [0, 3, 8]
    imap%send_displs = [0, 0, 0]
    imap%offP_index = [8, 10, 12, 13, 14, 16, 18, 19, 21, 23, 25, 27]
    imap%send_index_ref = [1, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
  case (2)
    imap%send_image  = [1, 3, 4]
    imap%recv_counts = [6, 2, 4]
    imap%recv_displs = [0, 6, 8]
    imap%send_displs = [0, 5, 4]
    imap%offP_index = [1, 2, 3, 4, 5, 6, 15, 16, 20, 21, 24, 25]
    imap%send_index_ref = [8, 10, 12, 9, 10, 11, 12, 7, 8, 9, 10, 11, 12]
  case (3)
    imap%send_image  = [1, 2, 4]
    imap%recv_counts = [5, 4, 6]
    imap%recv_displs = [0, 5, 9]
    imap%send_displs = [6, 3, 8]
    imap%offP_index = [2, 3, 4, 5, 6, 9, 10, 11, 12, 21, 23, 24, 25, 26, 27]
    imap%send_index_ref = [13, 14, 16, 18, 19, 15, 16, 13, 14, 15, 16, 17, 18, 19]
  case (4)
    imap%send_image  = [1, 2, 3]
    imap%recv_counts = [6, 6, 7]
    imap%recv_displs = [0, 6, 12]
    imap%send_displs = [11, 7, 7]
    imap%offP_index = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
    imap%send_index_ref = [21, 23, 25, 27, 20, 21, 24, 25, 21, 23, 24, 25, 26, 27]
  end select
  allocate(imap%send_index, mold=imap%send_index_ref)
  imap%send_index = 0

  call sub(imap)

  if (any(imap%send_index /= imap%send_index_ref)) error stop

contains

  subroutine sub(this)

    type(index_map), intent(inout), target :: this

    integer :: j, k
    type :: box
      integer, pointer :: array(:)
    end type
    type(box), allocatable :: buffer[:]
  
    allocate(buffer[*])
    buffer%array => this%send_index
    sync all
    do j = 1, size(this%recv_counts)
      associate (i => this%send_image(j), n => this%recv_counts(j), &
                 s0 => this%send_displs(j), r0 => this%recv_displs(j))
        do k = 1, n
          !write(*,'(i0,a,5(1x,i0))') this_image(), ': ', j, i, &
          !    lbound(buffer[i]%array), s0+k, ubound(buffer[i]%array)
          buffer[i]%array(s0+k) = this%offP_index(r0+k)
        end do
      end associate
    end do
    sync all
  end subroutine

end program

What happened (include command output, screenshots, logs, etc.)

When compiled with -fcheck=bounds the runtime reports a spurious out-of-bounds error:

At line 69 of file gfortran-20220127.f90
Fortran runtime error: Index '15' of dimension 1 of array 'buffer%array' above upper bound of 14

Error termination. Backtrace:
#0  0x402b81 in ???
#1  0x4050b5 in ???
#2  0x40545e in ???
#3  0x7f325e2981e1 in ???
#4  0x40253d in ???
#5  0xffffffffffffffff in ???
Error: Command:
   `/opt/lib/gcc-11_2/mpich/3.3.2/bin/mpiexec -n 4 ./a.out`
failed to run.

What you expected to happen

The program should run to completion without error. That there is no actual out-of-bounds reference can be confirmed by uncommenting the write statement and verifying the index in each iteration of the loop lies within the array bounds -- until it aborts on the following line. Also the NAG and Intel compilers report no out-of-bounds access.

Step-by-step reproduction instructions to reproduce the error/bug

caf -fcheck=bounds reproducer.f90
cafrun -n 4 ./a.out

nncarlson avatar Jan 27 '22 23:01 nncarlson