
Defect: Assignment of big > 2 GB coarrays fails when linked against Intel MPI

Status: Open • modrzejewski opened this issue 6 years ago • 7 comments

Defect/Bug Report

  • OpenCoarrays Version: 2.3.1
  • Fortran Compiler: gfortran 6.4.0
  • C compiler used for building lib: gcc 6.4.0
  • Output of uname -a: Linux login01.pro.cyfronet.pl 3.10.0-862.14.4.el7.x86_64 #1 SMP Fri Sep 28 10:29:52 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • MPI library being used: Intel MPI 2018.0
  • Machine architecture and number of physical cores: 10 nodes, 2x Intel Xeon E5-2680v3 in each node

Observed Behavior

The program stops with the error message:

ERROR STOP MPI-error: Invalid count

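Expected Behavior

Each image prints its index and x(1, 1). Image 1 holds the sum 1 + 2 + ... + 10 = 55; the order of the lines varies between runs: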

           2   2.0000000000000000     
           8   8.0000000000000000     
           4   4.0000000000000000     
           5   5.0000000000000000     
           7   7.0000000000000000     
           1   55.000000000000000     
           6   6.0000000000000000     
           9   9.0000000000000000     
           3   3.0000000000000000     
          10   10.000000000000000

Steps to Reproduce

Minimal example

program bigarrays
      implicit none
      double precision, dimension(:, :), allocatable :: x[:]
      integer, parameter :: n = 20000
      integer :: k
      allocate(x(n, n)[*])
      x = dble(this_image())
      sync all
      if (this_image() == 1) then
         do k = 2, num_images()
            x(:, :) = x(:, :) + x(:, :)[k]
         end do
      end if
      sync all
      print *, this_image(), x(1, 1)
end program bigarrays
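
For scale, the coarray in the minimal example has n * n elements of double precision on every image. The sketch below only does that arithmetic and compares the result with the largest signed 32-bit integer. It assumes 8 bytes per double precision element, and the variable names are made up for illustration. That the "Invalid count" error is tied to this limit is an assumption based on the issue title and on the fact that the classic MPI C interface takes its count arguments as int; nothing in this thread confirms where the count is actually computed.

```shell
#!/bin/bash
# Per-image size of the coarray x(n, n) from the minimal example.
n=20000                  # extent of each dimension, taken from the example
bytes_per_element=8      # assumption: double precision occupies 8 bytes
int32_max=2147483647     # largest signed 32-bit integer (2**31 - 1)

elements=$(( n * n ))
bytes=$(( elements * bytes_per_element ))

echo "elements per image: $elements"
echo "bytes per image:    $bytes"

# bash arithmetic is 64-bit on common platforms, so these comparisons do not overflow.
if (( bytes > int32_max )); then
    echo "byte count does not fit in a signed 32-bit integer"
fi
if (( elements <= int32_max )); then
    echo "element count still fits in a signed 32-bit integer"
fi
```

With n = 20000 the array holds 400000000 elements, or 3200000000 bytes per image, so only the byte count exceeds the 32-bit range. If that is what triggers the error, a smaller n (for example 10000, which gives 800000000 bytes) would be expected to run, but that has not been checked here.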

Compilation

module load plgrid/tools/gcc/6.4.0
module load plgrid/libs/opencoarrays/2.3.1
gfortran -fcoarray=lib bigarrays.f90 -lcaf_mpi

Invoke program using 10 images (SLURM script)

#!/bin/bash -l
#SBATCH --job-name="test"
## Number of nodes
#SBATCH --nodes=10
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=24
#SBATCH --mem 100000
#SBATCH --time=0:05:00 
#SBATCH -A rpa2018
#SBATCH -p plgrid-testing
#SBATCH --output="bigarrays_gfortran.log"
#SBATCH --error="bigarrays_gfortran.log"

module load plgrid/tools/python
module load plgrid/tools/gcc/6.4.0
module load plgrid/libs/opencoarrays/2.3.1

cafrun -np 10 ./a.out

modrzejewski, Jan 06 '19 15:01

The same issue is present when OpenCoarrays is linked against OpenMPI 2.1.1.

modrzejewski, Jan 07 '19 16:01

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot], Mar 29 '19 09:03

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot], Apr 26 '19 15:04

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot], May 24 '19 17:05

Sigh, I still haven't had a chance to investigate. I won't mark this as "in progress" though, so that stale bot keeps bugging me about it.

zbeekman, May 24 '19 18:05

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot], Jun 21 '19 19:06

Be gone stale bot

zbeekman, Jun 25 '19 13:06