ompi icon indicating copy to clipboard operation
ompi copied to clipboard

Possibly incorrect F08 test in ./configure script

Open cparrott73 opened this issue 2 years ago • 6 comments

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

We observed this while testing Open MPI 3.1.5 with the NVIDIA HPC SDK compilers, but we also see the same behavior in the current 4.1.5 release.

(Yes, we do understand that 3.1.5 is obsolete, but I have been told that we have an app that depends on this version for performance reasons until some additional functionality is implemented in UCX.)

Describe how Open MPI was installed

Downloaded source tarball. Set path to pick up NVIDIA HPC SDK compilers, then ran:

CC=nvc FC=nvfortran ./configure

Please describe the system on which you are running

This problem is independent of any particular operating system version and CPU type.


Details of the problem

This problem may be tangentially related to https://github.com/open-mpi/ompi/issues/9795.

In the recent 23.3 release nvfortran, our compiler developers have added support fortype (*), dimension (*) declarations in Fortran. Where previously (release 23.1 and before), the following test would fail with nvfortran in ./configure:

 !
 ! Autoconf puts "program main" at the top
 
   interface
      subroutine force_assumed_shape(a, count)
      integer :: count
      complex, dimension(:,:) :: a
      end subroutine force_assumed_shape
   end interface
 
   interface
      subroutine foo(buffer, count)
        !GCC$ ATTRIBUTES NO_ARG_CHECK :: buffer
        type(*), dimension(*), intent(in) :: buffer
        integer, intent(in) :: count
      end subroutine foo
   end interface
 
 ! Simple interface with an un-typed first argument (e.g., a choice buffer)
   integer :: count
   real :: buffer1(3)
   character :: buffer2
   complex :: buffer3(4,4)
   complex, pointer, dimension(:,:) :: ptr
   target :: buffer3
   integer :: buffer4
   ptr => buffer3
 
 ! Set some known values (somewhat irrelevant for this test, but just be
 ! sure that the values are initialized)
   a = 17
   buffer1(1) = 4.5
   buffer1(2) = 6.7
   buffer1(3) = 8.9
   buffer2 = 'a'
 
 ! Call with one type for the first argument
   call foo(buffer1, count)
 ! Call with a different type for the first argument
   call foo(buffer2, count)
 ! Force us through an assumed shape
   call force_assumed_shape(buffer3, count)
 ! Force a pointer call through an assumed shape (!)
   ptr => buffer3
 ! Also try with a simple scalar integer
 ! (Intel 2016 compiler suite only partially supports GCC pragmas;
 ! they work with all the above buffer types, but fail with a
 ! simple scalar integer)
   call foo(buffer4, count)
 
   end program
 
   subroutine force_assumed_shape(a, count)
     integer :: count
     real, dimension(:,:) :: a
     call foo(a, count)
   end subroutine force_assumed_shape
 
 ! Autoconf puts "end" after the last line
   subroutine bogus
 
       end
.

This test now succeeds with nvfortran, causing ./configure to incorrectly assume that nvfortran supports the !GCC$ ATTRIBUTES NO_ARG_CHECK directive.

Before, ./configure would correctly fail down to the !DIR$ IGNORE_TKR case:

checking for Fortran compiler support of !GCC$ ATTRIBUTES NO_ARG_CHECK… no
checking for Fortran compiler support of !DEC$ ATTRIBUTES NO_ARG_CHECK… no
checking for Fortran compiler support of !$PRAGMA IGNORE_TKR… no
checking for Fortran compiler support of !DIR$ IGNORE_TKR… yes
checking Fortran compiler ignore TKR syntax… 1:real, dimension(*):!DIR$ IGNORE_TKR

Now the !GCC$ ATTRIBUTES NO_ARG_CHECK test succeeds, causing ./configure to incorrectly assume that nvfortran supports this directive.

This, in turn, causes Open MPI to generate incorrect Fortran interfaces with nvfortran that use !GCC$ ATTRIBUTES NO_ARG_CHECK :: rather than the standard !DIR$ IGNORE_TKR directive.

According to our Fortran language expert:

Their test, which has the statement type( * ), dimension( * ), intent(in) :: buffer , should not produce an error no matter the directive used. But they incorrectly use the !GCC$ directive when an error is not produced. The Fortran standard says an error should not occur. Here is the citation: Fortran 2018 Standard, 15.5.2.4 Ordinary dummy variables (paragraph 14), we have the following: If the actual argument is a noncoindexed scalar, the corresponding dummy argument shall be scalar unless • the actual argument is default character, of type character with the C character kind (18.2.2), or is an element or substring of an element of an array that is not an assumed-shape, pointer, or polymorphic array, • the dummy argument has assumed-rank, or • the dummy argument is an assumed-type assumed-size array. The third bullet applies. Assumed type is type( * ) and assumed size is dimension( * ). In other words, type shall be ignored and one can either pass a rank 1 array or scalar argument to type( * ), dimension( * ), intent(in) :: buffer.

Thus, he believes this test should pass for all standards-compliant Fortran compilers, and it should not assume that the GCC directive is needed here.

Could you take a look at this ./configure test and consider if the logic here is correct?

Thanks in advance.

cparrott73 avatar Apr 11 '23 17:04 cparrott73

  1. There might be a typo in config/ompi_fortran_check_ignore_tkr.m4:
    # Vendor-neutral, TYPE(*) syntax
    OMPI_FORTRAN_CHECK_IGNORE_TKR_SUB(
         [!], [type(*)],
         [TYPE(*), DIMENSION(*)],
         [happy=1], [happy=0])

should it be

    # Vendor-neutral, TYPE(*) syntax
    OMPI_FORTRAN_CHECK_IGNORE_TKR_SUB(
         [!], [type(*), DIMENSION(*)],
         [TYPE(*), DIMENSION(*)],
         [happy=1], [happy=0])

instead?

Or should we (first?) try what is mandated by the standard"

    # Vendor-neutral, TYPE(*) syntax
    OMPI_FORTRAN_CHECK_IGNORE_TKR_SUB(
         [!], [type(*), DIMENSION(..)],
         [TYPE(*), DIMENSION(..)],
         [happy=1], [happy=0])
  1. about nvfortran behavior
     subroutine foo(buffer, count)
       !DIR$ IGNORE_TKR buffer
       type(*), dimension(*), intent(in) :: buffer
       integer, intent(in) :: count
     end subroutine foo

compiles just fine (and I guess this is the expected behavior)

however

     subroutine foo(buffer, count)
       type(*), dimension(*), intent(in) :: buffer
       integer, intent(in) :: count
     end subroutine foo

and

     subroutine foo(buffer, count)
       !GCC$ ATTRIBUTES NO_ARG_CHECK ::buffer
       type(*), dimension(*), intent(in) :: buffer
       integer, intent(in) :: count
     end subroutine foo

both compile correctly but issue the same warning

$ nvfortran -c conftest.f90 
NVFORTRAN-W-0189-Argument number 1 to foo: association of scalar actual argument to array dummy argument (conftest.f90: 50)
  0 inform,   1 warnings,   0 severes, 0 fatal for main
$ echo $?
0

Despite the warning, I was able to successfully build Open MPI and some simple test program. so

  • is Open MPI really incorrectly built because of the !GCC$ ATTRIBUTES NO_ARG_CHECK directive?
  • is the !DIR$ IGNORE_TKR directive mandatory?
  • if yes, should nvfortran simply fails instead of issuing a warning? otherwise, is there a way to force nvfortran to fail in this case?

That being said, there is no warning when using type(*), dimension(..) instead of type(*), dimension(*), so don't we want to use the former since this is what is specified in the standard?

ggouaillardet avatar Apr 14 '23 06:04 ggouaillardet

Thanks for the feedback. It's not the GCC directive in and of itself that causes the issue - I think our compiler just ignores the directive itself. However, the problem is due to the fact that type(*), dimension(*) is present in the generated interface associated with this directive. Open MPI itself will build successfully this way, but it cannot compile any MPI Fortran program - it will yield errors to the effect of "cannot resolve generic procedure" because of this declaration.

Using type(*), dimension(..) instead should be okay, according to our Fortran expert.

cparrott73 avatar Apr 14 '23 16:04 cparrott73

Will this be in the next release-candidate, then?

cponder avatar Apr 26 '23 02:04 cponder

@jsquyres @bwbarrett could you please discuss this issue during the next telcon?

I proposed #11591 in order to address this, but it breaks on IBM platforms (and I am unable to investigate this without access to the IBM environment). If we fix that later (e.g. 5.0.1 time frame) that could be considered as breaking the ABI in the middle of a release, so we could:

  • do nothing until Open MPI 6
  • add an adhoc configure flag to force the use of type(*), dimension(..) for Open MPI 5.

ggouaillardet avatar Apr 29 '23 15:04 ggouaillardet

@jsquyres AWS is also bitten by this. Do we have a consensus on the fix?

cc @ggouaillardet

wenduwan avatar Dec 08 '23 17:12 wenduwan

FYI: more discussion is occurring on #11591

jsquyres avatar Jul 10 '24 16:07 jsquyres

We are being hit by this issue on our site (Juelich Supercomputing Centre) as well. In the past (with OpenMPI 4.1.6 and NVHPC 23.X) we could workaround it backporting https://github.com/open-mpi/ompi/commit/5b4ee66322fdee73297f73319f831a1960f81aa6 That is not working now with NVHPC 24/25 and OpenMPI 5.0.5, not even applying https://github.com/open-mpi/ompi/pull/11591#issuecomment-2227340974

To be honest after reading this thread and https://github.com/open-mpi/ompi/pull/11591 I am not sure what is the best path forward for us. As I understand it a definite fix might make it into OpenMPI 6, and using LLVM 19.X-based compilers might also work. But none of these options allow us to move forward right now. The #11591 PR seem promising, but it is unclear (to me at least) if it affects the ABI (so a drop-in replacement of a patched OpenMPI is risky), and it is also not clear if it actually fixes it. And it seems like progress here stalled a little bit.

Does anyone have any insight or suggestion? Thanks!

damianam avatar Jan 31 '25 14:01 damianam