Possibly incorrect F08 test in ./configure script
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
We observed this while testing Open MPI 3.1.5 with the NVIDIA HPC SDK compilers, but we also see the same behavior in the current 4.1.5 release.
(Yes, we do understand that 3.1.5 is obsolete, but I have been told that we have an app that depends on this version for performance reasons until some additional functionality is implemented in UCX.)
Describe how Open MPI was installed
Downloaded source tarball. Set path to pick up NVIDIA HPC SDK compilers, then ran:
CC=nvc FC=nvfortran ./configure
Please describe the system on which you are running
This problem is independent of any particular operating system version and CPU type.
Details of the problem
This problem may be tangentially related to https://github.com/open-mpi/ompi/issues/9795.
In the recent 23.3 release nvfortran, our compiler developers have added support fortype (*), dimension (*) declarations in Fortran. Where previously (release 23.1 and before), the following test would fail with nvfortran in ./configure:
!
! Autoconf puts "program main" at the top
interface
subroutine force_assumed_shape(a, count)
integer :: count
complex, dimension(:,:) :: a
end subroutine force_assumed_shape
end interface
interface
subroutine foo(buffer, count)
!GCC$ ATTRIBUTES NO_ARG_CHECK :: buffer
type(*), dimension(*), intent(in) :: buffer
integer, intent(in) :: count
end subroutine foo
end interface
! Simple interface with an un-typed first argument (e.g., a choice buffer)
integer :: count
real :: buffer1(3)
character :: buffer2
complex :: buffer3(4,4)
complex, pointer, dimension(:,:) :: ptr
target :: buffer3
integer :: buffer4
ptr => buffer3
! Set some known values (somewhat irrelevant for this test, but just be
! sure that the values are initialized)
a = 17
buffer1(1) = 4.5
buffer1(2) = 6.7
buffer1(3) = 8.9
buffer2 = 'a'
! Call with one type for the first argument
call foo(buffer1, count)
! Call with a different type for the first argument
call foo(buffer2, count)
! Force us through an assumed shape
call force_assumed_shape(buffer3, count)
! Force a pointer call through an assumed shape (!)
ptr => buffer3
! Also try with a simple scalar integer
! (Intel 2016 compiler suite only partially supports GCC pragmas;
! they work with all the above buffer types, but fail with a
! simple scalar integer)
call foo(buffer4, count)
end program
subroutine force_assumed_shape(a, count)
integer :: count
real, dimension(:,:) :: a
call foo(a, count)
end subroutine force_assumed_shape
! Autoconf puts "end" after the last line
subroutine bogus
end
.
This test now succeeds with nvfortran, causing ./configure to incorrectly assume that nvfortran supports the !GCC$ ATTRIBUTES NO_ARG_CHECK directive.
Before, ./configure would correctly fail down to the !DIR$ IGNORE_TKR case:
checking for Fortran compiler support of !GCC$ ATTRIBUTES NO_ARG_CHECK… no
checking for Fortran compiler support of !DEC$ ATTRIBUTES NO_ARG_CHECK… no
checking for Fortran compiler support of !$PRAGMA IGNORE_TKR… no
checking for Fortran compiler support of !DIR$ IGNORE_TKR… yes
checking Fortran compiler ignore TKR syntax… 1:real, dimension(*):!DIR$ IGNORE_TKR
Now the !GCC$ ATTRIBUTES NO_ARG_CHECK test succeeds, causing ./configure to incorrectly assume that nvfortran supports this directive.
This, in turn, causes Open MPI to generate incorrect Fortran interfaces with nvfortran that use !GCC$ ATTRIBUTES NO_ARG_CHECK :: rather than the standard !DIR$ IGNORE_TKR directive.
According to our Fortran language expert:
Their test, which has the statement type( * ), dimension( * ), intent(in) :: buffer , should not produce an error no matter the directive used. But they incorrectly use the !GCC$ directive when an error is not produced. The Fortran standard says an error should not occur. Here is the citation: Fortran 2018 Standard, 15.5.2.4 Ordinary dummy variables (paragraph 14), we have the following: If the actual argument is a noncoindexed scalar, the corresponding dummy argument shall be scalar unless • the actual argument is default character, of type character with the C character kind (18.2.2), or is an element or substring of an element of an array that is not an assumed-shape, pointer, or polymorphic array, • the dummy argument has assumed-rank, or • the dummy argument is an assumed-type assumed-size array. The third bullet applies. Assumed type is type( * ) and assumed size is dimension( * ). In other words, type shall be ignored and one can either pass a rank 1 array or scalar argument to type( * ), dimension( * ), intent(in) :: buffer.
Thus, he believes this test should pass for all standards-compliant Fortran compilers, and it should not assume that the GCC directive is needed here.
Could you take a look at this ./configure test and consider if the logic here is correct?
Thanks in advance.
- There might be a typo in
config/ompi_fortran_check_ignore_tkr.m4:
# Vendor-neutral, TYPE(*) syntax
OMPI_FORTRAN_CHECK_IGNORE_TKR_SUB(
[!], [type(*)],
[TYPE(*), DIMENSION(*)],
[happy=1], [happy=0])
should it be
# Vendor-neutral, TYPE(*) syntax
OMPI_FORTRAN_CHECK_IGNORE_TKR_SUB(
[!], [type(*), DIMENSION(*)],
[TYPE(*), DIMENSION(*)],
[happy=1], [happy=0])
instead?
Or should we (first?) try what is mandated by the standard"
# Vendor-neutral, TYPE(*) syntax
OMPI_FORTRAN_CHECK_IGNORE_TKR_SUB(
[!], [type(*), DIMENSION(..)],
[TYPE(*), DIMENSION(..)],
[happy=1], [happy=0])
- about
nvfortranbehavior
subroutine foo(buffer, count)
!DIR$ IGNORE_TKR buffer
type(*), dimension(*), intent(in) :: buffer
integer, intent(in) :: count
end subroutine foo
compiles just fine (and I guess this is the expected behavior)
however
subroutine foo(buffer, count)
type(*), dimension(*), intent(in) :: buffer
integer, intent(in) :: count
end subroutine foo
and
subroutine foo(buffer, count)
!GCC$ ATTRIBUTES NO_ARG_CHECK ::buffer
type(*), dimension(*), intent(in) :: buffer
integer, intent(in) :: count
end subroutine foo
both compile correctly but issue the same warning
$ nvfortran -c conftest.f90
NVFORTRAN-W-0189-Argument number 1 to foo: association of scalar actual argument to array dummy argument (conftest.f90: 50)
0 inform, 1 warnings, 0 severes, 0 fatal for main
$ echo $?
0
Despite the warning, I was able to successfully build Open MPI and some simple test program. so
- is Open MPI really incorrectly built because of the
!GCC$ ATTRIBUTES NO_ARG_CHECKdirective? - is the
!DIR$ IGNORE_TKRdirective mandatory? - if yes, should
nvfortransimply fails instead of issuing a warning? otherwise, is there a way to forcenvfortranto fail in this case?
That being said, there is no warning when using
type(*), dimension(..) instead of type(*), dimension(*), so don't we want to use the former since this is what is specified in the standard?
Thanks for the feedback. It's not the GCC directive in and of itself that causes the issue - I think our compiler just ignores the directive itself. However, the problem is due to the fact that type(*), dimension(*) is present in the generated interface associated with this directive. Open MPI itself will build successfully this way, but it cannot compile any MPI Fortran program - it will yield errors to the effect of "cannot resolve generic procedure" because of this declaration.
Using type(*), dimension(..) instead should be okay, according to our Fortran expert.
Will this be in the next release-candidate, then?
@jsquyres @bwbarrett could you please discuss this issue during the next telcon?
I proposed #11591 in order to address this, but it breaks on IBM platforms (and I am unable to investigate this without access to the IBM environment). If we fix that later (e.g. 5.0.1 time frame) that could be considered as breaking the ABI in the middle of a release, so we could:
- do nothing until Open MPI 6
- add an adhoc
configureflag to force the use oftype(*), dimension(..)for Open MPI 5.
@jsquyres AWS is also bitten by this. Do we have a consensus on the fix?
cc @ggouaillardet
FYI: more discussion is occurring on #11591
We are being hit by this issue on our site (Juelich Supercomputing Centre) as well. In the past (with OpenMPI 4.1.6 and NVHPC 23.X) we could workaround it backporting https://github.com/open-mpi/ompi/commit/5b4ee66322fdee73297f73319f831a1960f81aa6 That is not working now with NVHPC 24/25 and OpenMPI 5.0.5, not even applying https://github.com/open-mpi/ompi/pull/11591#issuecomment-2227340974
To be honest after reading this thread and https://github.com/open-mpi/ompi/pull/11591 I am not sure what is the best path forward for us. As I understand it a definite fix might make it into OpenMPI 6, and using LLVM 19.X-based compilers might also work. But none of these options allow us to move forward right now. The #11591 PR seem promising, but it is unclear (to me at least) if it affects the ABI (so a drop-in replacement of a patched OpenMPI is risky), and it is also not clear if it actually fixes it. And it seems like progress here stalled a little bit.
Does anyone have any insight or suggestion? Thanks!