ompi
ompi_fortran_check_ignore_tkr.m4: fix fortran test errors
Fix two bugs in the Fortran test code:
- Called with "type()" instead of "type(), dimension(*)"
- Subroutine type was "real" when it should have been "complex"
Signed-off-by: Jeff Squyres [email protected]
Refs #9795
@ThemosTsikas Per your most recent config.log (https://github.com/open-mpi/ompi/issues/9795#issuecomment-1003767914), I see the "use...only" test failed.
Can you tell me if our test is invalid, or if it is supposed to work in Fortran?
cat > aaa.f90 << EOF
MODULE aaa
INTEGER :: CMON(1)
COMMON/CMMON/CMON
INTEGER :: global_aaa
END MODULE aaa
EOF
nagfor -I. -mismatch -c aaa.f90
cat > bbb.f90 << EOF
MODULE bbb
integer, bind(C, name="cmmon_") :: CMON
INTEGER :: global_bbb
END MODULE bbb
EOF
nagfor -I. -mismatch -c bbb.f90
cat > conftest.f90 <<EOF
PROGRAM test
USE aaa, ONLY : global_aaa
USE bbb, ONLY : global_bbb
implicit none
END PROGRAM
EOF
nagfor -c -I. -mismatch conftest.f90
The config.log you sent shows the following error:
configure:61877: nagfor -c -I. -mismatch conftest.f90 >&5
NAG Fortran Compiler Release 7.1(Hanzomon) Build 7103
Warning: conftest.f90, line 5: GLOBAL_AAA explicitly imported into TEST but not used
Warning: conftest.f90, line 5: GLOBAL_BBB explicitly imported into TEST but not used
[NAG Fortran Compiler normal termination, 2 warnings]
conftest.f90:1:16: error: conflicting types for 'cmmon_'
PROGRAM test
^
conftest.f90:1:3: note: previous declaration of 'cmmon_' was here
PROGRAM test
^~~~~~
But if I compile these with gfortran 10.2.0, I get success:
$ gfortran -c aaa.f90
$ gfortran -c bbb.f90
$ gfortran -c -I. conftest.f90
$ ls -l conftest.o
-rw-r--r-- 1 jsquyres named 1792 Jan 2 12:57 conftest.o
I'm not trying to say "gfortran is right!" I'm just illustrating that I thought this test was correct because the GNU, Intel, and Absoft Fortran compilers pass this test.
I am absolutely not a Fortran expert. Can you provide some clarity?
Thank you Jeff, I am going to take it to Malcolm Cohen and it would help if I could understand what this test is trying to achieve. It doesn't seem to check that the intended semantics are respected. Would it help if I included here the Note from the Fortran Standard regarding C interoperability of global variables and common blocks?
The following are examples of the usage of the BIND attribute for variables and for a common block. The
Fortran variables, C_EXTERN and C2, interoperate with the C variables, c_extern and myVariable,
respectively. The Fortran common blocks, COM and SINGLE, interoperate with the C variables, com and single, respectively.
MODULE LINK_TO_C_VARS
USE, INTRINSIC :: ISO_C_BINDING
INTEGER(C_INT), BIND(C) :: C_EXTERN
INTEGER(C_LONG) :: C2
BIND(C, NAME='myVariable') :: C2
COMMON /COM/ R, S
REAL(C_FLOAT) :: R, S, T
BIND(C) :: /COM/, /SINGLE/
COMMON /SINGLE/ T
END MODULE LINK_TO_C_VARS
/* Global variables. */
int c_extern;
long myVariable;
struct { float r, s; } com;
float single;
The rules under "Interoperation with C global variables" (18.9) include
1 A C variable whose name has external linkage may interoperate with a common block or
with a variable declared in the scope of a module. The common block or variable shall be
specified to have the BIND attribute.
2 At most one variable that is associated with a particular C variable whose name has
external linkage is permitted to be declared within all the Fortran program units of a
program. A variable shall not be initially defined by more than one processor.
That second paragraph makes me suspect that this is why our compiler rejects the code but I need to look closer. For now, I need to understand what is actually meant to be achieved by this coding, as there may be a better way to express it.
The test is just trying to check that use...only works, and only imports the names that are identified (and ignores those that are not identified -- even if there are multiple, conflicting names in the imported modules in question).
I'm not entirely sure why I chose to use common blocks for this test; we don't use common blocks anywhere in the Open MPI Fortran code (these tests were written years ago). I suppose that I could just have something like:
MODULE aaa
INTEGER :: common1
REAL :: common2
INTEGER :: global_aaa
END MODULE aaa
MODULE bbb
INTEGER :: common1
COMPLEX :: common2
INTEGER :: global_bbb
END MODULE bbb
I.e., don't use BIND or a common block at all. And just to spice the test up a little, have common1 that is the same type between both modules, and have common2 that is a different type between both modules.
I'm really not sure why I used BIND + a common block. It seems so obvious to not use those, so it feels like Past Jeff would have had a reason for doing that. ...but I can't think of what it would be.
Perhaps the way forward is to relax the test to not use binding labels or COMMON. These are "global entities" in Fortran and the following rules apply to them. I am inclined to think that your existing test violates them in the use of the name cmmon_ as can be seen by adding the line
BIND(C,NAME="cmmon_"):: /CMMON/
to module AAA (which reinforces the choice of global identifier in the absence of a binding label). One then gets an error message for conftest.f90 from the NAG compiler (but gfortran and ifort let it through).
Error: aaa.f90: Duplicate binding label 'cmmon_' for variable CMON of module BBB and COMMON/CMMON/
19.2 Global identifiers
1 Program units, common blocks, external procedures, entities with binding labels, external input/output units,
pending data transfer operations, and images are global entities of a program. The name of a common block with
no binding label, external procedure with no binding label, or program unit that is not a submodule is a global
identifier. The submodule identifier of a submodule is a global identifier. A binding label of an entity of the
program is a global identifier. An entity of the program shall not be identified by more than one binding label.
2 The global identifier of an entity shall not be the same as the global identifier of any other entity. Furthermore, a
binding label shall not be the same as the global identifier of any other global entity, ignoring differences in case.
A processor may assign a global identifier to an entity that is not specified by this document to have a global
identifier (such as an intrinsic procedure); in such a case, the processor shall ensure that this assigned global
identifier differs from all other global identifiers in the program.
@ThemosTsikas Ok, I updated this PR and pushed a new unofficial tarball made from this PR: https://aws.open-mpi.org/~jsquyres/unofficial/openmpi-gitclone-pr9812-2.tar.bz2. Could you give it a try?
I think LOC needs to be C_LOC and pointer arithmetic must be done on the C side, to comply with standard Fortran.
@ThemosTsikas Can you test whether your compiler can build these two files together into an executable, and whether running that executable produces a conftestval file with a reasonable value (e.g., 4)?
module alignment_mod
type, BIND(C) :: test_mpi_handle
integer :: MPI_VAL
end type test_mpi_handle
type(test_mpi_handle), target :: t1
type(test_mpi_handle), target :: t2
end module alignment_mod
program falignment
use alignment_mod
external align
call align(t1, t2)
end program falignment
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
void align_(char *t1, char *t2)
{
FILE *fp = fopen("conftestval", "w");
if (!fp) exit(1);
ptrdiff_t x;
if (t1 > t2) {
x = t1 - t2;
} else {
x = t2 - t1;
}
fprintf(fp, "%d\n", (int) x);
fclose(fp);
}
@jsquyres I just caught up on all my on-leave email and:
❯ nagfor -V
NAG Fortran Compiler Release 7.1(Hanzomon) Build 7101
Product NPMI671NA for Apple Intel Mac OSX 64-bit
Copyright 1990-2020 The Numerical Algorithms Group Ltd., Oxford, U.K.
❯ clang --version
Apple clang version 13.0.0 (clang-1300.0.29.30)
Target: x86_64-apple-darwin20.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
❯ clang -c align.c
❯ nagfor falignment.f90 align.o
NAG Fortran Compiler Release 7.1(Hanzomon) Build 7101
[NAG Fortran Compiler normal termination]
❯ ./a.out
❯ cat conftestval
4
ETA: I get the same result with gcc-gfortran 11.2 and Intel Fortran 2022.0 with clang.
Thanks Matthew, I get the same here.
@ThemosTsikas @mathomp4 Excellent; thanks for testing. I ended up going a slightly different direction, but your test gives me hope that it'll succeed, anyway. Here's a new tarball to test: https://aws.open-mpi.org/~jsquyres/unofficial/openmpi-gitclone-pr9812-3.tar.bz2
Much better progress. configure succeeds, enabling use-mpi-f08. But make falls over because bad options were passed to the compiler. Would you like to get a trial licence for the NAG Fortran Compiler so that you can try this yourself? You will be sent a trial key automatically by email. https://www.nag.com/content/getting-started-nag-fortran-compiler
I have time this week to look into this because I'm technically on vacation; I got a key and will look at it this afternoon.
It would be best if NAG could add Open MPI to its regular testing (Absoft does this, for example). This is how vendor involvement / support typically works in the Open MPI community: those who care (i.e., the vendors themselves) provide the work. In Absoft's case, for example, they run automated testing that reports up to the Open MPI community development database so that we, the Open MPI dev community, know when we have broken something with their compiler. Absoft then helps us debug and fix the issue. I only cite Absoft because they're another Fortran compiler vendor; the same is generally true for all vendors in the Open MPI community (e.g., network and server vendors).
That is a good idea, I will look into implementing it.
@ThemosTsikas What version of the NAG compiler are you using? The version I downloaded from https://www.nag.com/content/getting-started-nag-fortran-compiler is:
$ nagfor -V
NAG Fortran Compiler Release 7.1(Hanzomon) Build 7101
Product NPL6A71NA for x86-64 Linux
Copyright 1990-2020 The Numerical Algorithms Group Ltd., Oxford, U.K.
When I run through Open MPI's configure, it behaves differently than you mentioned above:
- It builds the limited mpi module (i.e., no ignore TKR)
- It does not build the mpi_f08 module
- It builds everything else just fine (i.e., doesn't error when building Open MPI)
The users mailing list specifically mentioned the NAG compiler v7.2 as being the first F2008-compliant release (https://www.mail-archive.com/[email protected]/msg34686.html).
Do I need a different version/build of NAG?
There is no 7.2, that was an error. The latest is 7.1 and is (claimed to be) F2008 compliant. You can pick up the latest bug-fixed Builds from http://monet.nag.co.uk/compiler/r71download/. The main website's Build is updated less frequently. Build 7102 should exhibit the behaviour I described.
@ThemosTsikas Ok, I got the 7102 build and got farther:
- Open MPI's configure succeeded and decided that it could build the mpi_f08 module
- When building the mpi_f08 module, it compiled several .F90 files ok, but then errored out on one of them.
Here's the simplest example I could come up with showing (part of) what the .F90 file is doing that nagfor does not like:
MODULE mpi_types
type, BIND(C) :: MPI_Comm
integer :: MPI_VAL
end type MPI_Comm
END MODULE mpi_types
MODULE mpi
use mpi_types
! This is one type of MPI_UNWEIGHTED
integer MPI_UNWEIGHTED(1)
common/mpi_fortran_unweighted/MPI_UNWEIGHTED
interface
subroutine PMPI_Cart_create(old_comm, ndims, dims, periods, reorder, &
comm_cart, ierror)
integer, intent(in) :: old_comm
integer, intent(in) :: ndims
integer, dimension(*), intent(in) :: dims
logical, dimension(*), intent(in) :: periods
logical, intent(in) :: reorder
integer, intent(out) :: comm_cart
integer, intent(out) :: ierror
end subroutine PMPI_Cart_create
end interface
END MODULE mpi
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
MODULE mpi_f08_types
use mpi_types
! This is the other type for MPI_UNWEIGHTED
integer, dimension(1), bind(C, name="mpi_fortran_unweighted_") :: MPI_UNWEIGHTED
END MODULE mpi_f08_types
MODULE mpi_f08_interfaces
interface MPI_Cart_create
subroutine MPI_Cart_create_f08(comm_old,ndims,dims,periods,reorder,comm_cart,ierror)
use :: mpi_f08_types, only : MPI_Comm
implicit none
TYPE(MPI_Comm), INTENT(IN) :: comm_old
INTEGER, INTENT(IN) :: ndims, dims(ndims)
LOGICAL, INTENT(IN) :: periods(ndims), reorder
TYPE(MPI_Comm), INTENT(OUT) :: comm_cart
INTEGER, OPTIONAL, INTENT(OUT) :: ierror
end subroutine MPI_Cart_create_f08
end interface MPI_Cart_create
END MODULE mpi_f08_interfaces
MODULE mpi_f08
use mpi_f08_types
use mpi_f08_interfaces
END MODULE mpi_f08
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
subroutine MPI_Cart_create_f08(comm_old,ndims,dims,periods,reorder,comm_cart,ierror)
! This appears to be where the problem arises.
! We use two modules that (ultimately) have conflicting types for
! MPI_UNWEIGHTED.
use :: mpi_f08_types, only : MPI_Comm
use :: mpi, only : PMPI_Cart_create
implicit none
TYPE(MPI_Comm), INTENT(IN) :: comm_old
INTEGER, INTENT(IN) :: ndims
INTEGER, INTENT(IN) :: dims(ndims)
LOGICAL, INTENT(IN) :: periods(ndims), reorder
TYPE(MPI_Comm), INTENT(OUT) :: comm_cart
INTEGER, OPTIONAL, INTENT(OUT) :: ierror
integer :: c_ierror
call PMPI_Cart_create(comm_old%MPI_VAL,ndims,dims,periods,&
reorder,comm_cart%MPI_VAL,c_ierror)
if (present(ierror)) ierror = c_ierror
end subroutine MPI_Cart_create_f08
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! This part is just a sample program so that I can invoke an interface
! from the mpi_f08 module.
PROGRAM test
use mpi_f08
implicit none
type(MPI_Comm) :: comm_a, comm_b
integer :: ndims
integer, dimension(3) :: dims
logical, dimension(3) :: periods
logical :: reorder
ndims = 3
dims(1) = 1
dims(2) = 2
dims(3) = 3
periods(1) = .true.
periods(2) = .true.
periods(3) = .true.
reorder = .true.
call MPI_Cart_create_f08(comm_a, ndims, dims, periods, reorder, comm_b)
END PROGRAM test
With the following .c file that contains the implementation of PMPI_Cart_create:
#include <stdio.h>
#pragma weak PMPI_CART_CREATE = ompi_cart_create_f
#pragma weak pmpi_cart_create = ompi_cart_create_f
#pragma weak pmpi_cart_create_ = ompi_cart_create_f
#pragma weak pmpi_cart_create__ = ompi_cart_create_f
#pragma weak PMPI_Cart_create_f = ompi_cart_create_f
#pragma weak PMPI_Cart_create_f08 = ompi_cart_create_f
typedef int MPI_Fint;
typedef int ompi_fortran_logical_t;
void ompi_cart_create_f(MPI_Fint *old_comm, MPI_Fint *ndims, MPI_Fint *dims,
ompi_fortran_logical_t *periods, ompi_fortran_logical_t *reorder,
MPI_Fint *comm_cart, MPI_Fint *ierr)
{
printf("In ompi_cart_create_f\n");
}
I can compile+link the above Fortran and C with gfortran 10.2 and ifort 19.0.4:
$ gcc pmpi_cart_create.c -c
$ gfortran modules.f90 -c
$ gfortran modules.o pmpi_cart_create.o
$ ./a.out
In ompi_cart_create_f
$ ifort modules.f90 -c
$ ifort modules.o pmpi_cart_create.o
$ ./a.out
In ompi_cart_create_f
But nagfor apparently doesn't like that there are conflicting types for MPI_UNWEIGHTED in the mpi and the mpi_f08_types modules:
$ nagfor modules.f90 -c
NAG Fortran Compiler Release 7.1(Hanzomon) Build 7102
Evaluation trial version of NAG Fortran Compiler Release 7.1(Hanzomon) Build 7102
[NAG Fortran Compiler normal termination]
modules.f90:30:37: error: conflicting types for ‘mpi_fortran_unweighted_’
MODULE mpi_f08_types
^
modules.f90:1:3: note: previous declaration of ‘mpi_fortran_unweighted_’ was here
MODULE mpi_types
^
The error message appears to be incorrect: mpi_fortran_unweighted_ does not appear in the mpi_types module.
The error message also does not tell exactly where the conflict occurred; I suspect that it's in subroutine MPI_Cart_create_f08 where we import both mpi and mpi_f08. But we use the ONLY qualifier, so we should only be importing 1 name each from those modules, and those shouldn't conflict.
I know it's convoluted, but is that valid Fortran?
(I really hope so, because we need this...)
Did you notice the line [NAG Fortran Compiler normal termination]? If errors appear after that line, they come from the underlying C compiler or linker. In this case it is the C compiler. This is a nagfor bug, as we should never generate bad C. I will log it for fixing. I haven’t analysed it yet, so it might turn out that doing this is a Fortran error, but it is hard to catch it earlier. Let me have a look at it.
Sorry, I have just woken up here. Isn’t this the same issue we had earlier at the configure stage, the one about global identifiers?
Isn't the way forward to not repeat global identifiers but reuse them, like so:
$ diff modules.f90 mymodules.f90
33c33,34
< integer, dimension(1), bind(C, name="mpi_fortran_unweighted_") :: MPI_UNWEIGHTED
---
> ! integer, dimension(1), bind(C, name="mpi_fortran_unweighted_") :: MPI_UNWEIGHTED
> use mpi, only:mpi_unweighted
And to show the decoupling of names better:
$ diff modules.f90 mymodules.f90
12,13c12,13
< common/mpi_fortran_unweighted/MPI_UNWEIGHTED
<
---
> common/anynameyoulike/MPI_UNWEIGHTED
> Bind(C,Name="mpi_fortran_unweighted_")/anynameyoulike/
33c33
< integer, dimension(1), bind(C, name="mpi_fortran_unweighted_") :: MPI_UNWEIGHTED
---
> use mpi, only:mpi_unweighted
You know, I recently told you that we didn't use common blocks anywhere in the OMPI code; oops -- I guess that was wrong. ☹️
But this is an interesting point: I just tested, and gcc 4.8.5 (which is as far back as Open MPI v5.0.x supports) supports:
use, intrinsic :: iso_c_binding
integer, dimension(1), bind(C, name="foo") :: MPI_UNWEIGHTED
So perhaps we should ditch all of our common blocks and replace them with BIND(C). The common blocks were only so that we could compare (in C code) to know when users passed in sentinel Fortran constants (i.e., the constants themselves, not just equivalent constant values). Seems like this functionality is exactly what BIND(C) is for.
Let me go run with that...
Fingers crossed here.
I spent a bunch of time on this and was unable to bring it to completion. ☹️
I have pushed the latest commits that I have on this branch, but I stress that they do not work yet. I only pushed here so that the work was not accidentally lost.
We have a somewhat complicated scheme for trying to share code between mpif.h, the mpi module, and the mpi_f08 module (we make several modules that get "use"d by the mpi and mpi_f08 modules -- but there's a bit of a complicated dependency graph between all the modules and sub-modules and sub-sub-modules that get used). The interactions between all these modules are creating errors with the nagfor compiler (which appear to be legit errors; I'm not sure how gfortran/ifort/etc. make it all work; it may be happy accidents?). Part of the problem is that we are intentionally faking out the Fortran compiler, too (see https://github.com/open-mpi/ompi/blob/master/ompi/mpi/fortran/use-mpi-f08/bindings/mpi-f-interfaces-bind.h#L24-L163).
I think that I am coming to the conclusion that this is too complicated, and should be simplified somehow. I'm not entirely sure what the Right way is to do that, and my window of availability to work on this just got drastically reduced.
Here are the current problems with the commits as they currently are on this PR:
- There's a ton of new common segment symbols. When building Open MPI is done, the common-symbol-checker runs and finds the sentinel values in nearly all the F08 module files. For example:
info_dup_f08.o: 0000000000000001 C mpi_fortran_argv_null
info_dup_f08.o: 0000000000000001 C mpi_fortran_argvs_null
info_dup_f08.o: 0000000000000004 C mpi_fortran_bottom
info_dup_f08.o: 0000000000000004 C mpi_fortran_errcodes_ignore
info_dup_f08.o: 0000000000000004 C mpi_fortran_in_place
info_dup_f08.o: 0000000000000018 C mpi_fortran_status_ignore
info_dup_f08.o: 0000000000000018 C mpi_fortran_statuses_ignore
info_dup_f08.o: 0000000000000004 C mpi_fortran_unweighted
info_dup_f08.o: 0000000000000004 C mpi_fortran_weights_empty
- When compiling ring_usempif08.f90:
$ mpifort ring_usempif08.f90
NAG Fortran Compiler Release 7.1(Hanzomon) Build 7102
Evaluation trial version of NAG Fortran Compiler Release 7.1(Hanzomon) Build 7102
Error: ring_usempif08.f90, line 53: Symbol MPI_STATUS_IGNORE found both in module MPI_F08_SENTINELS and in MPI
detected at MPI_STATUS_IGNORE@)
That is a correct error message: MPI_STATUS_IGNORE is a constant in both the mpi module and the mpi_f08 module, and those constants have different types in those two modules. Given that both the mpi and mpi_f08 modules can be used in the same compilation unit, how can we have 2 different versions of the sentinel value MPI_STATUS_IGNORE (and MPI_STATUSES_IGNORE) that have different back-end symbols?
That being said, ring_usempif08.f90 is only using the mpi_f08 module, so there must be some kind of complicated dependency between the mpi_f08 and mpi modules in the commits as they currently exist on this PR (i.e., mpi_f08 is somehow pulling in mpi).
The IBM CI (GNU/Scale) build failed! Please review the log, linked below.
Gist: https://gist.github.com/1d1543d7bdb4ae684e8714ce5c0c8a6f
@hppritcha and I chatted on the phone about the MPI_STATUS[ES]_IGNORE issue: it seems like we can just BIND(C) the mpi MPI_STATUS[ES]_IGNORE to a different back-end symbol name than the mpi_f08 MPI_STATUS[ES]_IGNORE back-end symbol name, and that should solve any potential linker issues.
Still need to dig into the nagfor error from above -- there's likely some unintentional using of both mpi_f08 and mpi.