mpi_f08 configure tests failing for NAG

Open jsquyres opened this issue 3 years ago • 12 comments

Per this thread on the users mailing list, the configure tests for mpi_f08 are apparently failing incorrectly for the NAG compiler v7.2 in Open MPI v4.1.x:

$ FC=nagfor ./configure --prefix=/usr/local/openmpi-4.1.2
[ ... ]
checking if building Fortran 'use mpi_f08' bindings... no
Build MPI Fortran bindings: mpif.h, use mpi

That thread also cites this from config.log, which could be a clue into what is happening:

configure:69595: result: no
configure:69673: checking for Fortran compiler support of !$PRAGMA IGNORE_TKR
configure:69740: nagfor -c -f2008 -dusty -mismatch  conftest.f90 >&5
NAG Fortran Compiler Release 7.1(Hanzomon) Build 7101
Evaluation trial version of NAG Fortran Compiler Release 7.1(Hanzomon)
Build 7101
Questionable: conftest.f90, line 52: Variable A set but never referenced
Warning: conftest.f90, line 52: Pointer PTR never dereferenced
Error: conftest.f90, line 39: Incorrect data type REAL (expected
CHARACTER) for argument BUFFER (no. 1) of FOO
Error: conftest.f90, line 50: Incorrect data type INTEGER (expected
CHARACTER) for argument BUFFER (no. 1) of FOO
[NAG Fortran Compiler error termination, 2 errors, 2 warnings]
configure:69740: $? = 2

Finally, a user contacted NAG tech support about this issue, and they replied with this:

Regarding OpenMPI, we have attempted the build ourselves but cannot make sense of the configure script. Only the OpenMPI maintainers can do something about that and it looks like they assume that all compilers will just swallow non-conforming Fortran code. The error downgrading options for NAG compiler remain "-dusty", "-mismatch" and "-mismatch_all" and none of them seem to help with the mpi_f08 module of OpenMPI. If there is a bug in the NAG Fortran Compiler that is responsible for this, we would love to hear about it, but at the moment we are not aware of such.

I'm labeling this as a v4.1 and v5.0 issue. I don't think it will be worthwhile to back-port the relevant fixes to v4.0.x.

FYI @ThemosTsikas @mathomp4 @wadudmiah @tkacvinsky

jsquyres avatar Dec 30 '21 22:12 jsquyres

Can we have the full (compressed) config.log?

configure tries different methods to support the ignore_tkr thing

  • as-is (i.e., no pragma/directive)
  • !GCC$ ATTRIBUTES NO_ARG_CHECK
  • !DEC$ ATTRIBUTES NO_ARG_CHECK
  • !$PRAGMA IGNORE_TKR
  • !DIR$ IGNORE_TKR
  • !IBM* IGNORE_TKR

so we should not focus on just one. The code is in the _OMPI_FORTRAN_CHECK_IGNORE_TKR() and OMPI_FORTRAN_CHECK_IGNORE_TKR_SUB() macros from config/ompi_fortran_check_ignore_tkr.m4.

I think that should be enough for NAG support to determine if/how to make it work with their compiler. If needed, I can post the 6 Fortran snippets that configure tries to compile.

ggouaillardet avatar Dec 31 '21 03:12 ggouaillardet

Gilles,

I'll get it to you next week when I get back to work.

Note that it's entirely possible NAG cannot support ignoring TKR. NAG is pretty strict to the Fortran Standard, so if they feel they support TYPE(*), DIMENSION(..) (aka the void * of Fortran) completely, they'd have no reason to need to ignore TKR.

Is there a reason Open MPI requires supporting ignoring TKR? Or perhaps, is there a way to tell it to not test for it? The MPI Standard seems to say that the Fortran 2008 interfaces for MPI procedures are all based on TYPE(*), DIMENSION(..). And if a compiler supports that, it seems like ignoring TKR is redundant at that point.

mathomp4 avatar Dec 31 '21 13:12 mathomp4

If NAG doesn't support ignoring TKR, then -- at least at the moment -- Open MPI's mpi_f08 module won't work with the NAG compiler.

Open MPI certainly can -- and should -- be extended to properly support TYPE(*), DIMENSION(..), but it hasn't been a high priority. It's been a while since I've thought about this kind of stuff, but I seem to recall that there are two possibilities for TYPE(*), DIMENSION(..) support:

  1. Have subroutine prototypes with TYPE(*), DIMENSION(..), but ignore the majority of the Fortran descriptor metadata that is passed to the subroutine and just use the buffer pointer. This would allow TYPE(*), DIMENSION(..) prototypes, but would involve the least amount of perturbation of the Open MPI code base (i.e., not have to grow full support for all the other metadata in the Fortran descriptor).
  2. Fully support Fortran descriptors, including non-contiguous buffers (i.e., be able to set MPI_SUBARRAYS_SUPPORTED to .TRUE.). This is a non-trivial amount of work.
    • We've talked about how to do this before, and I think there were even some preliminary PRs in this direction -- see https://github.com/open-mpi/ompi/tree/1a70e5bd16aa72fa9fc10de64ef01b0c6ca03f6a/ompi/mpi/fortran/use-mpi-f08-desc as the last git tree before we removed the proof-of-concept ompi/mpi/fortran/use-mpi-f08-desc directory (it was removed in the next git commit: 791bcee6c065b47869525553265e36deb6fd0390). I think send_f08_desc.f90 and recv_f08_desc.f90 are of particular interest.

I should also note that, starting earlier this month/before the Christmas break, there is a low-but-nonzero level of ongoing activity to generate the Fortran bindings based on the MPI Forum JSON / Python bindings library. If someone is of the mind to properly support TYPE(*), DIMENSION(..), please come talk to me and @markalle before you go hand-write a whole bunch of code. That being said, note that this Fortran code generation work is intended for Open MPI 5.x and later (it'll likely miss the 5.0.x series, but could maybe potentially be part of a 5.1.x series...?) -- there's no intent to back-port it to Open MPI v4.x or earlier.

jsquyres avatar Dec 31 '21 15:12 jsquyres

Could you please attach the Fortran code that is used to determine if the compiler supports F2008? If NAG can see that it is valid Fortran, then they can investigate a bug in their compiler. The latter would not be a massive surprise as they have just released this new version (7.1) of the compiler.

wadudmiah avatar Dec 31 '21 18:12 wadudmiah

@wadudmiah There's not just one test -- there's a whole bunch of them. They are collectively used to determine whether the Fortran compiler supports "enough" F08 for Open MPI's mpi_f08 module or not. See https://github.com/open-mpi/ompi/blob/master/config/ompi_setup_mpi_fortran.m4#L411-L588 for the list of tests that all have to pass for configure to determine whether the Fortran compiler supports what is needed.

I also explained a bit more in https://www.mail-archive.com/[email protected]/msg34698.html:

I'm one of the few people in the Open MPI dev community who has a clue about Fortran, and I'm very far from being a Fortran expert. Modern Fortran is a legitimately complicated language. So it doesn't surprise me that we might have some code in our configure tests that isn't quite right.

Let's also keep in mind that the state of F2008 support varies widely across compilers and versions. The current Open MPI configure tests straddle the line of trying to find enough F2008 support in a given compiler to be sufficient for the mpi_f08 module without being so overly prescriptive as to disqualify compilers that aren't fully F2008-compliant. Frankly, the state of F2008 support across the various Fortran compilers was a mess when we wrote those configure tests; we had to cobble together a variety of complicated tests to figure out if any given compiler provided enough F2008 support for some / all of the mpi_f08 module. That's why the configure tests are... complicated.

That's why it would be useful to see the config.log file from a configure invocation with the NAG compiler -- let's see exactly which test(s) is(are) failing. Then we can figure out why.
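
A minimal sketch of that first step, assuming the usual Autoconf log layout seen in the excerpt at the top of this issue (`configure:NNNN: checking ...`, then the command line, then `configure:NNNN: $? = N`). The function name `failed_commands` is made up for illustration and is not part of Open MPI. Note that many configure probes are expected to fail, so the output is a list to triage, not a list of bugs.

```python
import re

# Matches configure's own bookkeeping lines, e.g.
# "configure:69673: checking for ..." or "configure:69740: $? = 2".
LINE_RE = re.compile(r"^configure:(\d+): (.*)$")
STATUS_RE = re.compile(r"\$\? = (\d+)")


def failed_commands(log_text):
    """Return (check, command, exit_status) for each command that exited non-zero.

    Heuristic only: any "configure:NNNN:" line that is not a check, a result
    or an exit status is treated as the most recent command.
    """
    failures = []
    current_check = None  # text after "checking " on the latest check line
    last_command = None   # latest command line logged by configure
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # compiler output, not a configure bookkeeping line
        body = match.group(2)
        status = STATUS_RE.fullmatch(body)
        if body.startswith("checking "):
            current_check = body[len("checking "):]
            last_command = None
        elif status:
            if int(status.group(1)) != 0:
                failures.append((current_check, last_command, int(status.group(1))))
        elif not body.startswith("result:"):
            last_command = body
    return failures


if __name__ == "__main__":
    # Lines copied from the config.log excerpt quoted in this issue.
    sample = """\
configure:69595: result: no
configure:69673: checking for Fortran compiler support of !$PRAGMA IGNORE_TKR
configure:69740: nagfor -c -f2008 -dusty -mismatch  conftest.f90 >&5
Error: conftest.f90, line 39: Incorrect data type REAL (expected
CHARACTER) for argument BUFFER (no. 1) of FOO
configure:69740: $? = 2
"""
    for check, command, status in failed_commands(sample):
        print(f"{check!r} exited with status {status}: {command}")
```

Running it on a full config.log (read with `open(path).read()`) gives one entry per non-zero exit status, which can then be compared against the list of mpi_f08 tests linked above.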

It's also possible that since NAG doesn't support "ignore TKR" pragmas, then Open MPI's configure is correctly determining that the NAG compiler doesn't support Open MPI's mpi_f08 module (since Open MPI currently requires "ignore TKR" directives). See my above comment for more detail: https://github.com/open-mpi/ompi/issues/9795#issuecomment-1003404394

jsquyres avatar Dec 31 '21 19:12 jsquyres

Hello,

Thanks for looking into this. The config.log (edit: for OpenMPI 5.0.0rc2) is here.

It would be nice, given all the effort involved in putting these new facilities into the Fortran Standard and then implementing them in compilers, to see them actually used in the field.

ThemosTsikas avatar Jan 02 '22 11:01 ThemosTsikas

I have also logged all the invocations, sources and responses by the NAG Fortran Compiler during configure here.

You will find dbgnagind1234.txt text files that contain the compiler options, dbgnagsrc1234_5.f90 text files that contain the sources for the 5th option and dbgnagcap1234.txt text files that capture the compiler's messages.

ThemosTsikas avatar Jan 02 '22 11:01 ThemosTsikas

Thanks @ThemosTsikas. From your config.log, I think I identified 2 legitimate errors (the other errors look like cases that are supposed to fail).

  1. I don't remember the exact meaning of type(*), dimension(*) (vs. type(*), dimension(..)), but we're testing for it in config/ompi_fortran_check_ignore_tkr.m4. There was one place in there where I accidentally had type(*) instead of type(*), dimension(*) -- I think that's an error (vs. something Past Jeff did deliberately). I updated it to be type(*), dimension(*).
  2. Another place had a dummy parameter as real when I'm pretty sure it should have been complex. Fixed.

I've posted #9812 with these 2 fixes, and a tarball created from this branch here: https://aws.open-mpi.org/~jsquyres/unofficial/openmpi-gitclone-pr9812.tar.bz2. Could you give either the PR or the tarball a try?

jsquyres avatar Jan 02 '22 19:01 jsquyres

Tried again with pr9812 tarball, config.log.

ThemosTsikas avatar Jan 02 '22 20:01 ThemosTsikas

Thanks! Let me follow up with you on #9812...

jsquyres avatar Jan 02 '22 20:01 jsquyres

I note the erroneous scalar TYPE(*) instead of the standard TYPE(*),DIMENSION(*) is still in 4.1.4.

That is, the program following the "checking for Fortran compiler support of TYPE(*), DIMENSION(*)" line (starting at line 69152 of configure) has

  interface
     subroutine foo(buffer, count)
       ! buffer
       type(*), intent(in) :: buffer
       integer, intent(in) :: count
     end subroutine foo
  end interface

i.e. note the lack of DIMENSION(*) there.

The second bug is that FORCE_ASSUMED_SHAPE has the wrong data type for its argument; there is an interface specifying COMPLEX,DIMENSION(:,:) but the procedure is defined with REAL,DIMENSION(:,:). The test program only calls this with COMPLEX arguments. Just changing (on line 69207) "real"->"complex" fixes that.

The third bug is that TYPE(*) arguments require the procedure to be referenced with an explicit interface - 15.4.2.2 item (3)(f), TYPE(*) is a kind of polymorphic (7.3.2.2 paragraph 3, first sentence). The call to FOO from FORCE_ASSUMED_SHAPE is missing that interface. This could be added to FORCE_ASSUMED_SHAPE, but frankly, deleting the interface for FORCE_ASSUMED_SHAPE in the main program and making it into an internal subprogram is the simplest fix.

Malcolm-Cohen avatar Sep 13 '22 01:09 Malcolm-Cohen

@Malcolm-Cohen Thanks - very much looking forward to using these features with NAG compiler!

tclune avatar Sep 13 '22 13:09 tclune

I hate to revive a zombie issue, but I was wondering if support for this got into v5.0.X say? (Just because I see it in the labels).

We'd love to move to use mpi_f08 (though in fairness we have some mpif.h that needs to go first. Going bye bye in MPI 5!)

mathomp4 avatar Aug 31 '23 18:08 mathomp4

I hate to revive a zombie issue, but I was wondering if support for this got into v5.0.X say? (Just because I see it in the labels).

Just to say it, but can confirm that mpi_f08 does not build for NAG in Open MPI 5.0.1. Just looked at my build on my Mac.

mathomp4 avatar Jan 30 '24 15:01 mathomp4

Yeah, I'm sorry -- I think we did a bunch of work on it back then, but then ran out of time / resources, and not enough people were asking for mpi_f08 support with the NAG Fortran compiler. ☹️

Is there renewed interest? There are two possible approaches to fix this:

  1. Fix what is there already (i.e., fix the implementation, and then update the configure tests to match what they really need to test).
  2. Auto-generate the Fortran MPI API bindings (even if it's just generate the mpi_f08 bindings), and do them correctly.

We've talked about auto-generating for quite a while (both C and the various Fortran bindings), but haven't gotten past the do-a-bunch-of-first-steps phase because this is unfortunately a ton of work. That being said, https://github.com/open-mpi/ompi/pull/12226 is a step in the right direction -- albeit for the MPI-4 embiggening -- but it could open the door for someone who has some time to build upon it for the other bindings.

jsquyres avatar Jan 30 '24 20:01 jsquyres

The interest is not exactly renewed. It only went away because there was no response to my suggesting fixes for the invalid Fortran OpenMPI was trying to use. Neither I nor my colleague who tried to get it working before understood why OpenMPI should rely on compilers not being as good at error detection as they could be.

So, I just gave up on OpenMPI and used MPICH instead.

If you’d like me to try again with a newer OpenMPI and suggest some fixes again, I can do that.

Cheers,

--

..............Malcolm Cohen, NAG Oxford/Tokyo.

Malcolm-Cohen avatar Jan 31 '24 23:01 Malcolm-Cohen