ompi
ompi copied to clipboard
Fortran: first try if type(*), dimension(..) works as-is
Refs. open-mpi/ompi#11582
:bot:ibm:retest
- fair point, I updated the PR accordingly
- the thing is some
gfortranversions pass theconfigurytest withTYPE(*), DIMENSION(..), but fail to build Open MPI because even ifASYNCHRONOUSalone works fine,TYPE (*), DIMENSION(..), ASYNCHRONOUSis not supported (!), and to make things worse, we testignore_tkrbeforeASYNCHRONOUSsince the former is required byuse mpibut the latter is only necessary foruse mpi_f08. I will welcome any ideas on how to better test this feature.
I re-scanned the mpi 4.0 standard here. Disregard my comment about contiguous, and I see your point about asynchronous.
That is surprising to me that DIMENSION(..) is allowed but not with ASYNCHRONOUS. I wonder why.
After thinking about it some more, I do have a concern that we are changing many interfaces to include ASYNCHRONOUS, when the standard only suggests the non-blocking routines include the keyword. My gut tells me this won't cause any compilation problems, but could potentially cause a performance regression as it ties the compilers hands during optimization somewhat. It would be good to ensure we don't introduce a regression with this change.
Better yet, perhaps we should introduce a separate ignore_trk_f08_async, and use this macro rather than independently adding ignore_trk and OMPI_ASYNCHRONOUS.
After thinking about it some more, I do have a concern that we are changing many interfaces to include ASYNCHRONOUS
ASYNCHRONOUS is only used when testing TYPE(*), DIMENSION(..), but is not added to the definitions of the MPI subroutines that do not have it.
ASYNCHRONOUSis only used when testing
Ah, I missed that. Thank you.
@jjhursey could you please (have someone) have a look at the CI issue on IBM platforms?
- it seems the
use_mpitest timeout with GNU compiler - there is a linking error with IBM compilers, that could indicate something is missing when creating the library. I do not know what exactly nor how to detect and prevent that.
@awlauria @gpaulsen I don't have cycles to look at the CI failure.
Let's retest to get the logs back. bot:ibm:retest
bot:ibm:retest
bot:ibm:retest
@ggouaillardet This PR came up again in the last few days. I took the liberty of rebasing it to the tip of main (since it had some conflicts).
Here's what my configure found, on a Mac M2 with gfortran 14 and Fedora 38 with gfortran 14:
checking for Fortran compiler support of TYPE(*), DIMENSION(..)... yes
checking Fortran compiler ignore TKR syntax... 1:type(*), DIMENSION(..):!
And then Open MPI built and installed fine, and built all the example programs correctly.
However, I notice that if I try to run example_usempi or example_usempif08, it hangs here:
$ mpirun --oversubscribe -np 4 ring_usempif08
Process 0 sending 10 to 1 tag 201 ( 4 processes in ring)
Process 0 sent to 1
Process 0 decremented value: 9
Process 0 decremented value: 8
Process 0 decremented value: 7
Process 0 decremented value: 6
Process 0 decremented value: 5
Process 0 decremented value: 4
Process 0 decremented value: 3
Process 0 decremented value: 2
Process 0 decremented value: 1
Process 0 decremented value: 0
Process 0 exiting
ring_c and ring_mpifh work fine.
I ran out of time before being able to investigate further.
It's a bit weird that it does a bunch of sends and receives seemingly correctly (i.e., the ring value decrements properly) and then hangs at the end.
Any idea what's happening here?
FYI @edgargabriel
I think I see what is going on.
in ompi/mpi/fortran/use-mpi-f08/bindings/mpi-f-interfaces.h we (oversimplify) declare
subroutine ompi_send_f(buf) BIND(C, name="ompi_send_f")
OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf
end subroutine ompi_send_f
so when the type for buf is type(*), DIMENSION(..) I suspect the buf passed to ompi_send_f is a CFI_desc_t but the subroutine still expects a buffer address.
In the case of ring_usempif08.f90, it means the messages are sent/received from/to the wrong addresses, and rank 1 believes it received 0 instead of 10 and leaves immediately,
I moved the PR to draft from now, and I will try to think of a simple fix.
Mm; right. I was thinking that the first thing in the descriptor was the pointer to the value, and therefore it would be transparent to OMPI's code (i.e., we would just de-reference, and would get the value, just like we did before descriptors). But I guess that's wrong: OMPI will need to double de-reference.
Meaning: accepting type(*), dimension(..) would be a gigantic change. Right?
That will be quite a change, but might not be that big via automation.
I have a proof of concept in mind I will investigate from tomorrow.
Also, that will be a first step towards supporting Fortran subarrays.
I will have to revisit this issue with the latest Nvidia compilers.
As long as the mpif-h bindings are supported, we need to use the pragmas for the use mpi module.
And if the pragmas do not work, then we cannot build use mpi, we currently do not even try to build use mpi_f08.
Currently, we (generally, it depends on the compiler) use the pragmas for both use mpi and use mpi_f08.
Though the standard does not specify how to make use mpi work, the type of the buffers is TYPE(*), DIMENSION(..) for use mpi_f08.
So unless I missed something, we are doing it wrong.
I can appreciate that was done when Fortran 2008 bindings were partially supported by most Fortran compilers, but I guess we are past that. Since fixing it will break the ABI, I would suggest we only build use mpi_f08 bindings if TYPE(*), DIMENSION(..) is supported by the Fortran compiler, and hard code the right type everywhere under use-mpi-f08.
@jsquyres any thoughts?
It's true that -- at least before Fortran compilers widely supported type(*), dimension(..) -- we could claim correct support for mpi_f08, because there's language in the spec about what you can do if compilers didn't support all the language features. That being said, Fortran compilers that support type(*), dimension(..) are much more widespread nowadays. It's probably true that Open MPI has a (mostly) functional mpi_f08 module, but it isn't technically adherent to the standard any more.
We have known that this was coming for a long time, but there's never been enough priority and resources to fix it, unfortunately.
That being said, I'm pretty convinced that the way to fix it is to generate all 3 of the Fortran bindings (mpif.h, mpi, and mpi_f08). The Fortran bindings are pretty formulaic -- much more so than the C bindings, and this should be a pretty do-able (albeit large) task. Paired with the Python library out of the MPI Forum that is used to maintain the meta data of the bindings themselves (or something like it -- even if it's just some kind of machine-readable index listing file of the bindings), we should be able to generate both the Fortran declarations and the corresponding implementations.
There have been multiple attempts at doing this over the past few years, but none were completed.
The most recent attempt was out of Los Alamos. If memory serves, binding generation was a byproduct of embiggening / "big count". But that may still be the best path forward.
True: we'll need to have some discussions about ABI. It might even be simplest to take a rip-the-band-aid-off type of solution -- just do a whole new mpi_f08 module and not worry about ABI. We could even support both the new and old mpi_f08 implementations at the same time (e.g., give a configure-type option as to which to build, and ultimately sunset / remove the old mpi_f08 implementation after a few releases).
Thanks @jsquyres.
Could you please discuss this during the next call and provide some directions? Is Open MPI 6 scheduled ? Will it support the new "standard" MPI ABI? if not, do we already know whether we will break the ABI with Open MPI 5?
A few points for ya -- feel free to ask for more data (I forget that your timezone is inconvenient to attend the OMPI engineering webexes, etc.):
- OMPI 6 is aimed at end-of-year(ish).
- Here's the current planned feature list: https://github.com/open-mpi/ompi/wiki/6.0.x-Feature-List
- One of the Big Features is Bigcount (see what I did there?). I just checked in / refreshed my brain with @hppritcha at LANL about the embiggening work that @jtronge is doing: https://github.com/open-mpi/ompi/pull/12226 is where the work is ongoing.
- Just to be clear: the bindings generation aspect of that work is (almost) a side effect; the focus is bigcount.
- For example: I don't think
type(*), dimension(..)is currently planned, but with the generation infrastructure that will be there, I think it should be able to be expanded to do so.
- There is no "standard" ABI yet -- there have been a bunch of drafts, but nothing has passed the Forum yet. Hence, we paused work on ABI and focused on embiggening (and all of its ripple effects throughout the code base).
- We had a bunch of discussions and planning about the Forum-sponsored ABI, however.
- The plan was to support both the Forum ABI and the OMPI ABI (so that we don't screw our existing customers).
- The short version is that the intent is -- for the C bindings, for example -- to have 2 different libraries: one for the Forum ABI and one for the OMPI ABI. The Forum ABI library would probably do a little swizzling of the Forum data structures to the back-end OMPI data structures and then invoke the back end
ompi_BLAH()functionality.
@ggouaillardet Given that I don't think we'll be able to have a quick-fix for supporting type(*), dimension(..), is there a different approach possible to https://github.com/open-mpi/ompi/issues/11582 that might work for v5.0.x (and/or v4.1.x)?
it seems the root cause was a false positive, i will investigate how to harden the test.
incorrect info here. Jake's big count fortran side will include the assumed rank changes - #12226. I think this PR can be closed.
incorrect info here. Jake's big count fortran side will include the assumed rank changes - https://github.com/open-mpi/ompi/pull/12226.
That's good news!
Will it include all Fortran APIs, or just the ones with bigcount?
I think this PR can be closed.
If there's a solution for the issue described in https://github.com/open-mpi/ompi/issues/11582 for v5.0.x (and potentially v4.1.x), this is worth keeping open.
Believe it or not, that did the trick!
I will make some more testing and issue a new PR this week
diff --git a/config/ompi_fortran_check_ignore_tkr.m4 b/config/ompi_fortran_check_ignore_tkr.m4
index 3686bca82e..ac23f42b91 100644
--- a/config/ompi_fortran_check_ignore_tkr.m4
+++ b/config/ompi_fortran_check_ignore_tkr.m4
@@ -200,12 +200,12 @@ AC_DEFUN([OMPI_FORTRAN_CHECK_IGNORE_TKR_SUB], [
end subroutine force_assumed_shape
module check_ignore_tkr
- interface
- subroutine foobar(buffer, count)
+ interface foobar
+ subroutine foobar_x(buffer, count)
$1 buffer
$2, intent(in) :: buffer
integer, intent(in) :: count
- end subroutine foobar
+ end subroutine foobar_x
end interface
end module
To give some info from flang-new using ggouallardet:topic/nvfortran_ignore_tkr, it produces
OMPI_FORTRAN_F08_PREDECL='!'
OMPI_FORTRAN_F08_TYPE='type(*), dimension(..)'
OMPI_FORTRAN_IGNORE_TKR_PREDECL='!DIR$ IGNORE_TKR'
OMPI_FORTRAN_IGNORE_TKR_TYPE='type(*)'
This leads to the mpi_f08 interfaces using type(*) with !DIR$ IGNORE_TKR
interface
subroutine mpi_allreduce_f08(sendbuf,recvbuf,count,datatype,op,comm,ierror)
use mpi_f08_types,only:mpi_comm
use mpi_f08_types,only:mpi_datatype
use mpi_f08_types,only:mpi_op
type(*),intent(in)::sendbuf
!dir$ ignore_tkr(tkrdm) sendbuf
type(*)::recvbuf
!dir$ ignore_tkr(tkrdm) recvbuf
integer(4),intent(in)::count
type(mpi_datatype),intent(in)::datatype
type(mpi_op),intent(in)::op
type(mpi_comm),intent(in)::comm
integer(4),intent(out),optional::ierror
end
end interface
I don't know if the intention is that OMPI_FORTRAN_F08_TYPE be used when it does work, but that doesn't seem to be the behavior. So despite seeming to support assumed rank, that interface is not used for mpi_f08.
Here are the outputs for the checks
configure:50970: checking for Fortran compiler support of TYPE(*), DIMENSION(..)
configure:51054: flang-new -c conftest.f90 >&5
./conftest.f90:7:17: warning: The global subprogram 'force_assumed_shape' is not compatible with its local procedure declaration (incompatible dummy argument #1: incompatible dummy data object types: REAL(4) vs COMPLEX(4))
subroutine force_assumed_shape(a, count)
^^^^^^^^^^^^^^^^^^^
./conftest.f90:56:14: Declaration of 'force_assumed_shape'
subroutine force_assumed_shape(a, count)
^^^^^^^^^^^^^^^^^^^
./conftest.f90:7:17: Declaration of 'force_assumed_shape'
subroutine force_assumed_shape(a, count)
^^^^^^^^^^^^^^^^^^^
./conftest.f90:14:17: warning: The global subprogram 'foo' is not compatible with its local procedure declaration (incompatible procedure attributes: ImplicitInterface)
subroutine foo(buffer, count)
^^^
./conftest.f90:59:10: Declaration of 'foo'
call foo(a, count)
^^^
./conftest.f90:14:17: Declaration of 'foo'
subroutine foo(buffer, count)
^^^
configure:49118: checking for Fortran compiler support of TYPE(*), DIMENSION(*)
configure:49202: flang-new -c conftest.f90 >&5
error: Semantic errors in conftest.f90
./conftest.f90:7:17: warning: The global subprogram 'force_assumed_shape' is not compatible with its local procedure declaration (incompatible dummy argument #1: incompatible dummy data object types: REAL(4) vs COMPLEX(4))
subroutine force_assumed_shape(a, count)
^^^^^^^^^^^^^^^^^^^
./conftest.f90:56:14: Declaration of 'force_assumed_shape'
subroutine force_assumed_shape(a, count)
^^^^^^^^^^^^^^^^^^^
./conftest.f90:7:17: Declaration of 'force_assumed_shape'
subroutine force_assumed_shape(a, count)
^^^^^^^^^^^^^^^^^^^
./conftest.f90:14:17: warning: The global subprogram 'foo' is not compatible with its local procedure declaration (incompatible procedure attributes: ImplicitInterface)
subroutine foo(buffer, count)
^^^
./conftest.f90:59:10: Declaration of 'foo'
call foo(a, count)
^^^
./conftest.f90:14:17: Declaration of 'foo'
subroutine foo(buffer, count)
^^^
./conftest.f90:41:12: error: Rank of dummy argument is 0, but actual argument has rank 1
call foo(buffer1, count)
^^^^^^^
configure:49236: checking for Fortran compiler support of !GCC$ ATTRIBUTES NO_ARG_CHECK
configure:49320: flang-new -c conftest.f90 >&5
error: Semantic errors in conftest.f90
./conftest.f90:7:17: warning: The global subprogram 'force_assumed_shape' is not compatible with its local procedure declaration (incompatible dummy argument #1: incompatible dummy data object types: REAL(4) vs COMPLEX(4))
subroutine force_assumed_shape(a, count)
^^^^^^^^^^^^^^^^^^^
./conftest.f90:56:14: Declaration of 'force_assumed_shape'
subroutine force_assumed_shape(a, count)
^^^^^^^^^^^^^^^^^^^
./conftest.f90:7:17: Declaration of 'force_assumed_shape'
subroutine force_assumed_shape(a, count)
^^^^^^^^^^^^^^^^^^^
./conftest.f90:14:17: warning: The global subprogram 'foo' is not compatible with its local procedure declaration (incompatible procedure attributes: ImplicitInterface)
subroutine foo(buffer, count)
^^^
./conftest.f90:59:10: Declaration of 'foo'
call foo(a, count)
^^^
./conftest.f90:14:17: Declaration of 'foo'
subroutine foo(buffer, count)
^^^
./conftest.f90:77:5: error: No specific subroutine of generic 'foobar' matches the actual arguments
call foobar(var(1,1,1), 1)
^^^^^^^^^^^^^^^^^^^^^^^^^^
configure:49354: checking for Fortran compiler support of !DIR$ IGNORE_TKR
configure:49438: flang-new -c conftest.f90 >&5
./conftest.f90:7:17: warning: The global subprogram 'force_assumed_shape' is not compatible with its local procedure declaration (incompatible dummy argument #1: incompatible dummy data object types: REAL(4) vs COMPLEX(4))
subroutine force_assumed_shape(a, count)
^^^^^^^^^^^^^^^^^^^
./conftest.f90:56:14: Declaration of 'force_assumed_shape'
subroutine force_assumed_shape(a, count)
^^^^^^^^^^^^^^^^^^^
./conftest.f90:7:17: Declaration of 'force_assumed_shape'
subroutine force_assumed_shape(a, count)
^^^^^^^^^^^^^^^^^^^
./conftest.f90:14:17: warning: The global subprogram 'foo' is not compatible with its local procedure declaration (incompatible procedure attributes: ImplicitInterface)
subroutine foo(buffer, count)
^^^
./conftest.f90:59:10: Declaration of 'foo'
call foo(a, count)
^^^
./conftest.f90:14:17: Declaration of 'foo'
subroutine foo(buffer, count)
^^^
Backporting the LLVM !$DIR IGNORE_TKR check would be helpful for flang-new regardless.
@bcornille This is the message I get with LLVM 18.1.8
configure:49354: checking for Fortran compiler support of !DIR$ IGNORE_TKR
configure:49438: flang-new -c conftest.f90 >&5
./conftest.f90:7:17: warning: The global subprogram 'force_assumed_shape' is not compatible with its local procedure declaration (incompatible dummy argument #1: incompatible dummy data object types: REAL(4) vs COMPLEX(4))
subroutine force_assumed_shape(a, count)
^^^^^^^^^^^^^^^^^^^
./conftest.f90:56:14: Declaration of 'force_assumed_shape'
subroutine force_assumed_shape(a, count)
^^^^^^^^^^^^^^^^^^^
./conftest.f90:7:17: Declaration of 'force_assumed_shape'
subroutine force_assumed_shape(a, count)
^^^^^^^^^^^^^^^^^^^
./conftest.f90:14:17: warning: The global subprogram 'foo' is not compatible with its local procedure declaration (incompatible procedure attributes: ImplicitInterface)
subroutine foo(buffer, count)
^^^
./conftest.f90:59:10: Declaration of 'foo'
call foo(a, count)
^^^
./conftest.f90:14:17: Declaration of 'foo'
subroutine foo(buffer, count)
^^^
error: loc("/home/users/u0001043/build/ompi-llvm-18.1.8/conftest.f90":41:3): /tmp/llvm-project-18.1.8.src/flang/lib/Lower/CallInterface.cpp:949: not yet implemented: support for polymorphic types
LLVM ERROR: aborting
flang-new clearly states this requires a feature that is not yet implemented.
(same behavior with main branch and #12681, both are unable to build the mpi_f08 bindings.
But if I used LLVM main branch (the release/19.x branch should be shortly), both main and #12681 successfully build the mpi_f08 bindings with the !DIR$ IGNORE_TKR directive.
I am not sure how to interpret your previous message. I think this PR will be dropped (wrong approach that would require way too many changes and break the ABI) in favor of #12681. As far as flang-new is concerned, Open MPI will only support LLVM 19 and higher. Does that make sense to you?
Superseded by #12681