wi4mpi
wi4mpi copied to clipboard
Support `MPI_Aint_add` and `MPI_Aint_diff`
Hey again,
Sorry for the multiple PRs, but I felt this one is a feature integration rather than the bug fix of my other one, thus deserving a separate PR.
This PR is meant to fix #18 by introducing support of the MPI 3.1 functions MPI_Aint_add
and MPI_Aint_diff
.
The encountered issue was that OpenMPI's implementation of those functions is a macro, making the symbols absent from the library. This means that when running with preload using --from MPICH/Intel --to OpenMPI
, the application crashes with an undefined symbol.
This is resolved by overriding the symbols and preventing the ASM router from doing its work, and rather implementing the OpenMPI macro directly.
Addition to the header
I added the definition found in the MPI3.1 standard of the two functions to interface/interface_utils/include/mpi.h
. I have no experience with fortran so I decided not to touch things on the fortran side.
Symbol contents
I implemented the overrides by making sure they matched the output of OpenMPI's mpicc
. For the following code:
#include <mpi.h>
MPI_Aint get_diff(MPI_Aint lhs, MPI_Aint rhs) {
return MPI_Aint_diff(lhs, rhs);
}
MPI_Aint get_sum(MPI_Aint lhs, MPI_Aint rhs) { return MPI_Aint_add(lhs, rhs); }
We get the following assembly, compiled using mpicc -c -g -O0
:
$ objdump --disassemble=get_diff aint.o
aint.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <get_diff>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 89 7d f8 mov %rdi,-0x8(%rbp)
8: 48 89 75 f0 mov %rsi,-0x10(%rbp)
c: 48 8b 55 f8 mov -0x8(%rbp),%rdx
10: 48 8b 45 f0 mov -0x10(%rbp),%rax
14: 48 29 c2 sub %rax,%rdx
17: 48 89 d0 mov %rdx,%rax
1a: 5d pop %rbp
1b: c3 ret
$ objdump --disassemble=get_sum aint.o
aint.o: file format elf64-x86-64
Disassembly of section .text:
000000000000001c <get_sum>:
1c: 55 push %rbp
1d: 48 89 e5 mov %rsp,%rbp
20: 48 89 7d f8 mov %rdi,-0x8(%rbp)
24: 48 89 75 f0 mov %rsi,-0x10(%rbp)
28: 48 8b 55 f0 mov -0x10(%rbp),%rdx
2c: 48 8b 45 f8 mov -0x8(%rbp),%rax
30: 48 01 d0 add %rdx,%rax
33: 5d pop %rbp
34: c3 ret
So simple additions and subtractions. This is reflected in the overrides: https://github.com/spoutn1k/wi4mpi/blob/96e458a0733c14ca80156db7ed0e39aa37fe2da1/src/common/override.c#L340-L357
When this applies
The override is enabled only when OpenMPI is used as target, not when any other combination of library is used. The override directives #define MPI_AINT_XXX_OVERRIDE
are defined in the following files:
-
interface/header/_OMPI/app_mpi.h
-
preload/header/INTEL_OMPI/app_mpi.h
-
preload/header/MPICH_OMPI/app_mpi.h
What's missing
This PR is my first real code contribution to the project, and I am unsure about certain things:
- The overrides do not map the
MPI_Aint
variables. This seemed not to be an issue experimentally, but it might be necessary if they stray further between vendors ? I could not find aaint_a2r
oraint_r2a
, so I just skipped that step. Let me know if I should look into implementing that. - The fortran headers have been left completely untouched. I have the header definitions from the standard (https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf), but no idea where to insert them.
Hmm for some reason github added my merge of master as individual commits ... Does anyone know how to fix that ?
Awesome work, as always.
But, why not putting #define MPI_AINT_XXX_OVERRIDE
into override.h
?
Thank you :)
The idea is to only override the symbols when necessary. Putting the define in override.h
would override the symbols project-wide, while this solution only overrides symbols when the target library is missing them (OpenMPI here).
That way, if another library decides to change the implementation of the symbol, it will be respected.
This PR is a good fix for this issue. After review we decided to fix this another way: we will add these functions to the generated ones. Thanks again for this work, it will serve as a base to our implementation.