wi4mpi icon indicating copy to clipboard operation
wi4mpi copied to clipboard

Support `MPI_Aint_add` and `MPI_Aint_diff`

Open spoutn1k opened this issue 2 years ago • 4 comments

Hey again,

Sorry for the multiple PRs, but I felt this one is a feature integration rather than the bug fix of my other one, thus deserving a separate PR.

This PR is meant to fix #18 by introducing support of the MPI 3.1 functions MPI_Aint_add and MPI_Aint_diff.

The encountered issue was that OpenMPI's implementation of those functions is a macro, making the symbols absent from the library. This means that when running with preload using --from MPICH/Intel --to OpenMPI, the application crashes with an undefined symbol.

This is resolved by overriding the symbols and preventing the ASM router from doing its work, and rather implementing the OpenMPI macro directly.

Addition to the header

I added the definition found in the MPI3.1 standard of the two functions to interface/interface_utils/include/mpi.h. I have no experience with fortran so I decided not to touch things on the fortran side.

Symbol contents

I implemented the overrides by making sure they matched the output of OpenMPI's mpicc. For the following code:

#include <mpi.h>

MPI_Aint get_diff(MPI_Aint lhs, MPI_Aint rhs) {
  return MPI_Aint_diff(lhs, rhs);
}

MPI_Aint get_sum(MPI_Aint lhs, MPI_Aint rhs) { return MPI_Aint_add(lhs, rhs); }

We get the following assembly, compiled using mpicc -c -g -O0:

$ objdump --disassemble=get_diff aint.o 

aint.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <get_diff>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	48 89 7d f8          	mov    %rdi,-0x8(%rbp)
   8:	48 89 75 f0          	mov    %rsi,-0x10(%rbp)
   c:	48 8b 55 f8          	mov    -0x8(%rbp),%rdx
  10:	48 8b 45 f0          	mov    -0x10(%rbp),%rax
  14:	48 29 c2             	sub    %rax,%rdx
  17:	48 89 d0             	mov    %rdx,%rax
  1a:	5d                   	pop    %rbp
  1b:	c3                   	ret

$ objdump --disassemble=get_sum aint.o 

aint.o:     file format elf64-x86-64


Disassembly of section .text:

000000000000001c <get_sum>:
  1c:	55                   	push   %rbp
  1d:	48 89 e5             	mov    %rsp,%rbp
  20:	48 89 7d f8          	mov    %rdi,-0x8(%rbp)
  24:	48 89 75 f0          	mov    %rsi,-0x10(%rbp)
  28:	48 8b 55 f0          	mov    -0x10(%rbp),%rdx
  2c:	48 8b 45 f8          	mov    -0x8(%rbp),%rax
  30:	48 01 d0             	add    %rdx,%rax
  33:	5d                   	pop    %rbp
  34:	c3                   	ret

So simple additions and subtractions. This is reflected in the overrides: https://github.com/spoutn1k/wi4mpi/blob/96e458a0733c14ca80156db7ed0e39aa37fe2da1/src/common/override.c#L340-L357

When this applies

The override is enabled only when OpenMPI is used as target, not when any other combination of library is used. The override directives #define MPI_AINT_XXX_OVERRIDE are defined in the following files:

  • interface/header/_OMPI/app_mpi.h
  • preload/header/INTEL_OMPI/app_mpi.h
  • preload/header/MPICH_OMPI/app_mpi.h

What's missing

This PR is my first real code contribution to the project, and I am unsure about certain things:

  • The overrides do not map the MPI_Aint variables. This seemed not to be an issue experimentally, but it might be necessary if they stray further between vendors ? I could not find a aint_a2r or aint_r2a, so I just skipped that step. Let me know if I should look into implementing that.
  • The fortran headers have been left completely untouched. I have the header definitions from the standard (https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf), but no idea where to insert them.

spoutn1k avatar Dec 08 '22 00:12 spoutn1k

Hmm for some reason github added my merge of master as individual commits ... Does anyone know how to fix that ?

spoutn1k avatar Dec 08 '22 00:12 spoutn1k

Awesome work, as always. But, why not putting #define MPI_AINT_XXX_OVERRIDE into override.h ?

adrien-cotte avatar Dec 08 '22 07:12 adrien-cotte

Thank you :)

The idea is to only override the symbols when necessary. Putting the define in override.h would override the symbols project-wide, while this solution only overrides symbols when the target library is missing them (OpenMPI here).

That way, if another library decides to change the implementation of the symbol, it will be respected.

spoutn1k avatar Dec 08 '22 17:12 spoutn1k

This PR is a good fix for this issue. After review we decided to fix this another way: we will add these functions to the generated ones. Thanks again for this work, it will serve as a base to our implementation.

ducrotv avatar Sep 12 '23 13:09 ducrotv