
MPI (message passing interface) --- run several M2 processes in parallel

Open antonleykin opened this issue 4 years ago • 18 comments

We want to implement coarse parallelization via basic MPI routines --- collaborators are welcome. See the commentary in packages/MPI.m2

To implement:

  • [x] Basic message-passing functions in the kernel
  • [x] Additional message-passing functions in the kernel
  • [ ] A package facilitating the use of the functions above

antonleykin avatar Jun 03 '21 13:06 antonleykin

  • Here's the result of running the three MPI examples on Github Actions, built using CMake on macOS, and linked with MPICH installed via Homebrew.

  • example for normalToricRing(Ideal,Thing) fails, I'm not sure why.

  • M2-binary compiles on Ubuntu, but crashes when building tvalues.m2

    • might have something to do with Ubuntu's MPICH vs Homebrew's

mahrud avatar Apr 30 '23 19:04 mahrud

@DanGrayson and @mahrud, I pushed a commit that makes it possible to run M2 both with mpirun and without. It builds for me with autotools and cmake on my Ubuntu (see M2/BUILD/MPI/Makefile), but I'm not sure how to alter the "actions" to check everything.

antonleykin avatar May 01 '23 14:05 antonleykin

I think you should wrap things like #include <mpi.h> with a macro that checks if MPI is being used.

(sorry about the close/reopen, that was a mis-click)

mahrud avatar May 01 '23 17:05 mahrud

I think you should wrap things like #include <mpi.h> with a macro that checks if MPI is being used.

We probably should just require MPI at build time. The basic functionality works and (unless it breaks anything anywhere in the non-MPI part) we should distribute the MPI-capable binaries in the next release.

The only current "fail" I see looks unrelated to MPI: https://github.com/Macaulay2/M2/actions/runs/4853189961/jobs/8649075617?pr=2129#step:9:3871

antonleykin avatar May 01 '23 18:05 antonleykin

Even if you want the standard distribution to be MPI-capable (which is going to be difficult with brew and potentially also debian, since typically MPI-capable binaries are distributed in separate packages), there should be a way to disable MPI, similar to TBB, OpenMP, Python, etc.

mahrud avatar May 01 '23 19:05 mahrud

Even if you want the standard distribution to be MPI-capable (which is going to be difficult with brew and potentially also debian, since typically MPI-capable binaries are distributed in separate packages), there should be a way to disable MPI, similar to TBB, OpenMP, Python, etc.

Fair enough. What would be the best way to omit MPI code with macros?

antonleykin avatar May 01 '23 19:05 antonleykin

Inside main() there are several blocks wrapped in #if WITH_PYTHON ... #endif or #if PROFILING ... #endif, etc. I think a similar block like WITH_MPI in a few places would work.

Also, once MPI is a package and has its own tests, could you revert https://github.com/Macaulay2/M2/pull/2129/commits/f1aa4e4f6187f4f295417eb5815b3d97eb67f0f1?

mahrud avatar May 01 '23 20:05 mahrud
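A minimal sketch of the WITH_MPI guard suggested above, assuming the build system passes -DWITH_MPI=1 for MPI builds. The helper names parallel_startup, parallel_rank and parallel_shutdown are hypothetical and are not taken from the M2 sources; the MPI calls themselves are standard ones.

```c
#include <assert.h>

/* Assumption: MPI builds are compiled with -DWITH_MPI=1.
   Default to 0 so this file still compiles when the flag is absent. */
#ifndef WITH_MPI
#define WITH_MPI 0
#endif

#if WITH_MPI
#include <mpi.h>
#endif

/* Hypothetical helper: start MPI if it was compiled in.
   Returns 1 when MPI is active, 0 otherwise. */
int parallel_startup(int *argc, char ***argv) {
#if WITH_MPI
  int already = 0;
  MPI_Initialized(&already);
  if (!already && MPI_Init(argc, argv) != MPI_SUCCESS) return 0;
  return 1;
#else
  (void)argc;
  (void)argv;
  return 0;
#endif
}

/* Hypothetical helper: rank of this process, or 0 in a non-MPI build. */
int parallel_rank(void) {
  int rank = 0;
#if WITH_MPI
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
#endif
  assert(rank >= 0);
  return rank;
}

/* Hypothetical helper: shut MPI down once; a no-op in a non-MPI build. */
void parallel_shutdown(void) {
#if WITH_MPI
  int finalized = 0;
  MPI_Finalized(&finalized);
  if (!finalized) MPI_Finalize();
#endif
}
```

With this shape, main() can call the three helpers unconditionally, and only the helper bodies differ between MPI and non-MPI builds.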

@mahrud, I've encapsulated all MPI-related code with macros, so ubuntu non-MPI builds go through... but I don't understand why the MPI build fails now.

@DanGrayson, could you glance at M2/configure.ac? The autotools non-MPI ubuntu build works... but I'm not sure how to complete this so that MPI works as well.

antonleykin avatar May 03 '23 19:05 antonleykin

Tabs aren't allowed in yaml.

mahrud avatar May 04 '23 15:05 mahrud

@mahrud, I've encapsulated all MPI-related code with macros, so ubuntu non-MPI builds go through... but I don't understand why the MPI build fails now.

I'll need some help with cmake here... since cmake fails both on ubuntu and macOS in a similar fashion: the object file coming from d/M2lib.c is missing symbols "_MPI..."

Could it be that WITH_MPI is somehow undefined when d/M2lib.c is compiled?

@DanGrayson, could you glance at M2/configure.ac? The autotools non-MPI ubuntu build works... but I'm not sure how to complete this so that MPI works as well.

All seems fine now with the MPI build via autotools --- the problem above doesn't emerge.

antonleykin avatar May 04 '23 20:05 antonleykin

Could it be that WITH_MPI is somehow undefined when d/M2lib.c is compiled?

Yes, sorry I should have added this when I added the WITH_MPI option.

I also cherry-picked a fix for Normaliz 3.10, which is on development (it might be helpful to rebase on top of development and force push).

mahrud avatar May 05 '23 03:05 mahrud
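One way such a mismatch can go unnoticed: inside #if, an identifier that was never defined evaluates to 0, so a source file compiled without the flag silently takes the non-MPI branch. The sketch below is only an illustration of that preprocessor rule; with_mpi_state and with_mpi_is_known are hypothetical names, not part of M2.

```c
#include <assert.h>
#include <string.h>

/* Reports how the preprocessor saw WITH_MPI when this file was compiled.
   "undefined" means no -DWITH_MPI=... reached this translation unit,
   in which case a plain "#if WITH_MPI" would behave like "disabled". */
const char *with_mpi_state(void) {
#if !defined(WITH_MPI)
  return "undefined";
#elif WITH_MPI
  return "enabled";
#else
  return "disabled";
#endif
}

/* Returns 1 if the build system set WITH_MPI explicitly (to 0 or 1). */
int with_mpi_is_known(void) {
  const char *state = with_mpi_state();
  assert(state != NULL);
  return strcmp(state, "undefined") != 0;
}
```

Printing this state from each suspect object file, or replacing the "undefined" branch with an #error directive, would make a missing definition visible at compile time rather than at link time.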

Everything is fine with openmpi, but there is an obstacle with mpich: there is no analog of the variable OMPI_COMM_WORLD_RANK (used, e.g., in M2/Macaulay2/bin/main.cpp). Is there any way to know if mpirun was used to execute the program?

I've replaced openmpi with mpich on my ubuntu and now M2 crashes (unless executed with mpirun).

antonleykin avatar May 05 '23 15:05 antonleykin

Looking online, one suggestion seems to be to use the function MPI_Comm_rank instead, is that the same?

mahrud avatar May 05 '23 16:05 mahrud

Looking online, one suggestion seems to be to use the function MPI_Comm_rank instead, is that the same?

Well, probing the environment variable was a way to determine if the execution started via mpirun. It looks like MPI builds successfully now on github... so perhaps I overlooked something on my local machine.

antonleykin avatar May 05 '23 18:05 antonleykin
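The environment probe described above could be sketched as follows. OMPI_COMM_WORLD_RANK is the variable named in the thread; PMI_RANK is an assumption about what MPICH's launcher exports and should be checked against the installed MPICH before relying on it. The function names are hypothetical, and this is not the code in main.cpp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Returns 1 if at least one of the n named environment variables is set. */
int any_env_set(const char *const *names, size_t n) {
  assert(names != NULL);
  for (size_t i = 0; i < n; i++) {
    if (names[i] != NULL && getenv(names[i]) != NULL) return 1;
  }
  return 0;
}

/* Hypothetical helper: guess whether the process was started by an MPI
   launcher by looking for per-rank variables the launcher exports. */
int launched_by_mpirun(void) {
  static const char *const rank_vars[] = {
    "OMPI_COMM_WORLD_RANK", /* Open MPI, as used in the thread */
    "PMI_RANK",             /* assumption: set by MPICH's launcher */
  };
  return any_env_set(rank_vars, sizeof rank_vars / sizeof rank_vars[0]);
}
```

A probe like this runs before MPI_Init, which is its advantage over MPI_Comm_rank: the latter reports a rank only after MPI has been initialized, so it cannot by itself decide whether to initialize MPI at all.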

Even if you want the standard distribution to be MPI-capable (which is going to be difficult with brew and potentially also debian, since typically MPI-capable binaries are distributed in separate packages), there should be a way to disable MPI, similar to TBB, OpenMP, Python, etc.

Let me ask the opinion of @d-torrance (at least on debian). Do you think there is a need to distribute the MPI version of M2 separately? (So far, on ubuntu and macOS, there seems to be no need to have two binaries: the one we build can be executed with or without mpirun.)

antonleykin avatar May 05 '23 19:05 antonleykin

Now that the tests pass, could you turn the MPI tests into either tests for the package or perhaps tests under Macaulay2/tests/MPI? (I think the former is better, unless you don't plan to have an MPI package.)

After this, we can remove this section: https://github.com/Macaulay2/M2/blob/e01bbce3dedee5a8204c9558bafceb019725739e/.github/workflows/test_build.yml#L208-L213

mahrud avatar May 06 '23 15:05 mahrud

I had success building this on Fedora 34. It still segfaults on Ubuntu.

antonleykin avatar May 19 '23 11:05 antonleykin