
coll/csel: add device-layer collective algorithms

Open hzhou opened this issue 5 months ago • 1 comment

Pull Request Description

Following https://github.com/pmodels/mpich/pull/7547, add a mechanism to enable device-layer collective algorithms in the redesigned collective selection framework.

For now, this PR only adds MPIDI_POSIX_mpi_bcast_release_gather for intranode POSIX bcast. More algorithms will be added following this example.

Algorithms that depend on configure or runtime options, such as the NCCL algorithms, can be treated the same way as device-layer algorithms. All we need is to add a condition-checker function in src/mpi/coll/coll_algorithms.txt.
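A minimal sketch of the gating idea, assuming each algorithm entry carries an optional condition-checker callback that selection consults before using the algorithm. The types and names (`coll_ctx`, `algo_entry`, `check_posix_intranode`, `select_bcast`) are hypothetical and do not reflect MPICH's generated code or the actual format of coll_algorithms.txt; the "first eligible entry in table order" policy is also an assumption, since the real framework selects through CVARs and JSON.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-call context that condition checkers inspect. */
typedef struct {
    bool is_intranode;  /* true if all ranks share one node */
} coll_ctx;

/* Hypothetical table entry: algorithm name plus an optional condition checker. */
typedef struct {
    const char *name;
    bool (*check)(const coll_ctx *ctx);  /* NULL means no restriction */
} algo_entry;

/* Condition for the device-layer algorithm: intranode communicators only. */
static bool check_posix_intranode(const coll_ctx *ctx)
{
    return ctx->is_intranode;
}

static const algo_entry bcast_algos[] = {
    {"MPIDI_POSIX_mpi_bcast_release_gather", check_posix_intranode},
    {"MPIR_Bcast_intra_binomial", NULL},
};

/* Return the first algorithm, in table order, whose checker passes. */
static const char *select_bcast(const coll_ctx *ctx)
{
    assert(ctx != NULL);
    size_t n = sizeof(bcast_algos) / sizeof(bcast_algos[0]);
    for (size_t i = 0; i < n; i++) {
        if (bcast_algos[i].check == NULL || bcast_algos[i].check(ctx))
            return bcast_algos[i].name;
    }
    return NULL;  /* no eligible algorithm */
}
```

The same shape would cover configure- or runtime-dependent algorithms: the checker could test whether a feature was built in or enabled instead of testing communicator locality.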

This PR is based on top of https://github.com/pmodels/mpich/pull/7547; only the last few commits are for this PR. [skip warnings]

Demo

Run test/mpi/coll/bcasttest with 4 processes, setting MPIR_CVAR_DUMP_COLL_ALGO_COUNTERS=0:

  • Set MPIR_CVAR_BCAST_INTRA_ALGORITHM=release_gather
[0] No Errors
[0] ==== Dump collective algorithm counters ====
[0]         20  MPIDI_POSIX_mpi_bcast_release_gather
[0]          1  MPIR_Reduce_intra_binomial
[0] ==== END collective algorithm counters ====

MPI_Bcast selects MPIDI_POSIX_mpi_bcast_release_gather every time. The reduce call comes from MTest_Finalize.
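A sketch of how per-algorithm counters could be kept and printed in the layout of the dump above, assuming a fixed array of counters indexed by a hypothetical algorithm id. The enum, `count_algo`, and `dump_algo_counters` are hypothetical; only the algorithm names and the output layout are copied from the demo.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical algorithm ids; names copied from the demo output. */
enum algo_id { ALGO_BCAST_RELEASE_GATHER, ALGO_REDUCE_BINOMIAL, ALGO_COUNT };

static const char *algo_names[ALGO_COUNT] = {
    "MPIDI_POSIX_mpi_bcast_release_gather",
    "MPIR_Reduce_intra_binomial",
};

static int algo_counters[ALGO_COUNT];

/* Record one invocation of the selected algorithm. */
static void count_algo(enum algo_id id)
{
    assert(id < ALGO_COUNT);
    algo_counters[id]++;
}

/* Print non-zero counters as "[rank] count  name", matching the demo layout. */
static void dump_algo_counters(FILE *out, int rank)
{
    fprintf(out, "[%d] ==== Dump collective algorithm counters ====\n", rank);
    for (int i = 0; i < ALGO_COUNT; i++) {
        if (algo_counters[i] > 0)
            fprintf(out, "[%d] %10d  %s\n", rank, algo_counters[i], algo_names[i]);
    }
    fprintf(out, "[%d] ==== END collective algorithm counters ====\n", rank);
}
```

Recording 20 release_gather calls and one reduce, then calling `dump_algo_counters(stdout, 0)`, prints the same four lines as the first demo case.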

  • Without setting the algorithm CVAR, i.e., using JSON selection:
[0] No Errors
[0] ==== Dump collective algorithm counters ====
[0]          4  MPIR_Bcast_intra_scatter_ring_allgather
[0]         16  MPIDI_POSIX_mpi_bcast_release_gather
[0]          1  MPIR_Reduce_intra_binomial
[0] ==== END collective algorithm counters ====

The release_gather algorithm only gets selected once MPI_Bcast has been called MPIR_CVAR_POSIX_NUM_COLLS_THRESHOLD (5) times.
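A minimal sketch of threshold gating, assuming a per-communicator call counter that is incremented on every bcast, with release_gather becoming eligible once the counter reaches the threshold. `comm_state`, `choose_bcast`, and the enum are hypothetical; only the threshold value 5 comes from the demo. Simulating 20 calls reproduces the 4/16 split in the counters above.

```c
#include <assert.h>
#include <stddef.h>

/* Value of MPIR_CVAR_POSIX_NUM_COLLS_THRESHOLD in the demo. */
#define NUM_COLLS_THRESHOLD 5

/* Hypothetical per-communicator state. */
typedef struct {
    int num_colls;  /* bcast calls seen so far on this communicator */
} comm_state;

enum bcast_choice { BCAST_FALLBACK, BCAST_RELEASE_GATHER };

/* Count this call, then allow release_gather only once the threshold is reached. */
static enum bcast_choice choose_bcast(comm_state *comm)
{
    assert(comm != NULL);
    comm->num_colls++;
    if (comm->num_colls >= NUM_COLLS_THRESHOLD)
        return BCAST_RELEASE_GATHER;
    return BCAST_FALLBACK;  /* e.g. an MPIR_Bcast_intra_* algorithm */
}
```

A plausible reason for such a threshold is that release_gather needs shared-memory setup per communicator, which is only worth paying for on communicators that call the collective repeatedly.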

  • Set MPIR_CVAR_ODD_EVEN_CLIQUES=1 to simulate an inter-node communicator:
[0] No Errors
[0] ==== Dump collective algorithm counters ====
[0]         15  MPIR_Bcast_intra_binomial
[0]          5  MPIR_Bcast_intra_scatter_ring_allgather
[0]          1  MPIR_Reduce_intra_binomial
[0] ==== END collective algorithm counters ====

Release_gather is not selected because its condition check fails. MPIR_Bcast_intra_scatter_ring_allgather is selected for small-message bcast, and medium to large messages select MPIR_Bcast_intra_binomial.
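A sketch of why the intranode condition fails under MPIR_CVAR_ODD_EVEN_CLIQUES=1, assuming all processes run on one physical node and the option groups even ranks into one virtual node and odd ranks into another. `virtual_node_id` and `comm_is_intranode` are hypothetical; MPICH's actual clique emulation may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical virtual node id: with odd/even cliques, rank parity picks the node. */
static int virtual_node_id(int rank, bool odd_even_cliques)
{
    return odd_even_cliques ? rank % 2 : 0;  /* 0: every rank on the same node */
}

/* Intranode check: every rank must map to the same node id as rank 0. */
static bool comm_is_intranode(int comm_size, bool odd_even_cliques)
{
    assert(comm_size > 0);
    int node0 = virtual_node_id(0, odd_even_cliques);
    for (int r = 1; r < comm_size; r++) {
        if (virtual_node_id(r, odd_even_cliques) != node0)
            return false;
    }
    return true;
}
```

With 4 processes the check passes normally but fails once odd/even cliques are enabled, so a condition checker built on it would reject release_gather exactly as in this demo case.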

  • Setting both MPIR_CVAR_ODD_EVEN_CLIQUES=1 and MPIR_CVAR_BCAST_INTRA_ALGORITHM=release_gather

Currently this behaves the same as the previous case: the CVAR-selected algorithm fails the restriction check, so selection falls back to MPIR_Coll_auto.

Should we fail in this case by default instead?
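A sketch of the two policies in question: silently falling back to automatic selection when the CVAR-requested algorithm fails its restriction check (the current behavior), or returning an error. The `strict` flag, return codes, and function names are hypothetical and do not correspond to an existing MPICH switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum select_rc { SELECT_OK, SELECT_ERR_RESTRICTION };

/* Hypothetical algorithm descriptor with an optional restriction check. */
typedef struct {
    const char *name;
    bool (*check)(bool is_intranode);
} algo;

static bool rg_check(bool is_intranode) { return is_intranode; }

static const algo release_gather = {"MPIDI_POSIX_mpi_bcast_release_gather", rg_check};
static const char *const auto_choice = "MPIR_Coll_auto";

/* Use the CVAR-requested algorithm if its check passes; otherwise either
 * fall back to automatic selection or, when strict, report an error. */
static enum select_rc select_with_cvar(const algo *requested, bool is_intranode,
                                       bool strict, const char **chosen)
{
    assert(requested != NULL && chosen != NULL);
    if (requested->check == NULL || requested->check(is_intranode)) {
        *chosen = requested->name;
        return SELECT_OK;
    }
    if (strict) {
        *chosen = NULL;
        return SELECT_ERR_RESTRICTION;
    }
    *chosen = auto_choice;  /* current behavior: silent fallback */
    return SELECT_OK;
}
```

Failing loudly makes a misconfigured CVAR visible, while the silent fallback keeps applications running when one CVAR setting is shared across jobs with different layouts.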

Author Checklist

  • [x] Provide Description: Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • [x] Commits Follow Good Practice: Commits are self-contained and do not do two things at once. Commit message is of the form "module: short description". Commit message explains what's in the commit.
  • [ ] Passes All Tests: Whitespace checker. Warnings test. Additional tests via comments.
  • [x] Contribution Agreement: For non-Argonne authors, check contribution agreement. If necessary, request an explicit comment from your company's PR approval manager.

hzhou, Sep 25 '25 20:09

Builds other than ch4:ofi failed because MPIDI_POSIX_mpi_bcast_release_gather is not defined. Of course :( I need to figure out a way to define a partial coll_algorithms.txt in the device layer, similar to the way subconfigure.m4 works.

PS: the same applies to the CVAR and JSON definitions.

  • [x] fixed

hzhou, Sep 25 '25 20:09