[RFC] modules: Build embedded libfabric as a shared library

Open raffenet opened this issue 6 months ago • 3 comments

Pull Request Description

Some libfabric providers (e.g., psm3) rely on library constructors that do not work correctly when libfabric is statically linked into MPICH. Rather than telling users to avoid certain providers in an embedded configuration, always build the embedded libfabric as a shared library to avoid the problem.

Author Checklist

  • [x] Provide Description: Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • [x] Commits Follow Good Practice: Commits are self-contained and do not do two things at once. Commit message is of the form: module: short description. Commit message explains what's in the commit.
  • [ ] Passes All Tests: Whitespace checker. Warnings test. Additional tests via comments.
  • [x] Contribution Agreement: For non-Argonne authors, check contribution agreement. If necessary, request an explicit comment from your company's PR approval manager.

raffenet, Sep 11 '25 19:09

test:mpich/ch4/ofi

raffenet, Sep 11 '25 19:09

Reference: https://github.com/pmodels/mpich/pull/7193

hzhou, Sep 19 '25 19:09

> Reference: #7193

Thanks. As far as I can tell, --with-libfabric=install still statically links the library into MPICH. The libfabric.so that gets installed alongside MPICH hides most of the fi_<foo> symbols, so it does not seem usable as a library.

raffenet, Sep 19 '25 20:09