A variety of docs updates

Open jsquyres opened this issue 3 years ago • 16 comments

  • Typo fixes
  • Add :ref: links to man pages (mostly mpirun and ompi_info)
  • Add notes for contributors about PR'ing to main first and then cherry-picking to release branches later. Thanks to @jolivain for suggesting that we add this policy to the docs.
  • Include contributor suggestion to submit fixes to the docs.
  • Renamed Developers -> Git to "GitHub, Git, and related topics". Added info about:
    • Git commits and a reference to the contributors declaration (in contributors.rst)
    • Branching scheme
    • Details about PR to main first and cherry-picking to release branches
    • A few words about GitHub PR CI / MTT
  • Added information about running Sphinx, and how to view the Sphinx docs locally
  • Added notes about how to view man pages locally
  • Added a placeholder oshrun.1 man page (it just refers to mpirun.1)
  • Per https://github.com/open-mpi/ompi/pull/10772#discussion_r964872603, discuss PMIx and PRRTE MCA parameters
  • Mention Perl and Python as tools required by Open MPI developers
  • Expanded on some "advice for packagers" from the "required support dependencies" section, and moved it to its own section:
    • Don't use Open MPI's bundled sub-packages (Libevent, Hwloc, PMIx, PRTE)
    • Discussion of components: included in project libraries vs. DSOs
  • Add short "prerequisites" section for running MPI apps

jsquyres, Sep 12 '22 22:09

@awlauria @jjhursey Could you guys especially check the PMIx / PRRTE tables I created on https://ompi--10793.org.readthedocs.build/en/10793/running-apps/tuning.html? Search for "PMIx" on that page to find all the text I added about this.

jsquyres, Sep 12 '22 22:09

Errr...just glanced at it and there are a number of errors (e.g., OMPI MCA params on the cmd line are not --mca but --omca; the default user-level PMIx param file is in the .pmix subdirectory, not .openpmix; etc.). Not sure how you want me to identify those for you?

rhc54, Sep 12 '22 22:09

@rhc54 Great, thanks! If you could mark up the PR here, that would be great: https://github.com/open-mpi/ompi/pull/10793/files#diff-c0a15e3d161d877e29843a3d0ef2cddc0a239a7167c837082fcf519f39a8b678

jsquyres, Sep 12 '22 23:09

@rhc54 I note that passing MCA params on the mpirun command line does use --mca, however. Is this part of the OMPI schizo component? This is from an OMPI main build:

$ mpirun --help |& grep mca
   --mca <arg0> <arg1>               Pass context-specific MCA parameters; they are considered global if
                                     --gmca is not used and only one context is specified (arg0 is the
   --pmixmca <arg0> <arg1>           Pass context-specific PMIx MCA parameters; they are considered global if
   --prtemca <arg0> <arg1>           Pass context-specific PRTE MCA parameters to the DVM
...

jsquyres, Sep 12 '22 23:09

That is an error in your help text file. The --mca is a generic parameter option where you don't know which project the param belongs to and you want PRRTE to make a "best effort" guess on your behalf. If we guess wrong, it won't do anything. So you are better off using the project-specific --omca option unless you really don't know what project the param belongs to.
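A rough sketch of the distinction described above, for illustration only. The flag names come from this thread and the help output quoted earlier; the helper mca_flag, the "btl tcp,self" value, and ./my_app are hypothetical placeholders. The snippet only prints the command line it would use and does not invoke mpirun.

```shell
#!/bin/sh
# Hypothetical helper: pick the project-specific MCA flag when the
# owning project is known, and fall back to the generic --mca otherwise.
mca_flag() {
  case "$1" in
    ompi) echo "--omca" ;;     # Open MPI parameters
    pmix) echo "--pmixmca" ;;  # PMIx parameters
    prte) echo "--prtemca" ;;  # PRTE parameters
    *)    echo "--mca" ;;      # project unknown: best-effort guess
  esac
}

# Print (do not run) an example command line. "btl tcp,self" and
# ./my_app are placeholders chosen for illustration.
echo "mpirun $(mca_flag ompi) btl tcp,self -np 2 ./my_app"
```

The fallback branch mirrors the advice in this comment: the generic option is only worth using when the owning project really is unknown.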

rhc54, Sep 12 '22 23:09

@rhc54 @bwbarrett and I met last week and discussed Ralph's comments. He's right: there's a technical challenge on the PRTE side to figure out which framework a given MCA param is destined for. We think we have a solution, and are working on it (independent of this PR).

I just pushed a few more text updates to this PR.

jsquyres, Sep 19 '22 20:09

bot:aws:recheck

jsquyres, Sep 20 '22 10:09

Per Slack discussion with @bwbarrett, also remove advice about using || and && with the test operator.

jsquyres, Sep 20 '22 19:09

bot:ibm:retest

awlauria, Sep 20 '22 21:09

Force pushed some minor corrections:

  • Fixed text stating that the BTLs were in libmpi.so
  • Added some cross-references between configure CLI options and advice to packagers
  • Clarified an example with a longer comment
  • Removed some TODO labels

jsquyres, Sep 21 '22 13:09

Mellanox CI has some internal error right now (failing to pull a docker image).

jsquyres, Sep 21 '22 13:09

bot:aws:retest

jsquyres, Sep 21 '22 18:09

Same Mellanox CI internal failure as yesterday:

/usr/bin/docker pull rdmz-harbor.rdmz.labs.mlnx/hpcx/ompi_ci:latest
Error response from daemon: unknown: repository hpcx/ompi_ci not found

jsquyres, Sep 22 '22 12:09

/azp run

jsquyres, Sep 23 '22 10:09

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot], Sep 23 '22 10:09