ompi
mpirun --help needs updating
The output of mpirun --help is rather out of date, and needs to be carefully read over and updated to account for new, removed, and changed options. Descriptions for some of these are either wrong, or insufficient.
Additionally, some of the mpirun --help foo options do not display anything when they should. For example:
$ ./exports/bin/mpirun --help ppr
--------------------------------------------------------------------------
Help was requested for an unknown option:
Option: ppr
Please use the "mpirun --help" command to obtain a list of all
supported options.
--------------------------------------------------------------------------
where this should print the ppr help message.
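A check like the one above can be scripted so it is easy to repeat for many arguments. The following is a minimal sketch, not part of mpirun or PRRTE: the helper names `is_unknown_help` and `probe_help` are hypothetical, and it assumes the banner text shown above is stable and may arrive on either stdout or stderr.

```python
import subprocess

# Marker text copied from the error banner shown in the example above.
UNKNOWN_MARKER = "Help was requested for an unknown option"


def is_unknown_help(output: str) -> bool:
    """Return True if the text contains mpirun's 'unknown option' banner."""
    return UNKNOWN_MARKER in output


def probe_help(topic: str, mpirun: str = "mpirun") -> bool:
    """Run `mpirun --help <topic>` and report whether real help text came back.

    Assumption: the banner may be written to stdout or stderr, so both
    streams are inspected. The exit code is deliberately ignored.
    """
    result = subprocess.run(
        [mpirun, "--help", topic],
        capture_output=True,
        text=True,
        check=False,
    )
    return not is_unknown_help(result.stdout + result.stderr)
```

Calling `probe_help("ppr", mpirun="./exports/bin/mpirun")` would be expected to return False for the build shown above, since that build prints the unknown-option banner.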
Refs: #10698
Notes:
- --help should dump all of the options
  - Verify all of the options work with mpirun
- --help ARG should dump information about just that argument
  - Make sure all of the category ARGs work (e.g., --help ppr)
    - Some categories might not work - just document those that work vs don't work for now.
  - Verify that the information is correct - clarify anything that needs clarification
- All changes need to go into PRRTE (maybe in the schizo/ompi component, but likely will touch the core)
- Once the CLI is fixed, then we need to look at the Open MPI docs
  - https://docs.open-mpi.org/en/main/
@jjhursey asked me to check mpirun options to make sure that reasonable help text was displayed for each option and that the mpirun command recognized each option in running a simple test.
I tested options in order, top-down, as specified in schizo_ompi.c.
I tracked this by creating a text file with the mpirun --help output and then annotating the options I tested by adding lines starting with @@ after each option. I ran simple mpirun tests for each option to see if it seemed to work. In some cases I wasn't sure exactly how to use the option or what it was supposed to do, so I may have reported false failures.
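For anyone who wants to summarize an annotated file like this, here is a rough sketch of how the annotations could be collected per option. The exact layout of help.txt is an assumption: it treats any line whose first non-blank character is `-` as an option line, and any line starting with `@@` (or `%%`, the marker used for the second pass) as a note on the most recent option. The function name `parse_annotations` is hypothetical.

```python
from collections import defaultdict

# Annotation prefixes described in this thread: '@@' for the first pass,
# '%%' for the later pass.
MARKERS = ("@@", "%%")


def parse_annotations(text):
    """Map each option name to a list of (marker, note) tuples.

    Assumed layout: an option line starts with '-', and the annotation
    lines that follow it start with one of MARKERS.
    """
    notes = defaultdict(list)
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(MARKERS):
            # Attach the note to the most recently seen option, if any.
            if current is not None:
                notes[current].append((stripped[:2], stripped[2:].strip()))
        elif stripped.startswith("-"):
            # The first token is taken as the option name, e.g. '-H,' -> '-H'.
            current = stripped.split()[0].rstrip(",")
    return dict(notes)
```

Options with no annotation lines do not appear in the result, which makes it easy to spot anything that was never tested by comparing against the full option list.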
These options were not properly recognized by mpirun --help commands:
- --omca: Option is recognized but no help text (mpirun --help omca says it is an invalid option)
- --gomca: Option is recognized but no help text (mpirun --help gomca says it is an invalid option)
- --parsable: Option is recognized but no help text (mpirun --help parsable says it is an invalid option)
- --parseable: Option is recognized but no help text (mpirun --help parseable says it is an invalid option)
- -n: Not included in mpirun --help text, but does have 'mpirun --help n' text and is recognized by mpirun command.
- -np: Not included in mpirun --help text, but does have 'mpirun --help np' text and is recognized by mpirun command.
- -c: Not included in mpirun --help text and not recognized by 'mpirun --help c', but is recognized by the mpirun command.
- --app: Has help text for 'mpirun --help app' but no text for 'mpirun --help'. 'mpirun --app appfile hello' runs but it's not clear how to specify option in appfile or if appfile works
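Findings such as "-n is not included in the mpirun --help text" can be cross-checked mechanically. The sketch below is only an illustration: `listed_options` and `missing_from_help` are hypothetical helpers, and the regular expression assumes that options appear in the help text as `-x` or `--long-name` tokens, which may not hold for every line of real output.

```python
import re

# Matches '-x' or '--long-name' tokens that are not glued to a preceding
# word character or dash (so '-by' inside '--map-by' is not matched).
OPTION_RE = re.compile(r"(?<![\w-])(--?[A-Za-z][\w-]*)")


def listed_options(help_text):
    """Collect every option-like token that appears in the help text."""
    return set(OPTION_RE.findall(help_text))


def missing_from_help(help_text, expected):
    """Return the expected options that the help text never mentions."""
    listed = listed_options(help_text)
    return [opt for opt in expected if opt not in listed]
```

Feeding it the captured `mpirun --help` output together with the option names defined in schizo_ompi.c would give a list to compare against the manual results above.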
The options where help text was displayed are annotated, along with mpirun command results, in the attached file, help.txt.
I'm not expecting parameters like ppr to have individual help text since they are parameters to mpirun options such as --map-by or --bind-to which have their own help text.
I tested the remaining mpirun options, and have results in two files.
I updated help.txt with more comments, flagged with '%%' to distinguish them from the first set, flagged with '@@'.
There is a list of deprecated mpirun options in schizo_ompi.c, which I tested. Most/all of them did not appear in the mpirun --help text, so I created a separate file, deprecated.txt, with the results of testing them.
All of these tests were run with a clean OpenMPI build, cloned around noon on 8/24.
Supported: Failed to run
- --stop-in-app
  - Does not accept the parameter mentioned in the help text, and if the parameter is omitted, mpirun doesn't seem to do anything.
  - ACTION: Needs fixing. Austen may have already fixed this.
- --output proctable x
  - Specifying '--output proctable x' writes to stdout/stderr and no file is created.
  - ACTION: Needs fixing or clarifying
- --launch-agent <arg0>
  - If I specify an invalid executable, like 'zzz', then no error message is issued and the application runs on the remote node specified by --host.
  - ACTION: Needs an error message
- --personality <arg0>
  - Seems to accept anything, like --personality xxx, without error
  - ACTION: Needs an error message. Should we accept anything other than ompi? Probably not. Take this out of the --help list.
- -v/-V
  - mpirun seems to accept the -v or -V options, but I'm not sure they do anything.
  - ACTION: Needs to be fixed
- --output <arg0>
  - Output options work, but output for dir and file options still displays to terminal in addition to specified destination
  - ACTION: Needs to be fixed or clarified if there is a "don't echo to terminal" option
- --stop-in-app
  - No mention of how the application-determined point is specified
  - ACTION: Needs fixing as it does not work correctly.
- -s|--preload-binary
  - If -n is before --preload-binary then it works. If -n is after then it launched one per physical core. Unexpected ordering behavior.
  - ACTION: Fix this
Supported: Need better help messages
- ACTION: Generally the help text printed by mpirun --help ARG should be more informative than the short text displayed in --help.
- ACTION: Cleanup these help messages
- mpirun --help output
  - Needs description of supported directives
- mpirun --help display
  - Does not display much information
- --debug-daemons-file
  - It doesn't give filename or filename pattern. Unclear default
  - ACTION: remove from --help
- --leave-session-attached
  - Help text is confusing to me since I don't understand why asking to leave session attached has anything to do with discarding stdout/stderr.
  - ACTION: remove from --help
- --output <arg0>
  - It mentions qualifiers, but not what the allowed qualifiers are or how to specify them.
- --default-hostfile
  - Not sure what a default hostfile is
  - ACTION: Clarify difference between --hostfile and --default-hostfile
- --rankfile <arg0>
  - Is a deprecated option not marked as deprecated.
  - ACTION: Is this really deprecated? Mark as not deprecated.
- --noprefix
  - 'automatic --prefix behavior' is not explained.
- --prefix <arg0>
  - Maybe it should say something about prefix being the root directory for the MPI installation.
- --tune <arg0>
  - It is not clear from help text that the string must be 'parm=value'. File option?
- --mca <arg0> <arg1>
  - mpirun recognizes this option but does not flag it as deprecated.
  - ACTION: Are we really deprecating MCA? I hope not. Clarify that it is not deprecated. Does this set PMIX/PRRTE/OMPI MCA versions, or just the OMPI MCA? We should clarify.
Supported: Not recognized by mpirun, but are listed in --help
- -H (short form of --host)
  - Help is not displayed for '-H' option.
  - ACTION: Fix
- ACTION: Remove from --help -- possibly deeper
  - --gpmixmca <arg0> <arg1>
- ACTION: Remove the following from --help
  - --test-suicide <arg0>
  - --set-cwd-to-session-dir
  - --daemonize
  - --keepalive <arg0>
  - --singleton <arg0>
  - --no-ready-msg
  - --report-pid <arg0>
  - --report-uri <arg0>
  - --set-sid
  - --system-server
Supported: Reported as invalid option
- ACTION: Add text for these. Make sure they have backing code.
  - mpirun --help initial-errhandler
  - mpirun --help soft
  - mpirun --help arch
  - mpirun --help file
  - mpirun --help with-ft
- ACTION: Translate to the backend function
  - mpirun --help display-comm
  - mpirun --help display-comm-finalize
  - mpirun --help output-proctable
Deprecated: Need --help deprecated statement
- ACTION: Add deprecated text (Verify with the MCA deprecation warning enabled)
  - --nolocal
  - --oversubscribe
  - --use-hwthread-cpus
  - --cpu-set
  - --cpu-list
  - --bind-to-core
  - --bynode
  - --bycore
  - --cpus-per-proc
  - --cpus-per-rank
  - --npernode
  - --pernode
  - --byslot
  - --npersocket
  - --ppr
  - --amca
  - --am
  - --debug
Deprecated: Other
- --output-filename
  - Command processing does not seem to match the help text description. If I specify '--output-filename xxx' I get output files per task and file descriptor prefixed with 'xxx'.
  - ACTION: Needs fixing
- --display-topo
  - The mpirun command flags the option as requiring a parameter, but I don't know what a valid value is, so adding a parameter fails as well
  - ACTION: Needs fixing. Austen may have a fix posted
- --display-devel-allocation
  - A mpirun --display-devel-allocation -2 hello command does not display any allocation text.
  - Adding a valid --host option results in mpirun hanging.
  - ACTION: This may be removed
- --use-hwthread-cpus
  - mpirun fails telling me the bind-to directive has an invalid qualifier hwthread.
  - ACTION: Need to fix the translation
- --debug
  - mpirun accepts the command but doesn't seem to do anything.
  - ACTION: Remove from --help
From #10698:
- --app, deprecated translation is not working
- -N doesn't appear in --help, possible removal candidate. Maps to --map-by ppr:1:node, so when used with --map-by this is confusing.
- --get-stack-traces seems to fail at larger scales via a hang or crash
- --mca mca_base_env_list does not work on recent prrte updates
- --cpu-set/--cpu-list: Works - somewhat. If you ask for more ranks than cpus you get:
  PRTE ERROR: Unable to map job in file rmaps_rr.c at line 184
  even with --oversubscribe
- --show-progress, doesn't seem to do anything. Removal candidate?
I'll take a pass at the "Supported: Not recognized by mpirun, but are listed in --help" items.
You need to be careful here to distinguish between options that face the user vs options that mpirun must accept due to other requirements. For example, the --keepalive option is used when PMIx is asked to fork/exec the DVM in response to a singleton comm_spawn, or is asked to fork/exec mpirun by a debugger tool (which is a use-case from the DDT debugger team). Likewise, --singleton is required by the singleton comm_spawn as that is what passes the singleton's ID to the DVM so it knows who it is supporting.
There are a number of these "hidden" options that you can, if you wish, remove from the help file as they are not generally used directly by a user. However, a developer (e.g., writing a tool) might need to know they exist and how to use them.
I'll try to provide some thoughts when/where I can.
--leave-session-attached is usually the very first thing we ask the user to do when they report launch problems so we can see if any error messages are coming from the daemons. It is also needed if you want to see the daemon output from any verbose options you set. Perhaps better to simply improve the help message on it.
--personality probably isn't something the user needs to set. However, you may be missing something for supporting OSHMEM apps based on OMPI. There are supporting elements in PMIx for open shmem applications based on input I received from Mellanox and SUNY, so you probably should pass the oshmem personality down to PMIx for those types of applications. Perhaps something like detecting that they used oshrun to start the job, and then add oshmem to the personality field of the prte_job_t when processing envars or some other entry point? Might need some investigation.
Do you guys want me to attend a Tues meeting, or perhaps a dedicated one, to review these? I fear that there is some misunderstanding here regarding the use of many of these options. It is probably okay to remove some from the help text, but we somehow have to maintain their usage or else other things that we regularly use will break and/or no longer be available.
@rhc54 in talking to others, the v5 RMs WOULD like to meet with you for some clarification. Look in PMIx slack for Austen's message. We have a resource who can help with some implementation after we make some decisions, so let's do it! :)
Sure - happy to do so. Sorry I missed today's meeting - had a doctor's appt. Will work with Austen on alternate times.
Quick follow-up regarding mpirun with DVM... it seems like the best thing for users would be to add ability to pass --dvm-uri down from the schizo/ompi options. Otherwise they would need to use prun and all the other MPI related options would be absent.
I thought this would be a matter of adding PMIX_OPTION_DEFINE(PRTE_CLI_DVM_URI, PMIX_ARG_REQD) to schizo/ompi, but a quick check didn't show it in the resulting mpirun --help output. So I missed something.
OMPI folks decided they did not want mpirun to connect to a DVM, and therefore there is no option for doing so.
There is no problem with OMPI users using prun --personality ompi <bunch of OMPI options> to run an OMPI-based job on a DVM.
I was thinking that passing the DVM URI was a good compromise to include for mpirun, as it would give users an easy way to leverage the DVM without having to adjust their mpirun command lines. The question was whether it is difficult to add this functionality, or whether we should just "document" how to do it with prun.
Trivial to add, if you folks decide you want to do so.
Q: Can we use prun --personality ompi <bunch of OMPI options> --dvm-uri file:dvm.uri ...? Restated, can we use options from two schizo personalities?
It isn't mixing personalities - just telling prun to use the OMPI personality to parse the cmd line
Ok, thanks for clarification Ralph.
And for notes on this ticket, I did the following for a quick test:
prte --prtemca prte_pmix_server_verbose 50 --report-uri dvm2.uri >& LOG.dvm2 &
prun --personality ompi --display-comm --dvm-uri file:dvm2.uri --np 4 ./ring_c
tail LOG.dvm2
Closing this as complete, fix has percolated up to v5.0.x in latest submodule update.
prrte - main: https://github.com/openpmix/prrte/pull/1542 v3.0: https://github.com/openpmix/prrte/pull/1548
ompi - main: https://github.com/open-mpi/ompi/pull/10928 v5.0.x: https://github.com/open-mpi/ompi/pull/10934