
cpu-list/pe-list not working in v5.0.x

gkatev opened this issue on Jun 17 '22 · 8 comments

Hi, I'm trying to use the cpu-list functionality in v5.0.x (7d0750a), but I'm getting some errors. They appear to be, at least in part, parsing/typo related.

$ git submodule
 77fc96c5a045060810d23ba8080c62fbc074aefe 3rd-party/openpmix (v4.1.2-71-g77fc96c)
 6a034d18792a0fb5e87b3850bf97ecb767e8f1c2 3rd-party/prrte (v2.0.2-109-g6a034d1)

In v4 I'm able to use --cpu-list 0,4 --bind-to cpu-list:ordered to run on cores 0 and 4. What would the intended equivalent of this in v5.0.x be?

$ mpirun -n 2 --cpu-list 0,4 --bind-to cpu-list:ordered osu_latency
The specified map-by directive is not recognized:

  Directive: pe-list=0,4
  Valid directives: slot,hwthread,core,l1cache,l2cache,l3cache,numa,package,node,seq,dist,ppr,rankfile

Please check for a typo or ensure that the directive is a supported one.


$ mpirun -n 2 --map-by :pe-list=0,4 osu_latency
The map-by directive contains an unrecognized qualifier:

  Qualifier: 4
  Valid qualifiers: pe=,span,oversubscribe,nooversubscribe,nolocal,hwtcpus,corecpus,device=,inherit,noinherit,pe-list=,file=,donotlaunch

Please check for a typo or ensure that the qualifier is a supported one.


$ mpirun -n 2 --map-by core:pe-list=0,4 osu_latency
The map-by directive contains an unrecognized qualifier:

  Qualifier: 4
  Valid qualifiers: pe=,span,oversubscribe,nooversubscribe,nolocal,hwtcpus,corecpus,device=,inherit,noinherit,pe-list=,file=,donotlaunch

Please check for a typo or ensure that the qualifier is a supported one.


$ mpirun -n 3 --map-by core:pe-list=0,4,8 osu_latency
The map-by directive contains an unrecognized qualifier:

  Qualifier: 4
  Valid qualifiers: pe=,span,oversubscribe,nooversubscribe,nolocal,hwtcpus,corecpus,device=,inherit,noinherit,pe-list=,file=,donotlaunch

Please check for a typo or ensure that the qualifier is a supported one.


$ mpirun -n 2 --map-by :pe-list=0,4 --bind-to cpu-list:ordered osu_latency
The map-by directive contains an unrecognized qualifier:

  Qualifier: 4
  Valid qualifiers: pe=,span,oversubscribe,nooversubscribe,nolocal,hwtcpus,corecpus,device=,inherit,noinherit,pe-list=,file=,donotlaunch

Please check for a typo or ensure that the qualifier is a supported one.


$ mpirun -n 2 --bind-to cpu-list:ordered osu_latency
The specified bind-to directive is not recognized:

  Directive: cpu-list:ordered
  Valid directives: none,hwthread,core,l1cache,l2cache,l3cache,numa,package

Please check for a typo or ensure that the directive is a supported one.

I assume that in matters like this, the information in mpirun's usage output is generally considered more up-to-date than the (v5.0.x) man page?

gkatev · Jun 17 '22

The man page is indeed out-of-date; not sure if/when it will be updated. A quick check of the code doesn't find any "ordered" option, although the mapper code does support it, so I'll have to add it. The "pe-list" option had a bug in processing lists that included commas (which is why the "4" after the comma shows up above as an unrecognized qualifier), but that has been fixed. It could be a question of what has gone back to the PRRTE v2.1 release branch and whether that was captured in an update to OMPI.
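
As a hedged illustration of that comma bug (a shell analogy, not the actual PRRTE parser): splitting the qualifier string on commas before honoring the '=' value is exactly what turns the "4" into a bogus qualifier:

$ IFS=, read -ra toks <<< "pe-list=0,4"; printf 'qualifier: %s\n' "${toks[@]}"
qualifier: pe-list=0
qualifier: 4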

rhc54 · Jun 17 '22

Ah okay, so I'll wait for the fix to land in ompi (maybe it's already in master; I didn't check, but I will). But from what I understand, even after pe-list is fixed, further fixes will still be required for the "ordered" functionality, so that the cpu list works like it did in v4? Note also, in the first example, another possible typo: --cpu-list 0,4 gets translated to pe-list=0,4 instead of what I assume should be :pe-list=0,4

gkatev · Jun 17 '22

Agreed: the mpirun man page in v5.0 is (currently) a slightly warmed-over version of the v4.1.x page. See #10480 -- it needs to be updated before v5.0.0 is released.

jsquyres · Jun 17 '22

Actually, I have to correct myself - there is an "ordered" option and it should be working correctly. I'll check the translation and operation here in a bit as part of the mapper update.

@jsquyres Someone over here perhaps can update that man page? I can provide advice if it would help.

rhc54 · Jun 17 '22

@rhc54 Yes. I actually have a draft email to you that I started after typing that response earlier this morning w.r.t. mpirun docs and our email earlier this week. Let me finish + send it to ya...

jsquyres · Jun 17 '22

I have this properly working now in my branch and will be opening a PR to bring it into PRRTE's master branch tomorrow. However, the options have changed. The pe-list is now in the map-by option as a directive and not a qualifier, and the ordered qualifier has also moved to the map-by option. The --help map-by output includes the following:

Processes are mapped in a round-robin fashion based on
one of the following directives as applied at the job level:
...
-   PE-LIST=a,b assigns procs to each node in the allocation based on
    the ORDERED qualifier. The list is comprised of comma-delimited
    ranges of CPUs to use for this job. If the ORDERED qualifier is not
    provided, then each node will be assigned procs up to the number of
    available slots, capped by the availability of the specified CPUs.
    If ORDERED is given, then one proc will be assigned to each of the
    specified CPUs, if available, capped by the number of slots on each
    node and the total number of specified processes. Providing the
    OVERLOAD qualifier to the "bind-to" option removes the check on
    availability of the CPU in both cases.
...
Any directive can include qualifiers by adding a colon (:) and any
combination of one or more of the following to the --map-by option
(except where noted):
...
-   ORDERED only applies to the PE-LIST option to indicate that procs
    are to be bound to each of the specified CPUs in the order
    in which they are assigned (i.e., the first proc on a node shall
    be bound to the first CPU in the list, the second proc shall be
    bound to the second CPU, etc.)
...
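
In other words (a sketch based on the help text above, assuming the syntax lands as described in the pending PR), the v4 invocation from the original report should translate to:

$ mpirun -n 2 --map-by pe-list=0,4:ordered osu_latency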

Should be available over the weekend.

rhc54 · Jul 15 '22

@gkatev Do you see this issue with 5.0.2?

wenduwan · Feb 15 '24

I tried --map-by pe-list=0,4,8,16,32 --report-bindings and got

[ip-172-31-41-217:434126] Rank 0 bound to package[0][core:0,4,8,16,32]
[ip-172-31-41-217:434126] Rank 1 bound to package[0][core:0,4,8,16,32]
[ip-172-31-41-217:434126] Rank 2 bound to package[0][core:0,4,8,16,32]
[ip-172-31-41-217:434126] Rank 4 bound to package[0][core:0,4,8,16,32]
[ip-172-31-41-217:434126] Rank 3 bound to package[0][core:0,4,8,16,32]

wenduwan · Feb 15 '24

Indeed --map-by looks good:

$ mpirun --report-bindings -n 2 --map-by pe-list=0,3 true                                                                     
[gkpc:325295] Rank 0 bound to package[0][core:0,3]
[gkpc:325295] Rank 1 bound to package[0][core:0,3]

But can I pin each rank to a different/single core? Should --bind-to core work for this?

$ mpirun --report-bindings -n 2 --map-by pe-list=0,3 --bind-to core true
[gkpc:325316] Rank 0 bound to package[0][core:0,3]
[gkpc:325316] Rank 1 bound to package[0][core:0,3]

$ mpirun --display map -n 2 --map-by pe-list=0,3 true 

========================   JOB MAP   ========================
Data for JOB prterun-gkpc-325367@1 offset 0 Total slots allocated 8
    Mapping policy: PE-LIST:NOOVERSUBSCRIBE  Ranking policy: SLOT Binding policy: CORE:IF-SUPPORTED
    Cpu set: 0,3  PPR: N/A  Cpus-per-rank: N/A  Cpu Type: CORE


Data for node: gkpc	Num slots: 8	Max slots: 0	Num procs: 2
        Process jobid: prterun-gkpc-325367@1 App: 0 Process rank: 0 Bound: package[0][core:0,3]
        Process jobid: prterun-gkpc-325367@1 App: 0 Process rank: 1 Bound: package[0][core:0,3]

=============================================================

$ mpirun --display map -n 2 --map-by pe-list=0,3 --bind-to core true

========================   JOB MAP   ========================
Data for JOB prterun-gkpc-325173@1 offset 0 Total slots allocated 8
    Mapping policy: PE-LIST:NOOVERSUBSCRIBE  Ranking policy: SLOT Binding policy: CORE
    Cpu set: 0,3  PPR: N/A  Cpus-per-rank: N/A  Cpu Type: CORE


Data for node: gkpc	Num slots: 8	Max slots: 0	Num procs: 2
        Process jobid: prterun-gkpc-325173@1 App: 0 Process rank: 0 Bound: package[0][core:0,3]
        Process jobid: prterun-gkpc-325173@1 App: 0 Process rank: 1 Bound: package[0][core:0,3]

=============================================================

gkatev · Mar 04 '24

Sure - all you have to do is add ordered:

--map-by pe-list=0,3:ordered

This will put the first process on core 0 and the second on core 3 on each node.

Any time you specify a pe-list, you are forced to bind-to core or bind-to hwt because you are specifying the specific PEs to use.
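
For example, adapting the --report-bindings run from above (output omitted here, since the exact bindings reported will depend on the machine):

$ mpirun --report-bindings -n 2 --map-by pe-list=0,3:ordered true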

rhc54 · Mar 05 '24

Perfect! - thanks - I think we're all good here.

gkatev · Mar 05 '24