
Map by l3cache ignored in v5


Hi, I'm trying to specify --map-by l3cache on v5.0.0rc7, but it's not working. Other mappings, like core, numa, and socket, work as expected.

$ for m in core l3cache numa socket; do echo "# $m"; mpirun -n 2 --bind-to core --map-by $m --display map mpi_init; done
# core

========================   JOB MAP   ========================
Data for JOB mpirun-taos-103057@1 offset 0 Total slots allocated 64
    Mapping policy: BYCORE:NOOVERSUBSCRIBE  Ranking policy: CORE Binding policy: CORE
    Cpu set: N/A  PPR: N/A  Cpus-per-rank: N/A  Cpu Type: CORE


Data for node: taos	Num slots: 64	Max slots: 0	Num procs: 2
        Process jobid: mpirun-taos-103057@1 App: 0 Process rank: 0 Bound: package[0][core:0]
        Process jobid: mpirun-taos-103057@1 App: 0 Process rank: 1 Bound: package[0][core:1]

=============================================================

# l3cache

========================   JOB MAP   ========================
Data for JOB mpirun-taos-103065@1 offset 0 Total slots allocated 64
    Mapping policy: BYSLOT:NOOVERSUBSCRIBE  Ranking policy: SLOT Binding policy: CORE
    Cpu set: N/A  PPR: N/A  Cpus-per-rank: N/A  Cpu Type: CORE


Data for node: taos	Num slots: 64	Max slots: 0	Num procs: 2
        Process jobid: mpirun-taos-103065@1 App: 0 Process rank: 0 Bound: package[0][core:0]
        Process jobid: mpirun-taos-103065@1 App: 0 Process rank: 1 Bound: package[0][core:1]

=============================================================

# numa

========================   JOB MAP   ========================
Data for JOB mpirun-taos-103073@1 offset 0 Total slots allocated 64
    Mapping policy: BYNUMA:NOOVERSUBSCRIBE  Ranking policy: NUMA Binding policy: CORE
    Cpu set: N/A  PPR: N/A  Cpus-per-rank: N/A  Cpu Type: CORE


Data for node: taos	Num slots: 64	Max slots: 0	Num procs: 2
        Process jobid: mpirun-taos-103073@1 App: 0 Process rank: 0 Bound: package[0][core:0]
        Process jobid: mpirun-taos-103073@1 App: 0 Process rank: 1 Bound: package[0][core:8]

=============================================================

# socket

========================   JOB MAP   ========================
Data for JOB mpirun-taos-103082@1 offset 0 Total slots allocated 64
    Mapping policy: BYPACKAGE:NOOVERSUBSCRIBE  Ranking policy: PACKAGE Binding policy: CORE
    Cpu set: N/A  PPR: N/A  Cpus-per-rank: N/A  Cpu Type: CORE


Data for node: taos	Num slots: 64	Max slots: 0	Num procs: 2
        Process jobid: mpirun-taos-103082@1 App: 0 Process rank: 0 Bound: package[0][core:0]
        Process jobid: mpirun-taos-103082@1 App: 0 Process rank: 1 Bound: package[1][core:32]

=============================================================

When --map-by l3cache is specified, the job map reports Mapping policy: BYSLOT:NOOVERSUBSCRIBE, i.e. the l3cache request is silently ignored. This system has 4 cores per L3 cache, so I expect the two processes to be bound to cores 0 and 4. Testing with v4.1.3, it works correctly:

$ mpirun -n 2 --bind-to core --map-by l3cache --display-map mpi_init
 Data for JOB [3368,1] offset 0 Total slots allocated 64

 ========================   JOB MAP   ========================

 Data for node: taos	Num slots: 64	Max slots: 0	Num procs: 2
 	Process OMPI jobid: [3368,1] App: 0 Process rank: 0 Bound: socket 0[core 0[hwt 0-1]]:[BB/../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..]
 	Process OMPI jobid: [3368,1] App: 0 Process rank: 1 Bound: socket 0[core 4[hwt 0-1]]:[../../../../BB/../../../../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..]

 =============================================================
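(For reference, the cores-per-L3 grouping can be double-checked with hwloc's command-line tools. This is only a sketch, assuming hwloc-calc and lstopo are installed on the node and that the installed hwloc version accepts "l3" as a type name; on this machine the first command should report 4, matching the expectation above.)

$ hwloc-calc --number-of core l3:0    # count the cores contained in the first L3 cache
$ lstopo --only l3                    # list every L3 cache object in the topology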

gkatev opened this issue on May 30 '22

Could you please generate an xml file of your topology and post it so I can try to reproduce the issue?
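(One way to generate it, as a sketch assuming hwloc's lstopo is available on the node; the output file name here is only an example:)

$ lstopo --output-format xml taos-topo.xml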

rhc54 commented on Jun 05 '22

lstopo xml (attachment)

Let me know if I can help further.

gkatev commented on Jun 06 '22

Quick status update: I have fixed this, but I need a few days to clean up the patch before committing (all the other mappers need to be updated as well). Thanks for the topology; it was most helpful!

rhc54 commented on Jun 16 '22

Per an email from @rhc54, this is now fixed in PRRTE master and the v3.0 branch. I will update this issue when both make their way into ompi.

main: https://github.com/open-mpi/ompi/pull/10611
v5.0.x: https://github.com/open-mpi/ompi/pull/10612

awlauria commented on Jul 26 '22

With the merge of #10612, closing.

awlauria commented on Aug 23 '22