
Incorrect values returned by CmiRankOf and CmiNodeOf on the comm thread

Open · nitbhat opened this issue 4 years ago · 2 comments

Adding the following line to the end of ConverseCommonInit shows that CmiRankOf and CmiNodeOf return incorrect values.

Line to add: CmiPrintf("[PE:%d][Node:%d][Rank:%d] ConverseCommonInit CmiMyNodeSize()=%d, CmiNodeOf(CmiMyPe()) = %d, CmiRankOf(CmiMyPe()) =%d\n", CmiMyPe(), CmiMyNode(), CmiMyRank(), CmiMyNodeSize(), CmiNodeOf(CmiMyPe()), CmiRankOf(CmiMyPe()));

On running tests/charm++/simplearrayhello with an SMP build (such as mpi-smp), you can see:


nbhat4@courage:/scratch/nitin/charm_2/tests/charm++/simplearrayhello$ make test
../../../bin/charmc   hello.ci
../../../bin/charmc  -c hello.C
../../../bin/charmc  -language charm++ -o hello hello.o
../../../bin/testrun  ./hello +p4 10

Running on 4 processors:  ./hello 10
charmrun>  /usr/bin/setarch x86_64 -R  mpirun -np 4  ./hello 10
Charm++> Running on MPI version: 3.0
Charm++> level of thread support used: MPI_THREAD_FUNNELED (desired: MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: 4 processes, 1 worker threads (PEs) + 1 comm threads per process, 4 PEs total
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.11.0-devel-211-g6704419f9
Isomalloc> Synchronized global address space.
[PE:3][Node:3][Rank:0] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 3, CmiRankOf(CmiMyPe()) =0
[PE:2][Node:2][Rank:0] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 2, CmiRankOf(CmiMyPe()) =0
[PE:0][Node:0][Rank:0] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 0, CmiRankOf(CmiMyPe()) =0
[PE:7][Node:3][Rank:1] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 7, CmiRankOf(CmiMyPe()) =0
[PE:4][Node:0][Rank:1] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 4, CmiRankOf(CmiMyPe()) =0
CharmLB> Load balancer assumes all CPUs are same.
[PE:1][Node:1][Rank:0] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 1, CmiRankOf(CmiMyPe()) =0
[PE:5][Node:1][Rank:1] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 5, CmiRankOf(CmiMyPe()) =0
[PE:6][Node:2][Rank:1] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 6, CmiRankOf(CmiMyPe()) =0
Charm++> Running on 1 hosts (1 sockets x 4 cores x 2 PUs = 8-way SMP)
Charm++> cpu topology info is gathered in 0.128 seconds.



nbhat4@courage:/scratch/nitin/charm_2/tests/charm++/simplearrayhello$ make test TESTOPTS="++ppn 2"
../../../bin/testrun  ./hello +p4 10  ++ppn 2

Running on 2 processors:  ./hello 10 +ppn 2
charmrun>  /usr/bin/setarch x86_64 -R  mpirun -np 2  ./hello 10 +ppn 2
Charm++> Running on MPI version: 3.0
Charm++> level of thread support used: MPI_THREAD_FUNNELED (desired: MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: 2 processes, 2 worker threads (PEs) + 1 comm threads per process, 4 PEs total
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.11.0-devel-211-g6704419f9
Isomalloc> Synchronized global address space.
[PE:0][Node:0][Rank:0] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 0, CmiRankOf(CmiMyPe()) =0
[PE:3][Node:1][Rank:1] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 1, CmiRankOf(CmiMyPe()) =1
[PE:2][Node:1][Rank:0] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 1, CmiRankOf(CmiMyPe()) =0
[PE:5][Node:1][Rank:2] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 2, CmiRankOf(CmiMyPe()) =1
[PE:1][Node:0][Rank:1] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 0, CmiRankOf(CmiMyPe()) =1
[PE:4][Node:0][Rank:2] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 2, CmiRankOf(CmiMyPe()) =0
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 hosts (1 sockets x 4 cores x 2 PUs = 8-way SMP)
Charm++> cpu topology info is gathered in 0.087 seconds.
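
For reference, the pattern in both runs is consistent with the worker-PE arithmetic being applied to the comm thread's PE number, which lies outside 0..CmiNumPes()-1. The standalone C sketch below (not Converse source) reproduces the logged values under two assumptions: that CmiNodeOf/CmiRankOf reduce to node = pe / nodesize and rank = pe % nodesize for a uniform node size, and that the comm thread of node n reports CmiMyPe() == CmiNumPes() + n.

/* Standalone illustration, not Converse source. Assumes the uniform-nodesize
   arithmetic (node = pe / nodesize, rank = pe % nodesize) and the SMP
   convention that the comm thread of node n reports CmiMyPe() == CmiNumPes() + n. */
#include <stdio.h>

static int node_of(int pe, int nodesize) { return pe / nodesize; } /* hypothetical stand-in for CmiNodeOf */
static int rank_of(int pe, int nodesize) { return pe % nodesize; } /* hypothetical stand-in for CmiRankOf */

static void show(int numnodes, int nodesize) {
  int numpes = numnodes * nodesize;
  for (int node = 0; node < numnodes; ++node) {
    int commpe = numpes + node; /* assumed PE number of this node's comm thread */
    printf("comm thread of node %d: pe=%d -> node_of=%d (expected %d), rank_of=%d\n",
           node, commpe, node_of(commpe, nodesize), node, rank_of(commpe, nodesize));
  }
}

int main(void) {
  show(4, 1); /* first run:  4 processes, nodesize 1 -> PEs 4..7 map to "nodes" 4..7 */
  show(2, 2); /* second run: ++ppn 2, 2 processes    -> PEs 4 and 5 both map to "node" 2 */
  return 0;
}

Under those assumptions, any PE number at or above CmiNumPes() produces an out-of-range node, which matches both logs above.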

nitbhat commented Apr 29 '20 22:04

Do you only see this on MPI builds?

rbuch commented Jun 10 '20 19:06

IIRC, I saw this on UCX builds on Frontera as well (that's why I tested it on my local machine and saw the incorrect values with mpi-smp on courage).

nitbhat commented Jun 10 '20 19:06