Incorrect values returned by CmiRankOf and CmiNodeOf on the comm thread
Adding the following line to the end of ConverseCommonInit shows that CmiRankOf and CmiNodeOf return incorrect values.
Line to add:
CmiPrintf("[PE:%d][Node:%d][Rank:%d] ConverseCommonInit CmiMyNodeSize()=%d, CmiNodeOf(CmiMyPe()) = %d, CmiRankOf(CmiMyPe()) =%d\n", CmiMyPe(), CmiMyNode(), CmiMyRank(), CmiMyNodeSize(), CmiNodeOf(CmiMyPe()), CmiRankOf(CmiMyPe()));
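Equivalently, a minimal pass/fail check (a sketch only, using standard Converse calls) could compare the lookups against CmiMyNode() and CmiMyRank() on each PE, including the comm thread, since those are the values the lookups are expected to reproduce:

/* Sketch: on every PE (worker or comm thread), CmiNodeOf(CmiMyPe()) and
 * CmiRankOf(CmiMyPe()) should agree with CmiMyNode() and CmiMyRank(). */
if (CmiNodeOf(CmiMyPe()) != CmiMyNode() || CmiRankOf(CmiMyPe()) != CmiMyRank())
  CmiPrintf("[PE:%d] mismatch: CmiNodeOf=%d (expected %d), CmiRankOf=%d (expected %d)\n",
            CmiMyPe(), CmiNodeOf(CmiMyPe()), CmiMyNode(),
            CmiRankOf(CmiMyPe()), CmiMyRank());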
On running tests/charm++/simplearrayhello with an SMP build (such as mpi-smp), you can see:
nbhat4@courage:/scratch/nitin/charm_2/tests/charm++/simplearrayhello$ make test
../../../bin/charmc hello.ci
../../../bin/charmc -c hello.C
../../../bin/charmc -language charm++ -o hello hello.o
../../../bin/testrun ./hello +p4 10
Running on 4 processors: ./hello 10
charmrun> /usr/bin/setarch x86_64 -R mpirun -np 4 ./hello 10
Charm++> Running on MPI version: 3.0
Charm++> level of thread support used: MPI_THREAD_FUNNELED (desired: MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: 4 processes, 1 worker threads (PEs) + 1 comm threads per process, 4 PEs total
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.11.0-devel-211-g6704419f9
Isomalloc> Synchronized global address space.
[PE:3][Node:3][Rank:0] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 3, CmiRankOf(CmiMyPe()) =0
[PE:2][Node:2][Rank:0] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 2, CmiRankOf(CmiMyPe()) =0
[PE:0][Node:0][Rank:0] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 0, CmiRankOf(CmiMyPe()) =0
[PE:7][Node:3][Rank:1] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 7, CmiRankOf(CmiMyPe()) =0
[PE:4][Node:0][Rank:1] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 4, CmiRankOf(CmiMyPe()) =0
CharmLB> Load balancer assumes all CPUs are same.
[PE:1][Node:1][Rank:0] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 1, CmiRankOf(CmiMyPe()) =0
[PE:5][Node:1][Rank:1] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 5, CmiRankOf(CmiMyPe()) =0
[PE:6][Node:2][Rank:1] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 6, CmiRankOf(CmiMyPe()) =0
Charm++> Running on 1 hosts (1 sockets x 4 cores x 2 PUs = 8-way SMP)
Charm++> cpu topology info is gathered in 0.128 seconds.
nbhat4@courage:/scratch/nitin/charm_2/tests/charm++/simplearrayhello$ make test TESTOPTS="++ppn 2"
../../../bin/testrun ./hello +p4 10 ++ppn 2
Running on 2 processors: ./hello 10 +ppn 2
charmrun> /usr/bin/setarch x86_64 -R mpirun -np 2 ./hello 10 +ppn 2
Charm++> Running on MPI version: 3.0
Charm++> level of thread support used: MPI_THREAD_FUNNELED (desired: MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: 2 processes, 2 worker threads (PEs) + 1 comm threads per process, 4 PEs total
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.11.0-devel-211-g6704419f9
Isomalloc> Synchronized global address space.
[PE:0][Node:0][Rank:0] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 0, CmiRankOf(CmiMyPe()) =0
[PE:3][Node:1][Rank:1] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 1, CmiRankOf(CmiMyPe()) =1
[PE:2][Node:1][Rank:0] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 1, CmiRankOf(CmiMyPe()) =0
[PE:5][Node:1][Rank:2] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 2, CmiRankOf(CmiMyPe()) =1
[PE:1][Node:0][Rank:1] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 0, CmiRankOf(CmiMyPe()) =1
[PE:4][Node:0][Rank:2] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 2, CmiRankOf(CmiMyPe()) =0
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 hosts (1 sockets x 4 cores x 2 PUs = 8-way SMP)
Charm++> cpu topology info is gathered in 0.087 seconds.
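For what it's worth, the mismatches above look consistent with CmiNodeOf and CmiRankOf applying only the plain block mapping (pe / CmiMyNodeSize() and pe % CmiMyNodeSize()), which breaks for comm-thread PEs; the output suggests those are numbered CmiNumPes() + node and carry rank CmiMyNodeSize(). A sketch of lookups that would match the printed [Node]/[Rank] values, assuming that numbering and a uniform node size (both inferred only from the output above, not verified against the runtime source):

/* Hypothetical helpers, inferred from the output above. */
static int SketchNodeOf(int pe) {
  if (pe >= CmiNumPes()) return pe - CmiNumPes();  /* comm-thread PE */
  return pe / CmiMyNodeSize();                     /* worker PE (uniform node size assumed) */
}
static int SketchRankOf(int pe) {
  if (pe >= CmiNumPes()) return CmiMyNodeSize();   /* comm thread sits past the worker ranks */
  return pe % CmiMyNodeSize();                     /* worker PE */
}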
Do you only see this on MPI builds?
IIRC, I also saw it on UCX builds on Frontera. (That's why I tested this on my local machine and saw the incorrect values with mpi-smp on courage.)