CI: Expand to run many more variants
This PR was originally "CI: Add mpi-win-x86_64-smp". It works, but it's too slow.
This experiment has uncovered the following issues, which may only show up in resource-constrained or oversubscribed environments such as these CI runners.
MPI Darwin, MPI Darwin SMP crash:
/Applications/Xcode_14.2.app/Contents/Developer/usr/bin/make -C lb_test test OPTS='-g -Werror=vla -build-shared -optimize -production' TESTOPTS=''
../../../../bin/testrun +p4 ./lb_test 100 100 10 40 10 1000 ring +balancer GreedyLB +LBDebug 1
Running as 4 OS processes: ./lb_test 100 100 10 40 10 1000 ring +balancer GreedyLB +LBDebug 1
charmrun> mpirun -np 4 ./lb_test 100 100 10 40 10 1000 ring +balancer GreedyLB +LBDebug 1
Charm++> Running on MPI library: MPICH Version: 4.1.1
MPICH Release date: Mon Mar 6 14:14:15 CST 2023
MPICH ABI: 15:0:3
MPICH Device: ch4:ofi
MPICH configure: --disable-dependency-tracking --enable-fast=all,O3 --enable-g=dbg --enable-romio --enable-shared --with-pm=hydra F77=gfortran FC=gfortran FCFLAGS=-fallow-argument-mismatch --disable-silent-rules --prefix=/usr/local/Cellar/mpich/4.1.1_1 --mandir=/usr/local/Cellar/mpich/4.1.1_1/share/man FFLAGS=-fallow-argument-mismatch
MPICH CC: clang -fno-common -DNDEBUG -DNVALGRIND -g -O3
MPICH CXX: clang++ -DNDEBUG -DNVALGRIND -g
MPICH F77: gfortran -fallow-argument-mismatch -g
MPICH FC: gfortran -fallow-argument-mismatch -g
(MPI standard: 4.0)
Charm++> Level of thread support used: MPI_THREAD_SINGLE (desired: MPI_THREAD_SINGLE)
Charm++> Running in non-SMP mode: 4 processes (PEs)
Converse/Charm++ Commit ID: 0e426ec
Charm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
Isomalloc> Synchronized global address space.
CharmLB> Verbose level 1, load balancing period: -1 seconds
CharmLB> Load balancer assumes all CPUs are same.
traceprojections was off at initial time.
Charm++> Running on 1 hosts (1 sockets x 3 cores x 1 PUs = 3-way SMP)
Charm++> cpu topology info is gathered in 0.010 seconds.
[0] TreeLB in LEGACY MODE support
[0] TreeLB: Using PE_Root tree with: Greedy
Using 0 as root
Test PE Speed: false
Running lb_test on 4 processors with 100 elements
Print every 10 steps
Sync every 40 steps
First node busywaits 10 usec; last node busywaits 1000 usec
Selecting Topology Ring
Generating topology 0 for 100 elements
[0] Total work/step = 0.020912 sec
calibrated iterations 14038645
------------- Processor 2 Exiting: Caught Signal ------------
Reason: Segmentation fault: 11
[2] Stack Traceback:
[2:0] libsystem_platform.dylib 0x7ff819675dfd _sigtramp
[2:1] 0x0
[2:2] lb_test 0x10f4e9879 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >* nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long long, unsigned long long, double, std::__1::allocator, nlohmann::adl_serializer, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >::create<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, char const* const&>(char const* const&)
[2:3] lb_test 0x10f4c227c TreeLB::loadConfigFile(CkLBOptions const&)
[2:4] lb_test 0x10f4c7e34 TreeLB::TreeLB(CkLBOptions const&)
[2:5] lb_test 0x10f4c7be0 CkIndex_TreeLB::_call_TreeLB_marshall1(void*, void*)
[2:6] lb_test 0x10f50354e CkDeliverMessageFree
[2:7] lb_test 0x10f503dda CkCreateLocalGroup
[2:8] lb_test 0x10f5c2c08 _initDone()
[2:9] lb_test 0x10f5c4f7c _initHandler(void*, CkCoreState*)
[2:10] lb_test 0x10f5d2581 CsdScheduleForever
[2:11] lb_test 0x10f5d23a5 CsdScheduler
[2:12] lb_test 0x10f606742 ConverseInit
[2:13] lb_test 0x10f5c56de charm_main
[2:14] lb_test 0x10f458f34 start
[2:15] 0xc
------------- Processor 3 Exiting: Caught Signal ------------
Reason: Segmentation fault: 11
[3] Stack Traceback:
[3:0] libsystem_platform.dylib 0x7ff819675dfd _sigtramp
[3:1] 0x0
[3:2] lb_test 0x10ab83879 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >* nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long long, unsigned long long, double, std::__1::allocator, nlohmann::adl_serializer, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >::create<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, char const* const&>(char const* const&)
[3:3] lb_test 0x10ab5c27c TreeLB::loadConfigFile(CkLBOptions const&)
[3:4] lb_test 0x10ab61e34 TreeLB::TreeLB(CkLBOptions const&)
[3:5] lb_test 0x10ab61be0 CkIndex_TreeLB::_call_TreeLB_marshall1(void*, void*)
[3:6] lb_test 0x10ab9d54e CkDeliverMessageFree
[3:7] lb_test 0x10ab9ddda CkCreateLocalGroup
[3:8] lb_test 0x10ac5cc08 _initDone()
[3:9] lb_test 0x10ac5ef7c _initHandler(void*, CkCoreState*)
[3:10] lb_test 0x10ac6c581 CsdScheduleForever
[3:11] lb_test 0x10ac6c3a5 CsdScheduler
[3:12] lb_test 0x10aca0742 ConverseInit
[3:13] lb_test 0x10ac5f6de charm_main
[3:14] lb_test 0x10aaf2f34 start
[3:15] 0xc
Abort(1) on node 2 (rank 2 in comm 496): application called MPI_Abort(comm=0x84000001, 1) - process 2
Abort(1) on node 3 (rank 3 in comm 496): application called MPI_Abort(comm=0x84000001, 1) - process 3
real 0m0.754s
user 0m1.060s
sys 0m0.266s
make[4]: *** [test] Error 9
make[3]: *** [test-lb_test] Error 2
make[2]: *** [test-load_balancing] Error 2
make[1]: *** [test-charm++] Error 2
make: *** [test] Error 2
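Both Darwin tracebacks above die inside nlohmann::json while it builds a std::string from a char const* under TreeLB::loadConfigFile. Purely as a hedged illustration of that failure class (this is not the actual TreeLB code path, and to_json_string is a hypothetical helper): constructing a JSON string value from a null C string means constructing std::string from nullptr, which is undefined behavior and typically faults in exactly this kind of frame; a guard along these lines keeps it well-defined.

#include <nlohmann/json.hpp>
#include <string>

// Hypothetical helper (not part of TreeLB): wrap a possibly-null C string
// before it becomes a JSON string value, since std::string(nullptr) is
// undefined behavior -- the kind of frame the tracebacks above show.
static nlohmann::json to_json_string(const char* s) {
  return s ? nlohmann::json(std::string(s)) : nlohmann::json(nullptr);
}

int main() {
  const char* maybe_unset = nullptr;  // stand-in for a missing option string
  nlohmann::json cfg;
  cfg["root"] = to_json_string(maybe_unset);  // JSON null instead of a crash
  cfg["tree"] = to_json_string("PE_Root");    // normal string value
  return 0;
}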
MPI Linux error:
make -C user-driven-interop test OPTS='-g -Werror=vla -build-shared -optimize -production' TESTOPTS=''
sys 0m0.150s
make[3]: Entering directory '/home/runner/work/charm/charm/mpi-linux-x86_64/examples/charm++/user-driven-interop'
../../../bin/testrun +p2 ./hello_user 8
Running as 2 OS processes: ./hello_user 8
charmrun> /usr/bin/setarch x86_64 -R mpirun -np 2 ./hello_user 8
Charm++> Running in non-SMP mode: 2 processes (PEs)
Converse/Charm++ Commit ID: 0e426ec
Charm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
Isomalloc> Synchronized global address space.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 hosts (1 sockets x 2 cores x 1 PUs = 2-way SMP)
Charm++> cpu topology info is gathered in 0.000 seconds.
CharmLB> Load balancing instrumentation for communication is off.
Chare 0 created on PE 0
Chare 1 created on PE 0
Chare 2 created on PE 0
Chare 3 created on PE 0
Starting in user driven mode on 0
Chare 4 created on PE 1
Chare 5 created on PE 1
Chare 6 created on PE 1
Chare 7 created on PE 1
Starting in user driven mode on 1
Hello from chare 7
Hello from chare 6
Hello from chare 5
Hello from chare 4
Hello from chare 3
Hello from chare 2
Hello from chare 1
Hello from chare 0
Chare 3 got an ack from 0
Chare 2 got an ack from 0
Chare 1 got an ack from 0
Chare 0 got an ack from 0
Chare 7 got an ack from 0
Chare 3 got an ack from 1
Chare 2 got an ack from 1
Chare 1 got an ack from 1
Chare 0 got an ack from 1
Chare 6 got an ack from 0
Chare 5 got an ack from 0
Chare 4 got an ack from 0
Chare 7 got an ack from 1
Chare 6 got an ack from 1
Chare 5 got an ack from 1
Chare 4 got an ack from 1
[Partition 0][Node 0] End of program
Attempting to use an MPI routine after finalizing MPICH
make[3]: Leaving directory '/home/runner/work/charm/charm/mpi-linux-x86_64/examples/charm++/user-driven-interop'
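The "Attempting to use an MPI routine after finalizing MPICH" line above is MPICH's generic complaint when any MPI call is made after MPI_Finalize. As a minimal sketch of that error class only (not the actual charm++ user-driven-interop shutdown path), a standalone program like the following typically triggers the same message under MPICH:

#include <mpi.h>

// Minimal illustration: any MPI call after MPI_Finalize is erroneous, and
// MPICH reports it with the message quoted in the log above. This is not
// the charm++ shutdown sequence, just the general failure pattern.
int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  MPI_Finalize();
  MPI_Barrier(MPI_COMM_WORLD);  // erroneous: MPI is already finalized
  return 0;
}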
MPI Linux SMP error:
../../../../bin/testrun +p4 ./zerocopy_with_qd 100 +noCMAForZC ++ppn 4 +setcpuaffinity
Running as 1 OS processes: ./zerocopy_with_qd 100 +noCMAForZC +ppn 4 +setcpuaffinity
charmrun> /usr/bin/setarch x86_64 -R mpirun -np 1 ./zerocopy_with_qd 100 +noCMAForZC +ppn 4 +setcpuaffinity
Charm++> Running on MPI library: MPICH Version: 3.3.2
MPICH Release date: Tue Nov 12 21:23:16 CST 2019
MPICH ABI: 13:8:1
MPICH Device: ch3:nemesis
MPICH configure: --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --with-libfabric --enable-shared --prefix=/usr --enable-fortran=all --disable-rpath --disable-wrapper-rpath --sysconfdir=/etc/mpich --libdir=/usr/lib/x86_64-linux-gnu --includedir=/usr/include/x86_64-linux-gnu/mpich --docdir=/usr/share/doc/mpich CPPFLAGS= CFLAGS= CXXFLAGS= FFLAGS= FCFLAGS= BASH_SHELL=/bin/bash
MPICH CC: gcc -g -O2 -fdebug-prefix-map=/build/mpich-VeuB8Z/mpich-3.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -O2
MPICH CXX: g++ -g -O2 -fdebug-prefix-map=/build/mpich-VeuB8Z/mpich-3.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -O2
MPICH F77: f77 -g -O2 -fdebug-prefix-map=/build/mpich-VeuB8Z/mpich-3.3.2=. -fstack-protector-strong -O2
MPICH FC: f95 -g -O2 -fdebug-prefix-map=/build/mpich-VeuB8Z/mpich-3.3.2=. -fstack-protector-strong -cpp -O2
(MPI standard: 3.1)
Charm++> Level of thread support used: MPI_THREAD_FUNNELED (desired: MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: 1 processes, 4 worker threads (PEs) + 1 comm threads per process, 4 PEs total
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: 0e426ec
Charm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
CharmLB> Load balancer assumes all CPUs are same.
Charm++> cpu affinity enabled.
Charm++> Running on 1 hosts (1 sockets x 2 cores x 1 PUs = 2-way SMP)
Charm++> cpu topology info is gathered in 0.025 seconds.
Charm++> Warning: Running with more SMP threads (5) than physical cores (2).
Use +CmiSleepOnIdle (default behavior) or +CmiSpinOnIdle to silence this message.
WARNING: Multiple PEs assigned to same core, recommend adjusting processor affinity or passing +CmiSleepOnIdle to reduce interference.
CharmLB> Load balancing instrumentation for communication is off.
[0][0][0] Test 1: QD has been reached for RO Variable Bcast
[0][0][0] Test 2: QD has been reached for Direct API
[0][0][0] Test 3: QD has been reached for EM Send API
[0][0][0] Test 4: QD has been reached for EM Post API
[0][0][0] Test 5: QD has been reached for EM Bcast Send API
------------- Processor 3 Exiting: Called CmiAbort ------------
Reason: [3] Assertion "elBcastNo > bcastEpoch" failed in file /home/runner/work/charm/charm/src/ck-core/ckarray.C line 1350.
[3] Stack Traceback:
[3:0] zerocopy_with_qd 0x5555557ba2db CmiAbortHelper(char const*, char const*, char const*, int, int)
[3:1] zerocopy_with_qd 0x5555557ba3ff
[3:2] zerocopy_with_qd 0x555555778180
[3:3] zerocopy_with_qd 0x5555556587ca CkArrayBroadcaster::attemptDelivery(CkArrayMessage*, ArrayElement*, bool)
[3:4] zerocopy_with_qd 0x55555566727b CkArray::recvBroadcast(CkMessage*)
[3:5] zerocopy_with_qd 0x55555564c125 CkDeliverMessageReadonly
[3:6] zerocopy_with_qd 0x55555565256e _processHandler(void*, CkCoreState*)
[3:7] zerocopy_with_qd 0x555555774ad5 CsdScheduleForever
[3:8] zerocopy_with_qd 0x555555774d35 CsdScheduler
[3:9] zerocopy_with_qd 0x5555557bced2
[3:10] zerocopy_with_qd 0x5555557bcf0c
[3:11] libpthread.so.0 0x7ffff7f9c609
[3:12] libc.so.6 0x7ffff7802133 clone
application called MPI_Abort(comm=0x84000000, 1) - process 0
real 0m0.442s
user 0m0.007s
sys 0m0.004s
make[5]: *** [Makefile:38: smptest] Error 1
make[4]: *** [Makefile:41: smptest-zerocopy_with_qd] Error 2
make[3]: *** [Makefile:82: smptest-zerocopy] Error 2
make[2]: *** [Makefile:58: test] Error 2
make[1]: *** [Makefile:34: test-charm++] Error 2
make: *** [Makefile.tests.common:39: test] Error 2
make[5]: Leaving directory '/home/runner/work/charm/charm/mpi-linux-x86_64-smp/tests/charm++/zerocopy/zerocopy_with_qd'
MPI Windows error:
../../../bin/testrun +p4 ./amr_1d_random
Running as 4 OS processes: ./amr_1d_random
charmrun> /c/Program Files/Microsoft MPI/Bin/mpiexec -n 4 ./amr_1d_random
Charm++> Running on MPI library: Microsoft MPI 8.0.12438.0 (MPI standard: 2.0)
Charm++> Level of thread support used: MPI_THREAD_SINGLE (desired: MPI_THREAD_SINGLE)
Charm++> Running in non-SMP mode: 4 processes (PEs)
Converse/Charm++ Commit ID: 0e426ec
Charm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
Isomalloc> Synchronized global address space.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 hosts (1 sockets x 2 cores x 1 PUs = 2-way SMP)
Charm++> cpu topology info is gathered in 0.577 seconds.
CharmLB> Load balancing instrumentation for communication is off.
Element (L0,I0) created on (N0, C0)
Main is in phase initialize
Main is in phase check neighbors
Element (L0,I0) pinged on (N0, C0)
Element (L0,I0) pinged on (N0, C0)
Main is in phase check
Main is in phase check volume
------------
Iteration 1
------------
...
------------
Iteration 9
------------
Main is in phase evaluate refinement criteria
Element (L0,I0) evaluating refinement criteria on (N1, C1)
Element (L0,I0) updating AMR decision on (N1, C1)
Element (L0,I0) updating AMR decision on (N1, C1)
Main is in phase begin inserting
Main is in phase create new elements
Element (L0,I0) adjusting domain (split) on (N1, C1)
Element (L1,I1) created on (N3, C3)
Element (L1,I0) created on (N2, C2)
Main is in phase done inserting
Main is in phase count elements
Main is in phase adjust domain
Total elements = 3
Flag for 1 is 1
Element (L0,I0) sending data to children on (N1, C1)
1, Neighbors = 1, 1
Element (L1,I1) initializing child on (N3, C3)
Flag for 3 is -2
Element (L1,I0) initializing child on (N2, C2)
Flag for 2 is -2
ERROR: Failed to post close command error 1726
ERROR: unable to tear down the job tree. exiting...
Main is in phase delete old elements
Element (L0,I0) deleting on (N1, C1)
Main is in phase check neighbors
Element (L1,I0) pinged on (N2, C2)
Element (L1,I0) pinged on (N2, C2)
Element (L1,I1) pinged on (N3, C3)
Element (L1,I1) pinged on (N3, C3)
Main is in phase check
Main is in phase check volume
------------
Iteration 10
------------
Main is in phase evaluate refinement criteria
Element (L1,I1) evaluating refinement criteria on (N3, C3)
Element (L1,I0) updating AMR decision on (N2, C2)
Element (L1,I0) updating AMR decision on (N2, C2)
Element (L1,I0) evaluating refinement criteria on (N2, C2)
Element (L1,I1) updating AMR decision on (N3, C3)
Element (L1,I1) updating AMR decision on (N3, C3)
real 0m1.107s
user 0m0.015s
sys 0m0.030s
make[2]: *** [Makefile:13: test] Error 127
make[1]: *** [Makefile:76: test-amr_1d_simple] Error 2
make: *** [Makefile:34: test-charm++] Error 2
Main is in phase begin inserting
Main is in phase create new elements
Element (L1,I1) adjusting domain (join) on (N3, C3)
Element (L1,I0) adjusting domain (join) on (N2, C2)
Element (L0,I0) created on (N1, C1)
Main is in phase done inserting
Main is in phase count elements
Main is in phase adjust domain
Total elements = 3
Flag for 3 is -1
Element (L1,I1) adjusting domain (join) on (N3, C3)
Flag for 2 is -1
Element (L1,I0) adjusting domain (join) on (N2, C2)
Element (L1,I0) collecting data from children on (N2, C2)
Flag for 1 is -2
Element (L1,I1) collecting data from children on (N3, C3)
initialize_parent called
Element (L0,I0) initializing parent on (N1, C1)
Main is in phase delete old elements
Element (L1,I1) deleting on (N3, C3)
Element (L1,I0) deleting on (N2, C2)
Main is in phase check neighbors
Element (L0,I0) pinged on (N1, C1)
Element (L0,I0) pinged on (N1, C1)
Main is in phase check
Main is in phase check volume
Main is in phase exit
[Partition 0][Node 0] End of program
make[2]: Leaving directory '/d/a/charm/charm/mpi-win-x86_64/tests/charm++/amr_1d_random'
make -C amr_1d_simple test OPTS='-optimize -production' TESTOPTS=''
make[2]: Entering directory '/d/a/charm/charm/mpi-win-x86_64/tests/charm++/amr_1d_simple'
../../../bin/testrun +p2 ./amr_1d_simple
Running as 2 OS processes: ./amr_1d_simple
charmrun> /c/Program Files/Microsoft MPI/Bin/mpiexec -n 2 ./amr_1d_simple
job aborted:
[ranks] message
[0] process exited without calling init
[1] process exited without calling finalize
---- error analysis -----
[0] on fv-az449-247
./amr_1d_simple ended before calling init and may have crashed. exit code -1
[1] on fv-az449-247
./amr_1d_simple ended prematurely and may have crashed. exit code -1
---- error analysis -----
make[2]: Leaving directory '/d/a/charm/charm/mpi-win-x86_64/tests/charm++/amr_1d_simple'
MPI Windows SMP error:
make[3]: Entering directory '/d/a/charm/charm/mpi-win-x86_64-smp/tests/charm++/reductionTesting/reductionTesting3D'
../../../../bin/testrun ./reductionTesting3D +p4 20 20 20 5
Running as 4 OS processes: ./reductionTesting3D 20 20 20 5
charmrun> /c/Program Files/Microsoft MPI/Bin/mpiexec -n 4 ./reductionTesting3D 20 20 20 5
Charm++> Running on MPI library: Microsoft MPI 8.0.12438.0 (MPI standard: 2.0)
Charm++> Level of thread support used: MPI_THREAD_FUNNELED (desired: MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: 4 processes, 1 worker threads (PEs) + 1 comm threads per process, 4 PEs total
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: 0e426ec
Charm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
Isomalloc> Synchronized global address space.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 hosts (1 sockets x 2 cores x 1 PUs = 2-way SMP)
Charm++> cpu topology info is gathered in 1.624 seconds.
CharmLB> Load balancing instrumentation for communication is off.
reduced vector
0.000000 1.000000 2.000000 3.000000 4.000000
real 0m21.942s
user 0m0.015s
sys 0m0.030s
reduced vector
0.000000 8.000000 16.000000 24.000000 32.000000
reduced vector
0.000000 8.000000 16.000000 24.000000 32.000000
reduced vector
0.000000 8.000000 16.000000 24.000000 32.000000
reduced vector
0.000000 8.000000 16.000000 24.000000 32.000000
reduced vector
0.000000 8.000000 16.000000 24.000000 32.000000
reduced vector
0.000000 27.000000 54.000000 81.000000 108.000000
reduced vector
0.000000 27.000000 54.000000 81.000000 108.000000
reduced vector
0.000000 27.000000 54.000000 81.000000 108.000000
reduced vector
0.000000 8.000000 16.000000 24.000000 32.000000
reduced vector
0.000000 8.000000 16.000000 24.000000 32.000000
reduced vector
0.000000 8.000000 16.000000 24.000000 32.000000
reduced vector
0.000000 8.000000 16.000000 24.000000 32.000000
reduced vector
0.000000 8.000000 16.000000 24.000000 32.000000
reduced vector
0.000000 8000.000000 16000.000000 24000.000000 32000.000000
reduced vector
0.000000 1000.000000 2000.000000 3000.000000 4000.000000
reduced vector
0.000000 343.000000 686.000000 1029.000000 1372.000000
reduced vector
0.000000 125.000000 250.000000 375.000000 500.000000
reduced vector
0.000000 64.000000 128.000000 192.000000 256.000000
reduced vector
0.000000 64.000000 128.000000 192.000000 256.000000
[Partition 0][Node 0] End of program
make[3]: Leaving directory '/d/a/charm/charm/mpi-win-x86_64-smp/tests/charm++/reductionTesting/reductionTesting3D'
make[2]: Leaving directory '/d/a/charm/charm/mpi-win-x86_64-smp/tests/charm++/reductionTesting'
make -C partitions test OPTS='-optimize -production' TESTOPTS=''
make[2]: Entering directory '/d/a/charm/charm/mpi-win-x86_64-smp/tests/charm++/partitions'
../../../bin/testrun ./hello +p4 10 2 +partitions 2
Running as 4 OS processes: ./hello 10 2 +partitions 2
charmrun> /c/Program Files/Microsoft MPI/Bin/mpiexec -n 4 ./hello 10 2 +partitions 2
job aborted:
ERROR: Failed to post close command error 1726
ERROR: unable to tear down the job tree. exiting...
[ranks] message
[0-2] process exited without calling init
[3] process exited without calling finalize
---- error analysis -----
[0-2] on fv-az1113-610
./hello ended before calling init and may have crashed. exit code -1
[3] on fv-az1113-610
./hello ended prematurely and may have crashed. exit code -1
---- error analysis -----
make[2]: Leaving directory '/d/a/charm/charm/mpi-win-x86_64-smp/tests/charm++/partitions'
make[1]: Leaving directory '/d/a/charm/charm/mpi-win-x86_64-smp/tests/charm++'
make: Leaving directory '/d/a/charm/charm/mpi-win-x86_64-smp/tests'
real 0m0.117s
user 0m0.031s
sys 0m0.031s
MPI Windows SMP slowdown (abbreviated log):
2023-05-07T08:21:41.2339435Z ../../../../bin/testrun +p4 ./period_selection +balancer RotateLB +MetaLB +LBObjOnly
2023-05-07T08:21:41.3134376Z
2023-05-07T08:21:41.3135126Z Running as 4 OS processes: ./period_selection +balancer RotateLB +MetaLB +LBObjOnly
2023-05-07T08:21:41.3238298Z charmrun> /c/Program Files/Microsoft MPI/Bin/mpiexec -n 4 ./period_selection +balancer RotateLB +MetaLB +LBObjOnly
2023-05-07T08:22:07.2844811Z Charm++> Running on MPI library: Microsoft MPI 8.0.12438.0 (MPI standard: 2.0)
2023-05-07T08:22:16.7833924Z Charm++> Level of thread support used: MPI_THREAD_FUNNELED (desired: MPI_THREAD_FUNNELED)
2023-05-07T08:22:23.9179378Z Charm++> Running in SMP mode: 4 processes, 1 worker threads (PEs) + 1 comm threads per process, 4 PEs total
2023-05-07T08:22:26.2929986Z Charm++> The comm. thread both sends and receives messages
2023-05-07T08:22:33.4074292Z Converse/Charm++ Commit ID: 0e426ec
2023-05-07T08:22:40.5343812Z Charm++ built with internal error checking enabled.
2023-05-07T08:22:44.0980687Z Do not use for performance benchmarking (build without --enable-error-checking to do so).
2023-05-07T08:22:53.6035834Z Isomalloc> Synchronized global address space.
2023-05-07T08:23:03.0935287Z Warning: MetaLB is activated. For Automatic strategy selection in MetaLB, pass directory of model files using +MetaLBModelDir.
2023-05-07T08:23:05.4685121Z CharmLB> Load balancer ignores processor background load.
2023-05-07T08:23:12.5971461Z CharmLB> Load balancer assumes all CPUs are same.
2023-05-07T08:23:17.3497839Z LB> Load balancing strategy ignores non-migratable objects.
2023-05-07T08:23:23.2880334Z Charm++> Running on 1 hosts (1 sockets x 2 cores x 1 PUs = 2-way SMP)
2023-05-07T08:23:28.0323236Z Charm++> cpu topology info is gathered in 1.623 seconds.
2023-05-07T08:23:32.7858383Z CharmLB> Load balancing instrumentation for communication is off.
2023-05-07T08:23:38.7249474Z [0] TreeLB in LEGACY MODE support
2023-05-07T08:23:43.4766991Z [0] TreeLB: Using PE_Root tree with: Rotate
2023-05-07T08:23:47.0403033Z At PE 0 Total contribution for iteration 1 is 1 total objs 8
2023-05-07T08:23:56.5322964Z At PE 0 Total contribution for iteration 1 is 2 total objs 8
2023-05-07T08:23:57.7194648Z At PE 0 Total contribution for iteration 1 is 3 total objs 8
2023-05-07T08:24:02.4733223Z At PE 0 Total contribution for iteration 1 is 4 total objs 8
2023-05-07T08:24:09.6021532Z At PE 0 Total contribution for iteration 1 is 5 total objs 8
2023-05-07T08:24:14.3534106Z At PE 0 Total contribution for iteration 1 is 6 total objs 8
2023-05-07T08:24:19.1053053Z At PE 0 Total contribution for iteration 1 is 7 total objs 8
2023-05-07T08:24:22.6702149Z At PE 0 Total contribution for iteration 1 is 8 total objs 8
2023-05-07T08:48:59.9425197Z On iteration 11, migrations done: 1
2023-05-07T09:13:13.4551708Z On iteration 21, migrations done: 2
2023-05-07T09:39:05.5416746Z On iteration 31, migrations done: 3
2023-05-07T10:03:53.5079539Z On iteration 41, migrations done: 4
2023-05-07T10:28:02.8674691Z On iteration 51, migrations done: 5
2023-05-07T10:52:30.6499809Z On iteration 61, migrations done: 6
2023-05-07T11:15:24.6028320Z On iteration 71, migrations done: 7
2023-05-07T11:40:18.5041712Z On iteration 81, migrations done: 8
2023-05-07T12:04:05.9032489Z On iteration 91, migrations done: 9
2023-05-07T12:16:56.6664738Z At PE 3 Total contribution for iteration 7 is 1 total objs 8
2023-05-07T12:16:56.6665246Z At PE 3 Total contribution for iteration 7 is 2 total objs 8
2023-05-07T12:16:56.6666798Z At PE 3 Total contribution for iteration 7 is 3 total objs 8
2023-05-07T12:16:56.6667264Z At PE 3 Total contribution for iteration 7 is 4 total objs 8
2023-05-07T12:16:56.6669726Z At PE 3 Total contribution for iteration 7 is 5 total objs 8
2023-05-07T12:16:56.6670927Z At PE 3 Total contribution for iteration 7 is 6 total objs 8
2023-05-07T12:16:56.6671664Z At PE 3 Total contribution for iteration 7 is 7 total objs 8
2023-05-07T12:16:56.6672162Z At PE 3 Total contribution for iteration 7 is 8 total objs 8
2023-05-07T12:16:56.6672606Z [Partition 0][Node 0] End of program
2023-05-07T12:16:56.6609484Z real 235m15.372s
2023-05-07T12:16:56.6629438Z user 0m0.046s
2023-05-07T12:16:56.6633290Z sys 0m0.015s
2023-05-07T12:16:56.6674709Z make[3]: Leaving directory '/d/a/charm/charm/mpi-win-x86_64-smp/tests/charm++/load_balancing/meta_lb_test'
NetLRTS Darwin SMP, NetLRTS Linux error:
/Applications/Xcode_14.2.app/Contents/Developer/usr/bin/make -C periodic_lb_broadcast_test smptest OPTS='-g -Werror=vla -build-shared -optimize -production' TESTOPTS='++local'
../../../../bin/testrun +p2 ./ping +balancer RotateLB +LBPeriod 0.1 ++ppn 2 ++local
Charmrun> scalable start enabled.
Charmrun> started all node programs in 0.013 seconds.
Charm++> Running in SMP mode: 1 processes, 2 worker threads (PEs) + 1 comm threads per process, 2 PEs total
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: 0e426ec
Charm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
Charm++> scheduler running in netpoll mode.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 hosts (1 sockets x 3 cores x 1 PUs = 3-way SMP)
Charm++> cpu topology info is gathered in 0.395 seconds.
[0] TreeLB in LEGACY MODE support
[0] TreeLB: Using PE_Root tree with: Rotate
Migrations done: 1
Main is in phase execute
Main is in phase check
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: [0] Assertion "migrations == migrationsRecordedByMain" failed in file ping.C line 132.
[0] Stack Traceback:
[0:0] ping 0x1001ee521 CmiAbort
[0:1] ping 0x1001a50d1 __CmiEnforceHelper
[0:2] ping 0x100006b66 Pingees::check(int)
[0:3] ping 0x1000069ca CkIndex_Pingees::_call_check_marshall3(void*, void*)
[0:4] ping 0x1000cf671 CkDeliverMessageReadonly
[0:5] ping 0x100117027 CkLocRec::invokeEntry(CkMigratable*, void*, int, bool)
[0:6] ping 0x1000df52b CkArrayBroadcaster::attemptDelivery(CkArrayMessage*, ArrayElement*, bool)
[0:7] ping 0x1000e1cf8 CkArray::recvBroadcast(CkMessage*)
[0:8] ping 0x1000cf51e CkDeliverMessageFree
[0:9] ping 0x1000d1df8 _processHandler(void*, CkCoreState*)
[0:10] ping 0x1001a1ebb CsdScheduleForever
[0:11] ping 0x1001a1b85 CsdScheduler
[0:12] ping 0x1001f26df ConverseRunPE(int)
[0:13] ping 0x1001f0b56 ConverseInit
[0:14] ping 0x100193afe charm_main
[0:15] ping 0x100001334 start
[0:16] 0x5
Fatal error on PE 0> [0] Assertion "migrations == migrationsRecordedByMain" failed in file ping.C line 132.
real 0m1.837s
user 0m0.002s
sys 0m0.005s
make[5]: *** [smptest] Error 1
make[4]: *** [smptest-periodic_lb_broadcast_test] Error 2
make[4]: Leaving directory '/home/runner/work/charm/charm/netlrts-linux-x86_64/tests/charm++/load_balancing/periodic_lb_broadcast_test'
NetLRTS Linux SMP slowdown:
2023-05-07T08:34:47.4737545Z ../../../bin/testrun ./pingpong_multipairs +p2 ++local +setcpuaffinity
2023-05-07T08:34:47.4738086Z
2023-05-07T08:34:47.4738253Z real 3m29.907s
2023-05-07T08:34:47.4738621Z user 0m0.015s
2023-05-07T08:34:47.4738870Z sys 0m0.000s
2023-05-07T08:34:47.4909279Z Charmrun> scalable start enabled.
2023-05-07T08:34:47.4909652Z Charmrun> started all node programs in 0.006 seconds.
2023-05-07T08:34:47.4910247Z Charm++> Running in SMP mode: 2 processes, 1 worker threads (PEs) + 1 comm threads per process, 2 PEs total
2023-05-07T08:34:47.4910628Z Charm++> The comm. thread both sends and receives messages
2023-05-07T08:34:47.4912970Z Converse/Charm++ Commit ID: 0e426ec
2023-05-07T08:34:47.4915840Z Charm++ built with internal error checking enabled.
2023-05-07T08:34:47.4916796Z Do not use for performance benchmarking (build without --enable-error-checking to do so).
2023-05-07T08:34:47.5141875Z Isomalloc> Synchronized global address space.
2023-05-07T08:34:47.5161470Z Charm++> scheduler running in netpoll mode.
2023-05-07T08:34:47.5161955Z Charm++> cpu affinity enabled.
2023-05-07T08:34:47.5784076Z Charm++> Running on 1 hosts (1 sockets x 2 cores x 1 PUs = 2-way SMP)
2023-05-07T08:34:47.6014436Z Charm++> cpu topology info is gathered in 0.057 seconds.
2023-05-07T08:34:47.6016910Z Multiple pair send/recv
2023-05-07T08:34:47.6017433Z bytes latency(us) bandwidth(MBytes/sec)
2023-05-07T08:37:06.9102835Z 8 17395.00 0.00
2023-05-07T08:39:27.3542751Z 16 34934.00 0.00
2023-05-07T08:41:46.7422750Z 32 52336.31 0.00
2023-05-07T08:44:05.4902966Z 64 69655.81 0.00
2023-05-07T08:46:25.8182137Z 128 87179.81 0.00
2023-05-07T08:48:46.2422634Z 256 104709.81 0.00
2023-05-07T08:51:04.8182151Z 512 122013.81 0.00
2023-05-07T08:53:23.7462008Z 1024 139360.31 0.01
2023-05-07T08:55:42.1462086Z 2048 156638.81 0.01
2023-05-07T08:58:01.1422135Z 4096 173995.31 0.02
2023-05-07T09:00:11.0623611Z 8192 190218.31 0.04
2023-05-07T09:02:29.6782349Z 16384 207534.81 0.08
2023-05-07T09:04:49.6782171Z 32768 225016.46 0.15
2023-05-07T09:07:12.1543042Z 65536 242810.46 0.27
2023-05-07T09:09:50.0743781Z 131072 262534.96 0.50
2023-05-07T09:12:32.9622330Z 262144 282871.96 0.93
2023-05-07T09:15:02.9982659Z 524288 301607.16 1.74
2023-05-07T09:17:48.8502817Z 1048576 322314.16 3.25
2023-05-07T09:20:29.5502198Z 2097152 342357.12 6.13
2023-05-07T09:23:56.4621839Z 4194304 368201.62 11.39
2023-05-07T09:23:56.4622589Z [Partition 0][Node 0] End of program
2023-05-07T09:23:56.4748429Z
2023-05-07T09:23:56.4749293Z make[3]: Leaving directory '/home/runner/work/charm/charm/netlrts-linux-x86_64-smp/benchmarks/converse/pingpong'
2023-05-07T09:23:56.4785536Z real 49m8.996s
2023-05-07T09:23:56.4786803Z user 0m0.085s
2023-05-07T09:23:56.4791723Z sys 0m0.091s