CI: Enable MPI Linux SMP make testp with ++ppn
Follow-up from #2990. Some issues were previously documented at https://github.com/UIUC-PPL/charm/pull/2990#issuecomment-673744857 and are repeated here.
Our MPI Linux SMP build in particular appears to be buggy, and I'm leaning toward disabling the `+p4 ++ppn 2` part of its CI for this PR to unblock it. NetLRTS SMP does not have the same issues and can keep the PPN CI.
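If we do disable it, here is a minimal sketch of what the CI-side change could look like. This assumes the CI drives the test targets through a `TESTOPTS`-style variable; the `run_tests` helper and the script fragment are hypothetical, not the actual CI configuration:

```shell
# Hypothetical CI fragment: keep the plain +p4 run for mpi-linux-x86_64-smp,
# but skip the "+p4 ++ppn 2" variant that currently hangs/crashes.
BUILD=mpi-linux-x86_64-smp

run_tests() {
  # Sketch: print the command this CI step would run (a real CI executes it).
  echo "make -C $BUILD/tmp test TESTOPTS=\"$1\""
}

run_tests "+p4"                       # kept for all builds
case "$BUILD" in
  mpi-*-smp) : ;;                     # PPN run disabled for MPI SMP builds
  *) run_tests "+p4 ++ppn 2" ;;       # e.g. NetLRTS SMP keeps the PPN run
esac
```

NetLRTS SMP would fall into the second branch and keep its `++ppn` coverage.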
- Something about the MPI machine layer with SMP is causing hangs in `examples/charm++/zerocopy/entry_method_api/{reg,prereg,unreg}/stencil3d` and `tests/ampi/intercomm_coll`.
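As a stopgap for CI, hangs like these can be converted into fast, visible failures by wrapping the affected test invocations in coreutils `timeout`, which exits with status 124 when the limit is hit. A sketch (the wrapper name and the 10-second kill grace period are my own; `sleep` stands in for the real test binary):

```shell
# Kill a hang-prone test after a time limit; `timeout` exits with status 124
# when the command had to be killed for running too long.
run_with_limit() {
  local limit="$1"; shift
  timeout --kill-after=10 "$limit" "$@"
  local status=$?
  [ "$status" -eq 124 ] && echo "TIMED OUT after ${limit}s: $*" >&2
  return "$status"
}

# Demo with `sleep` standing in for e.g. testrun ./stencil3d:
run_with_limit 1 sleep 5 || echo "exit status: $?"   # prints "exit status: 124"
```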
- I once got this failure in `examples/charm++/zerocopy/entry_method_post_api/unreg/simpleZeroCopy`, but could not reproduce it:

```
../../../../../../bin/testrun +p4 ./simpleZeroCopy $(( 4 * 10 )) +balancer GreedyLB ++ppn 2 +setcpuaffinity
Running on 2 processors:  ./simpleZeroCopy 40 +balancer GreedyLB +ppn 2 +setcpuaffinity
charmrun> /usr/bin/setarch x86_64 -R  mpirun -np 2  ./simpleZeroCopy 40 +balancer GreedyLB +ppn 2 +setcpuaffinity
Charm++> Running on MPI version: 3.1
Charm++> level of thread support used: -1 (desired: 0)
Charm++> Running in SMP mode: 2 processes, 2 worker threads (PEs) + 1 comm threads per process, 4 PEs total
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.11.0-devel-384-g86fa91abd
Charm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
Charm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
Isomalloc> Synchronized global address space.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> cpu affinity enabled.
Charm++> Running on 1 hosts (1 sockets x 4 cores x 2 PUs = 8-way SMP)
Charm++> cpu topology info is gathered in 0.024 seconds.
[0] TreeLB in LEGACY MODE support
[0] TreeLB: Using PE_Root tree with strategy Greedy
send: completed
[NemeanLion:20932] *** An error occurred in MPI_Irecv
[NemeanLion:20932] *** reported by process [2524905473,1]
[NemeanLion:20932] *** on communicator MPI COMMUNICATOR 3 DUP FROM 0
[NemeanLion:20932] *** MPI_ERR_BUFFER: invalid buffer pointer
[NemeanLion:20932] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[NemeanLion:20932] ***    and potentially your MPI job)
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 3 DUP FROM 0
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
------------- Processor 4 Exiting: Caught Signal ------------
Reason: Terminated
[4] Stack Traceback:
  [4:0] simpleZeroCopy 0x5555557425fb
  [4:1] libpthread.so.0 0x7ffff7bc88a0
  [4:2] mca_btl_sm.so 0x7fffe5b03d22 mca_btl_sm_component_progress
  [4:3] libopen-pal.so.20 0x7ffff625fabc opal_progress
  [4:4] mca_pml_ob1.so 0x7fffe56da903 mca_pml_ob1_iprobe
  [4:5] libmpi.so.20 0x7ffff716e31a PMPI_Iprobe
  [4:6] simpleZeroCopy 0x555555740d7b
  [4:7] simpleZeroCopy 0x555555741519 LrtsAdvanceCommunication(int)
  [4:8] simpleZeroCopy 0x5555557417ce CommunicationServerThread(int)
  [4:9] simpleZeroCopy 0x555555741b8a
  [4:10] simpleZeroCopy 0x555555742194 ConverseInit
  [4:11] simpleZeroCopy 0x555555722efc charm_main
  [4:12] libc.so.6 0x7ffff6794b97 __libc_start_main
  [4:13] simpleZeroCopy 0x55555562800a _start

real    0m0.311s
user    0m0.365s
sys     0m0.082s
```
- Also this failure in megatest:
```
make[3]: Entering directory '/home/evan/Charmworks/charm/mpi-linux-x86_64-smp/tests/charm++/megatest'
../../../bin/testrun ./pgm +p4 ++ppn 2 +setcpuaffinity
Running on 2 processors:  ./pgm +ppn 2 +setcpuaffinity
charmrun> /usr/bin/setarch x86_64 -R  mpirun -np 2  ./pgm +ppn 2 +setcpuaffinity
Charm++> Running on MPI version: 3.1
Charm++> level of thread support used: -1 (desired: 0)
Charm++> Running in SMP mode: 2 processes, 2 worker threads (PEs) + 1 comm threads per process, 4 PEs total
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.11.0-devel-384-g86fa91abd
Charm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
Isomalloc> Synchronized global address space.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> cpu affinity enabled.
Charm++> Running on 1 hosts (1 sockets x 4 cores x 2 PUs = 8-way SMP)
Charm++> cpu topology info is gathered in 0.020 seconds.
Megatest is running on 2 nodes 4 processors.
test 0: initiated [groupring (milind)]
test 0: completed (0.12 sec)
test 1: initiated [nodering (milind)]
test 1: completed (0.00 sec)
test 2: initiated [varsizetest (mjlang)]
test 2: completed (0.00 sec)
test 3: initiated [varsizetest2 (phil)]
test 3: completed (0.00 sec)
test 4: initiated [varraystest (milind)]
test 4: completed (0.00 sec)
test 5: initiated [groupcast (mjlang)]
test 5: completed (0.00 sec)
test 6: initiated [groupmulti (gengbin)]
test 6: completed (0.00 sec)
test 7: initiated [groupsectiontest (ebohm)]
test 7: completed (0.00 sec)
test 8: initiated [multisectiontest (ebohm)]
test 8: completed (0.03 sec)
test 9: initiated [nodecast (milind)]
test 9: completed (0.00 sec)
test 10: initiated [synctest (mjlang)]
test 10: completed (0.01 sec)
test 11: initiated [fib (jackie)]
test 11: completed (0.01 sec)
test 12: initiated [arrayring (fang)]
test 12: completed (0.00 sec)
test 13: initiated [packtest (fang)]
test 13: completed (0.00 sec)
test 14: initiated [queens (jackie)]
test 14: completed (0.01 sec)
test 15: initiated [migration (jackie)]
test 15: completed (0.00 sec)
test 16: initiated [marshall (olawlor)]
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 3 DUP FROM 0
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
test 16: completed (0.32 sec)
test 17: initiated [priomsg (fang)]
test 17: completed (0.00 sec)
test 18: initiated [priotest (mlind)]
test 18: completed (0.03 sec)
test 19: initiated [rotest (milind)]
test 19: completed (0.01 sec)
test 20: initiated [statistics (olawlor)]
test 20: completed (0.00 sec)
test 21: initiated [templates (milind)]
test 21: completed (0.00 sec)
test 22: initiated [inherit (olawlor)]
test 22: completed (0.00 sec)
test 23: initiated [reduction (olawlor)]
test 23: completed (0.00 sec)
test 24: initiated [bitvector (jbooth)]
test 24: completed (0.00 sec)
test 25: initiated [immediatering (gengbin)]
test 25: completed (0.00 sec)
test 26: initiated [callback (olawlor)]
test 26: completed (0.00 sec)
test 27: initiated [inlineem (phil)]
test 27: completed (0.00 sec)
test 28: initiated [completion_test (phil)]
Starting test
Created detector, starting first detection
Started first test
Finished second test
Started third test
test 28: completed (0.00 sec)
test 29: initiated [groupdependence (nbhat4)]
test 29: completed (0.00 sec)
test 30: initiated [multi groupring (milind)]
test 30: completed (0.00 sec)
test 31: initiated [multi nodering (milind)]
test 31: completed (0.00 sec)
test 32: initiated [multi varsizetest (mjlang)]
test 32: completed (0.00 sec)
test 33: initiated [multi varsizetest2 (phil)]
test 33: completed (0.00 sec)
test 34: initiated [multi varraystest (milind)]
test 34: completed (0.00 sec)
test 35: initiated [multi groupcast (mjlang)]
test 35: completed (0.00 sec)
test 36: initiated [multi groupmulti (gengbin)]
test 36: completed (0.00 sec)
test 37: initiated [multi groupsectiontest (ebohm)]
test 37: completed (0.00 sec)
test 38: initiated [multi multisectiontest (ebohm)]
test 38: completed (0.11 sec)
test 39: initiated [multi nodecast (milind)]
test 39: completed (0.05 sec)
test 40: initiated [multi synctest (mjlang)]
[4] Stack Traceback:
  [4:0] pgm 0x555555812acb
  [4:1] libpthread.so.0 0x7ffff7bc88a0
  [4:2] mca_pml_ob1.so 0x7fffe56ebab4 mca_pml_ob1_recv_req_start
  [4:3] mca_pml_ob1.so 0x7fffe56da8e5 mca_pml_ob1_iprobe
  [4:4] libmpi.so.20 0x7ffff716e31a PMPI_Iprobe
  [4:5] pgm 0x55555581124b
  [4:6] pgm 0x5555558119e9 LrtsAdvanceCommunication(int)
  [4:7] pgm 0x555555811c9e CommunicationServerThread(int)
  [4:8] pgm 0x55555581205a
  [4:9] pgm 0x555555812664 ConverseInit
  [4:10] pgm 0x5555557a2bfc charm_main
  [4:11] libc.so.6 0x7ffff6794b97 __libc_start_main
  [4:12] pgm 0x55555568bb6a _start
[5] Stack Traceback:
  [5:0] pgm 0x555555812acb
  [5:1] libpthread.so.0 0x7ffff7bc88a0
  [5:2] libopen-pal.so.20 0x7ffff62bb093
  [5:3] libopen-pal.so.20 0x7ffff625f9a9 opal_progress
  [5:4] mca_pml_ob1.so 0x7fffe56da903 mca_pml_ob1_iprobe
  [5:5] libmpi.so.20 0x7ffff716e31a PMPI_Iprobe
  [5:6] pgm 0x55555581124b
  [5:7] pgm 0x5555558119e9 LrtsAdvanceCommunication(int)
  [5:8] pgm 0x555555811c9e CommunicationServerThread(int)
  [5:9] pgm 0x55555581205a
  [5:10] pgm 0x555555812664 ConverseInit
  [5:11] pgm 0x5555557a2bfc charm_main
  [5:12] libc.so.6 0x7ffff6794b97 __libc_start_main
  [5:13] pgm 0x55555568bb6a _start
------------- Processor 4 Exiting: Caught Signal ------------
Reason: Terminated
------------- Processor 5 Exiting: Caught Signal ------------
Reason: Terminated
[NemeanLion:22516] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[NemeanLion:22516] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

real    0m1.630s
user    0m3.254s
sys     0m0.046s
```
- As an additional note on performance, `pingpong_multipairs` shows an abrupt latency jump between message sizes <= 8192 and >= 16384:
```
make[3]: Entering directory '/home/evan/Charmworks/charm/mpi-linux-x86_64-smp/benchmarks/converse/pingpong'
../../../bin/testrun ./pingpong_multipairs +p4 ++ppn 2 +setcpuaffinity
Running on 2 processors:  ./pingpong_multipairs +ppn 2 +setcpuaffinity
charmrun> /usr/bin/setarch x86_64 -R  mpirun -np 2  ./pingpong_multipairs +ppn 2 +setcpuaffinity
Charm++> Running on MPI version: 3.1
Charm++> level of thread support used: -1 (desired: 0)
Charm++> Running in SMP mode: 2 processes, 2 worker threads (PEs) + 1 comm threads per process, 4 PEs total
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.11.0-devel-384-g86fa91abd
Converse/Charm++ Commit ID: v6.11.0-devel-384-g86fa91abd
Charm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
Isomalloc> Synchronized global address space.
Charm++> cpu affinity enabled.
Charm++> Running on 1 hosts (1 sockets x 4 cores x 2 PUs = 8-way SMP)
Charm++> cpu topology info is gathered in 0.028 seconds.
Multiple pair send/recv
bytes     latency(us)   bandwidth(MBytes/sec)
8         2.65          3.01
16        6.00          2.67
32        11.59         2.76
64        18.26         3.50
128       23.41         5.47
256       28.73         8.91
512       32.51         15.75
1024      37.44         27.35
2048      42.51         48.18
4096      60.71         67.47
8192      80.76         101.44
16384     2512.18       6.52
32768     6642.26       4.93
65536     6679.92       9.81
131072    6749.01       19.42
262144    6810.27       38.49
524288    7113.52       73.70
1048576   7505.03       139.72
2097152   8142.08       257.57
4194304   9566.39       438.44
[Partition 0][Node 0] End of program

real    2m16.499s
user    11m10.740s
sys     0m15.628s
```
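For reference, the reported bandwidth column is just bytes divided by latency in microseconds (bytes/us is numerically MB/s), so the cliff at 16384 bytes is a real ~31x latency jump rather than a reporting artifact. A quick shell check against a few rows of the table above (the `check_row` helper is mine, not part of the benchmark):

```shell
# bandwidth(MBytes/sec) == bytes / latency(us): verify a few table rows.
check_row() {
  awk -v b="$1" -v l="$2" -v r="$3" 'BEGIN {
    c = b / l
    if (c - r > 0.05 || r - c > 0.05) { print "mismatch at", b, "bytes:", c; exit 1 }
    printf "%8d bytes: %8.2f MB/s (reported %s)\n", b, c, r
  }'
}

check_row 8192 80.76 101.44      # last size before the cliff
check_row 16384 2512.18 6.52     # first size after the cliff
check_row 4194304 9566.39 438.44
awk 'BEGIN { printf "latency jump 8192 -> 16384: %.1fx\n", 2512.18 / 80.76 }'
```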
New failure:
```
../../../bin/testrun ./pgm +p4 +vp$(( 4 * 4 )) +setcpuaffinity ++ppn 2
Running on 2 processors:  ./pgm +vp16 +setcpuaffinity +ppn 2
charmrun> /usr/bin/setarch x86_64 -R  mpirun -np 2  ./pgm +vp16 +setcpuaffinity +ppn 2
Charm++> Running on MPI version: 3.1
Charm++> level of thread support used: -1 (desired: 0)
Charm++> Running in SMP mode: 2 processes, 2 worker threads (PEs) + 1 comm threads per process, 4 PEs total
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: 12a5fa1
Charm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
Isomalloc> Synchronized global address space.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> cpu affinity enabled.
Charm++> Running on 1 hosts (1 sockets x 1 cores x 2 PUs = 2-way SMP)
Charm++> cpu topology info is gathered in 0.098 seconds.
Charm++> Warning: Running with more SMP threads (6) than physical cores (2).
Use +CmiSleepOnIdle (default behavior) or +CmiSpinOnIdle to silence this message.
WARNING: Multiple PEs assigned to same core, recommend adjusting processor affinity or passing +CmiSleepOnIdle to reduce interference.
[0] No TreeLB configuration file found. Choosing a default configuration.
[0] TreeLB: Using PE_Process_Root tree
[0] Testing: Testing...
[2] migrated from 0 to 1
[0] Testing: Testing...
[5] migrated from 1 to 0
[0] Testing: Testing...
[10] migrated from 2 to 1
[14] migrated from 3 to 2
[3] migrated from 0 to 3
[0] Testing: Testing...
Recv'ed bad value for large message send/recv
[2] Stack Traceback:
------------- Processor 2 Exiting: Called CmiAbort ------------
Reason: AMPI: Application called MPI_Abort()!
  [2:0] pgm 0x65c71a CmiAbortHelper(char const*, char const*, char const*, int, int)
  [2:1] pgm 0x65c83b
  [2:2] pgm 0x4f96cb APMPI_Abort
  [2:3] pgm 0x4dee85 MPI_Tester::test()
  [2:4] pgm 0x4df9c6 AMPI_Main_cpp
  [2:5] pgm 0x4f5e0a AMPI_threadstart
  [2:6] pgm 0x4e61d3
  [2:7] pgm 0x65aff5 CthStartThread
  [2:8] pgm 0x65b2cf make_fcontext
application called MPI_Abort(comm=0x84000002, 1) - process 1

real    0m0.313s
user    0m0.011s
sys     0m0.004s
Makefile:26: recipe for target 'testp' failed
make[3]: *** [testp] Error 1
make[3]: Leaving directory '/home/travis/build/UIUC-PPL/charm/mpi-linux-x86_64-smp/tests/ampi/megampi'
```