CI: Link the runtime as shared objects

Open evan-charmworks opened this issue 4 years ago • 2 comments

evan-charmworks, Feb 26 '20 20:02

CI failures:

GitHub Linux

make[2]: Entering directory '/home/runner/work/charm/charm/uth-linux-x86_64/tests/ampi'
make -C megampi test OPTS='-optimize -production  -g -charm-shared -optimize -production  -g -charm-shared' TESTOPTS=''
make[3]: Entering directory '/home/runner/work/charm/charm/uth-linux-x86_64/tests/ampi/megampi'
../../../bin/testrun  ./pgm +p1 +vp1  
./pgm: error while loading shared libraries: libromio.so.0: cannot open shared object file: No such file or directory

real	0m0.005s
user	0m0.004s
sys	0m0.001s
make[3]: *** [test] Error 127
Makefile:18: recipe for target 'test' failed
make[2]: *** [test-megampi] Error 2
make[3]: Leaving directory '/home/runner/work/charm/charm/uth-linux-x86_64/tests/ampi/megampi'

CircleCI Linux NetLRTS

make -C ampi test OPTS='-optimize -production  -g -Werror=vla -charm-shared -optimize -production  -g -Werror=vla -charm-shared' TESTOPTS='++local +setcpuaffinity +CmiSleepOnIdle'
make[2]: Entering directory '/home/circleci/project/netlrts-linux-x86_64-smp/tests/ampi'
make -C megampi test OPTS='-optimize -production  -g -Werror=vla -charm-shared -optimize -production  -g -Werror=vla -charm-shared' TESTOPTS='++local +setcpuaffinity +CmiSleepOnIdle'
make[3]: Entering directory '/home/circleci/project/netlrts-linux-x86_64-smp/tests/ampi/megampi'
../../../bin/testrun  ./pgm +p1 +vp1  ++local +setcpuaffinity +CmiSleepOnIdle
Charmrun> scalable start enabled. 
Charmrun> Timeout waiting for node-program to connect

real	1m0.036s
user	0m0.003s
sys	0m0.000s
Makefile:18: recipe for target 'test' failed
make[3]: *** [test] Error 1
make[3]: Leaving directory '/home/circleci/project/netlrts-linux-x86_64-smp/tests/ampi/megampi'

Travis Linux MPI-SMP

make -C ampi test OPTS='-optimize -production  -g -Werror=vla -charm-shared -optimize -production  -g -Werror=vla -charm-shared' TESTOPTS='+setcpuaffinity'
make[2]: Entering directory '/home/travis/build/UIUC-PPL/charm/mpi-linux-x86_64-smp/tests/ampi'
make -C megampi test OPTS='-optimize -production  -g -Werror=vla -charm-shared -optimize -production  -g -Werror=vla -charm-shared' TESTOPTS='+setcpuaffinity'
make[3]: Entering directory '/home/travis/build/UIUC-PPL/charm/mpi-linux-x86_64-smp/tests/ampi/megampi'
../../../bin/testrun  ./pgm +p1 +vp1  +setcpuaffinity
Running on 1 processors:  ./pgm +vp1 +setcpuaffinity 
charmrun>  /usr/bin/setarch x86_64 -R  mpirun -np 1  ./pgm +vp1 +setcpuaffinity 
./pgm: error while loading shared libraries: libromio.so.0: cannot open shared object file: No such file or directory
real	0m0.016s
user	0m0.012s
sys	0m0.005s
Makefile:18: recipe for target 'test' failed
make[3]: *** [test] Error 127
make[3]: Leaving directory '/home/travis/build/UIUC-PPL/charm/mpi-linux-x86_64-smp/tests/ampi/megampi'

Travis macOS NetLRTS

  CCLD     libromio.la
clang: warning: -Wl,-install_name,/usr/local/lib/libromio.0.dylib: 'linker' input unused [-Wunused-command-line-argument]
clang: warning: -Wl,-compatibility_version,1: 'linker' input unused [-Wunused-command-line-argument]
clang: warning: -Wl,-current_version,1.0: 'linker' input unused [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-dynamiclib' [-Wunused-command-line-argument]
duplicate symbol _MPI_Status_set_elements_x in:
    mpi-io/glue/.libs/large_count.o
    /Users/travis/build/UIUC-PPL/charm/netlrts-darwin-x86_64/lib/libmoduleampi.a(ampi.o)
duplicate symbol _MPI_Type_size_x in:
    mpi-io/glue/.libs/large_count.o
    /Users/travis/build/UIUC-PPL/charm/netlrts-darwin-x86_64/lib/libmoduleampi.a(ampi.o)
duplicate symbol _MPI_File_iwrite_at_all in:
    mpi-io/.libs/iwrite_atall.o
    /Users/travis/build/UIUC-PPL/charm/netlrts-darwin-x86_64/lib/libmoduleampi.a(ampi_noimpl.o)
duplicate symbol _MPI_File_iread_at_all in:
    mpi-io/.libs/iread_atall.o
    /Users/travis/build/UIUC-PPL/charm/netlrts-darwin-x86_64/lib/libmoduleampi.a(ampi_noimpl.o)
duplicate symbol _MPI_File_iwrite_all in:
    mpi-io/.libs/iwrite_all.o
    /Users/travis/build/UIUC-PPL/charm/netlrts-darwin-x86_64/lib/libmoduleampi.a(ampi_noimpl.o)
duplicate symbol _MPI_File_iread_all in:
    mpi-io/.libs/iread_all.o
    /Users/travis/build/UIUC-PPL/charm/netlrts-darwin-x86_64/lib/libmoduleampi.a(ampi_noimpl.o)
ld: 6 duplicate symbols for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

This PR does not yet try shared builds with CMake, but they are broken there as well because of hwloc.
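
The GitHub Linux and Travis Linux MPI-SMP logs above fail with the same loader error for libromio.so.0. A minimal sketch, assuming the CI logs are available as plain text, of how such missing-library names could be collected from a log. The helper name `missing_shared_libs` is hypothetical and not part of Charm++ or its test scripts.

```python
import re

# Matches the dynamic loader message seen in the logs above, e.g.
# "./pgm: error while loading shared libraries: libromio.so.0: cannot open shared object file: ..."
_LOADER_ERROR = re.compile(
    r"error while loading shared libraries: (?P<lib>\S+): cannot open shared object file"
)


def missing_shared_libs(log_text):
    """Return the sorted, de-duplicated library names the loader failed to find.

    Hypothetical helper for illustration only.
    """
    return sorted({m.group("lib") for m in _LOADER_ERROR.finditer(log_text)})


# Sample taken from the error line quoted in the logs above.
sample = (
    "./pgm: error while loading shared libraries: libromio.so.0: "
    "cannot open shared object file: No such file or directory\n"
    "make[3]: *** [test] Error 127\n"
)
print(missing_shared_libs(sample))
```

This only recognizes the Linux loader message; the macOS duplicate-symbol failure and the Charmrun timeout produce different output and would need their own patterns.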

evan-charmworks, Feb 27 '20 21:02

2020-11-30T22:22:32.1773060Z /Applications/Xcode_12.app/Contents/Developer/usr/bin/make -C megampi test OPTS='-g -charm-shared -build-shared -optimize -production' TESTOPTS=''
2020-11-30T22:22:32.1820560Z ../../../bin/testrun  ./megampi +p1 +vp1 +balancer RandCentLB 
2020-11-30T22:22:32.2276460Z Running command:  ./megampi  +p1 +vp1 +balancer RandCentLB
2020-11-30T22:22:32.2276950Z 
2020-11-30T22:22:32.6944450Z ------------- Processor 0 Exiting: Called CmiAbort ------------
2020-11-30T22:22:32.6945680Z Reason: TCHARM: Unexpected fallback setup--missing TCHARM_User_setup routine?
2020-11-30T22:22:32.6946370Z Charm++ fatal error:
2020-11-30T22:22:32.6947300Z TCHARM: Unexpected fallback setup--missing TCHARM_User_setup routine?
2020-11-30T22:22:32.6948220Z Charm++: standalone mode (not using charmrun)
2020-11-30T22:22:32.6948910Z Charm++> Running in Multicore mode: 1 threads (PEs)
2020-11-30T22:22:32.6949450Z Converse/Charm++ Commit ID: 03a7dd2
2020-11-30T22:22:32.6949940Z Charm++: Tracemode Projections enabled.
2020-11-30T22:22:32.6950420Z Trace: traceroot: ./megampi
2020-11-30T22:22:32.6950900Z CharmLB> Load balancer assumes all CPUs are same.
2020-11-30T22:22:32.6952700Z Charm++> Running on 1 hosts (2 sockets x 2 cores x 1 PUs = 4-way SMP)
2020-11-30T22:22:32.6953680Z Charm++> cpu topology info is gathered in 0.000 seconds.
2020-11-30T22:22:32.6954220Z [0] TreeLB in LEGACY MODE support
2020-11-30T22:22:32.6954720Z [0] TreeLB: Using PE_Root tree with: Random 
2020-11-30T22:22:32.6955140Z [0] Stack Traceback:
2020-11-30T22:22:32.6955630Z ../../../bin/testrun: line 62: 76041 Abort trap: 6           ./charmrun "$@"
2020-11-30T22:22:32.6955960Z 
2020-11-30T22:22:32.6956220Z real	0m0.504s
2020-11-30T22:22:32.6956500Z user	0m0.038s
2020-11-30T22:22:32.6956780Z sys	0m0.047s
2020-11-30T22:22:32.6957090Z make[3]: *** [test] Error 134
2020-11-30T22:22:32.6957550Z   [0:0] libconverse.dylib 0x101dad3a1 CmiAbort
2020-11-30T22:22:32.6958790Z   [0:1] libtcharm-compat.dylib 0x101f40521 TCHARM_Call_fallback_setup
2020-11-30T22:22:32.6960160Z   [0:2] libmoduletcharmmain.dylib 0x1017ce6fa CkIndex_TCharmMain::_call_TCharmMain_CkArgMsg(void*, void*)
2020-11-30T22:22:32.6960970Z   [0:3] libck.dylib 0x101ab3e28 _initCharm(int, char**)
2020-11-30T22:22:32.6961580Z   [0:4] libconverse.dylib 0x101db0c9e ConverseRunPE(int)
2020-11-30T22:22:32.6962210Z   [0:5] libconverse.dylib 0x101daf0c5 ConverseInit
2020-11-30T22:22:32.6962750Z   [0:6] libck.dylib 0x101ab4d8e charm_main
2020-11-30T22:22:32.6963170Z   [0:7] megampi 0x101607cd4 start
2020-11-30T22:22:32.6963540Z [0] Stack Traceback:
2020-11-30T22:22:32.6964050Z   [0:0] libconverse.dylib 0x101db1c05 charmrun_abort(char const*)
2020-11-30T22:22:32.6964690Z   [0:1] libconverse.dylib 0x101db14bd LrtsAbort(char const*)
2020-11-30T22:22:32.6965450Z   [0:2] libconverse.dylib 0x101db146a CmiAbortHelper(char const*, char const*, char const*, int, int)
2020-11-30T22:22:32.6966170Z   [0:3] libconverse.dylib 0x101dad3a1 CmiAbort
2020-11-30T22:22:32.6967340Z   [0:4] libtcharm-compat.dylib 0x101f40521 TCHARM_Call_fallback_setup
2020-11-30T22:22:32.6968350Z   [0:5] libmoduletcharmmain.dylib 0x1017ce6fa CkIndex_TCharmMain::_call_TCharmMain_CkArgMsg(void*, void*)
2020-11-30T22:22:32.6969160Z   [0:6] libck.dylib 0x101ab3e28 _initCharm(int, char**)
2020-11-30T22:22:32.6969760Z   [0:7] libconverse.dylib 0x101db0c9e ConverseRunPE(int)
2020-11-30T22:22:32.6970630Z   [0:8] libconverse.dylib 0x101daf0c5 ConverseInit
2020-11-30T22:22:32.6971210Z   [0:9] libck.dylib 0x101ab4d8e charm_main
2020-11-30T22:22:32.6971630Z   [0:10] megampi 0x101607cd4 start
2020-11-30T22:22:32.6972490Z make[2]: *** [test-megampi] Error 2
2020-11-30T22:22:32.6973230Z make[1]: *** [test-ampi] Error 2
2020-11-30T22:22:32.6973630Z make: *** [test] Error 2
2020-11-30T22:22:32.6982910Z ##[error]Process completed with exit code 2.
make -C megampi test OPTS='-g -Werror=vla -charm-shared -build-shared -optimize -production' TESTOPTS='+setcpuaffinity'
make[3]: Entering directory '/home/travis/build/UIUC-PPL/charm/mpi-linux-x86_64-smp/tests/ampi/megampi'
../../../bin/testrun  ./megampi +p1 +vp1 +balancer RandCentLB +setcpuaffinity

Running on 1 processors:  ./megampi +vp1 +balancer RandCentLB +setcpuaffinity 
charmrun>  /usr/bin/setarch x86_64 -R  mpirun -np 1  ./megampi +vp1 +balancer RandCentLB +setcpuaffinity 
Charm++> Running on MPI version: 3.1
Charm++> level of thread support used: -1 (desired: 0)
Charm++> Running in SMP mode: 1 processes, 1 worker threads (PEs) + 1 comm threads per process, 1 PEs total
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.11.0-beta1-41-ge2a2c44
Charm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
CharmLB> Load balancer assumes all CPUs are same.
Charm++> cpu affinity enabled. 
Charm++> Running on 1 hosts (1 sockets x 1 cores x 2 PUs = 2-way SMP)
Charm++> cpu topology info is gathered in 0.000 seconds.
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: [0] Assertion "cIdx == _entryTable[eIdx]->chareIdx" failed in file /home/travis/build/UIUC-PPL/charm/src/ck-core/ck.C line 837.

[0] Stack Traceback:
  [0:0] libconverse.so 0x7ffff607946a CmiAbortHelper(char const*, char const*, char const*, int, int)
  [0:1] libconverse.so 0x7ffff607958b 
  [0:2] libconverse.so 0x7ffff6024fac 
  [0:3] libck.so 0x7ffff6a97938 CkCreateGroup
  [0:4] libmoduleCommonLBs.so 0x7ffff71f1b39 CProxy_TreeLB::ckNew(CkLBOptions const&, CkEntryOptions const*)
  [0:5] libmoduleCommonLBs.so 0x7ffff71f1c1b CreateTreeLB(CkLBOptions const&)
  [0:6] libck.so 0x7ffff6a27a70 
  [0:7] libck.so 0x7ffff6a2a50a LBMgrInit::LBMgrInit(CkArgMsg*)
  [0:8] libck.so 0x7ffff6b2dc35 _initCharm(int, char**)
  [0:9] libconverse.so 0x7ffff607c176 
  [0:10] libconverse.so 0x7ffff607cc1c 
  [0:11] libpthread.so.0 0x7ffff5bc86ba 
  [0:12] libc.so.6 0x7ffff4f6041d clone
application called MPI_Abort(comm=0x84000000, 1) - process 0

real	0m0.034s
user	0m0.007s
sys	0m0.007s
Makefile:18: recipe for target 'test' failed
make[3]: *** [test] Error 1
make[3]: Leaving directory '/home/travis/build/UIUC-PPL/charm/mpi-linux-x86_64-smp/tests/ampi/megampi'
Applications/Xcode-9.4.1.app/Contents/Developer/usr/bin/make -C lb_test testp OPTS='-g -Werror=vla -charm-shared -build-shared -optimize -production' TESTOPTS='++local'
../../../../bin/testrun  +p2 ./lb_test $(( 25 * 2)) 100 10 40 10 1000 ring +balancer GreedyLB +LBDebug 1  ++local
Charmrun> scalable start enabled. 
Charmrun> started all node programs in 0.018 seconds.
Charm++> Running in non-SMP mode: 2 processes (PEs)
Converse/Charm++ Commit ID: v6.11.0-beta1-41-ge2a2c44
Charm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
Isomalloc> Synchronized global address space.
Charm++> scheduler running in netpoll mode.
CharmLB> Verbose level 1, load balancing period: -1 seconds
CharmLB> Load balancer assumes all CPUs are same.
traceprojections was off at initial time.
Charm++> Running on 1 hosts (2 sockets x 1 cores x 1 PUs = 2-way SMP)
Charm++> cpu topology info is gathered in 0.000 seconds.
[0] TreeLB in LEGACY MODE support
[0] TreeLB: Using PE_Root tree with: Greedy 
	Using 0 as root
	Test PE Speed: false
Running lb_test on 2 processors with 50 elements
Print every 10 steps
Sync every 40 steps
First node busywaits 10 usec; last node busywaits 1000 usec

Selecting Topology Ring
Generating topology 0 for 50 elements
[0] Total work/step = 0.014017 sec
calibrated iterations 20984068
TIME PER STEP	10	0.420948	0.094870
TIME PER STEP	20	0.492598	0.071651
TIME PER STEP	30	0.580561	0.087963
--------- Started LB step 0 ---------
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: [0] Assertion "obj_cnt == objs.size()" failed in file /Users/travis/build/UIUC-PPL/charm/src/ck-ldb/TreeLevel.h line 124.

[0] Stack Traceback:
  [0:0] libconverse.dylib 0x1005cbd41 CmiAbort
  [0:1] libconverse.dylib 0x10057a151 __cmi_assert
  [0:2] libmoduleCommonLBs.dylib 0x1000f52fc float LBStatsMsg_1::fill<TreeStrategy::Obj<1, false>, TreeStrategy::Proc<1, false, false> >(std::__1::vector<TreeLBMessage*, std::__1::allocator<TreeLBMessage*> >, std::__1::vector<TreeStrategy::Obj<1, false>, std::__1::allocator<TreeStrategy::Obj<1, false> > >&, std::__1::vector<TreeStrategy::Proc<1, false, false>, std::__1::allocator<TreeStrategy::Proc<1, false, false> > >&, LLBMigrateMsg*, std::__1::vector<int, std::__1::allocator<int> >&)
  [0:3] libmoduleCommonLBs.dylib 0x1000eddc7 StrategyWrapper<TreeStrategy::Obj<1, false>, TreeStrategy::Proc<1, false, false> >::prepStrategy(unsigned int, unsigned int, std::__1::vector<TreeLBMessage*, std::__1::allocator<TreeLBMessage*> >&, LLBMigrateMsg*)
  [0:4] libmoduleCommonLBs.dylib 0x1000e1d31 RootLevel::loadBalance(std::__1::unordered_map<int, std::__1::vector<std::__1::pair<int, int>, std::__1::allocator<std::__1::pair<int, int> > >, std::__1::hash<int>, std::__1::equal_to<int>, std::__1::allocator<std::__1::pair<int const, std::__1::vector<std::__1::pair<int, int>, std::__1::allocator<std::__1::pair<int, int> > > > > >&)
  [0:5] libmoduleCommonLBs.dylib 0x1000d063c TreeLB::loadBalanceSubtree(int)
  [0:6] libck.dylib 0x1002414aa CkDeliverMessageFree
  [0:7] libck.dylib 0x100243c77 _processHandler(void*, CkCoreState*)
  [0:8] libconverse.dylib 0x100578356 CsdScheduleForever
  [0:9] libconverse.dylib 0x100578165 CsdScheduler
  [0:10] libconverse.dylib 0x1005cd8c9 ConverseInit
  [0:11] libck.dylib 0x1002edc4e charm_main
  [0:12] lb_test 0x100001754 start
Fatal error on PE 0> [0] Assertion "obj_cnt == objs.size()" failed in file /Users/travis/build/UIUC-PPL/charm/src/ck-ldb/TreeLevel.h line 124.


real	0m0.706s
user	0m0.003s
sys	0m0.004s
make[4]: *** [testp] Error 1
make[3]: *** [testp-lb_test] Error 2
make[2]: *** [testp-load_balancing] Error 2
make[1]: *** [testp-charm++] Error 2
make: *** [testp] Error 2
The command "make -C netlrts-darwin-x86_64/tmp testp P=2 TESTOPTS="++local"" exited with 2.

evan-charmworks, Dec 01 '20 20:12