MSA Examples Failing
Two of the MSA (multiphase shared arrays) examples are broken.
examples/multiphaseSharedArrays/matmul does not compile. Once superficial fixes get it compiling, it crashes with a segmentation fault:
Running as 1 OS processes: t2d 2 1048576 100 500 100 1
charmrun> /usr/bin/setarch x86_64 -R mpirun -np 1 t2d 2 1048576 100 500 100 1
Charm++> Running in non-SMP mode: 1 processes (PEs)
Converse/Charm++ Commit ID: v7.1.0-devel-132-g2d58c2fb7
Charm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 hosts (1 sockets x 4 cores x 2 PUs = 8-way SMP)
Charm++> cpu topology info is gathered in 0.102 seconds.
[cordelia:160910:0:160910] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x2e0)
1 100 500 500 100 2 1048576 U 0.047026 5000 1 cordelia.local
==== backtrace (tid: 160910) ====
0 /home/szaday2/workspace/ucx/build/lib/libucs.so.0(ucs_handle_error+0x2e4) [0x7ffff7dae534]
1 /home/szaday2/workspace/ucx/build/lib/libucs.so.0(+0x2d76f) [0x7ffff7dae76f]
2 /home/szaday2/workspace/ucx/build/lib/libucs.so.0(+0x2da56) [0x7ffff7daea56]
3 /lib/x86_64-linux-gnu/libc.so.6(+0x46520) [0x7ffff784e520]
4 t2d(_ZN14MSA_CacheGroupId12DefaultEntryIdLb0EELj5000EE10accessPageEj16MSA_Page_Fault_t+0x1a) [0x4b638a]
5 t2d(_ZN17CkIndex_TestArray22_callthr_Kontinue_voidEP12CkThrCallArg+0x3f8) [0x4ac5f8]
6 t2d(CthStartThread+0x12) [0x5e68e2]
7 t2d(make_fcontext+0x2f) [0x5e6d5f]
=================================
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node cordelia exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
real 0m2.385s
user 0m0.077s
sys 0m0.043s
make: *** [Makefile:52: test] Error 139
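At the time of the failure, the state of the cache group (MSA_CacheGroup::pageTable in particular) seems to be invalid. The segfault is raised in MSA_CacheGroup::accessPage (frame 4 of the backtrace) on an access to address 0x2e0.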
Likewise, examples/multiphaseSharedArrays/moldyn does not compile. Once superficial fixes get it compiling, it hangs at runtime.
How do you build the msa library along with LIBS? Do you pass the targets to the build script as a single quoted string, ./build "target1 target2 ...", as in ./build "LIBS msa"?
I am not sure how to compile MSA with the build script.
I typically run make in src/libs/ck-libs/multiphaseSharedArrays/, which makes -module msa available.