LD_PRELOAD segfault
I followed the instructions for fixing the LD_PRELOAD problem by building malt with -DLIBUNWIND_PREFIX=/path/to/libunwind. But I continue to get a segfault, and it occurs after the app finishes running. If I add the malt runtime option '-s libunwind', I get the LD_PRELOAD segfault immediately. Is there something else I can try?
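For context, the build and run steps were roughly the following (a sketch only; the install prefix, libunwind path, and application name are placeholders):

    cmake .. -DCMAKE_INSTALL_PREFIX=/path/to/install -DLIBUNWIND_PREFIX=/path/to/libunwind
    make && make install
    /path/to/install/bin/malt ./my_app                # segfaults after the app finishes
    /path/to/install/bin/malt -s libunwind ./my_app   # segfaults immediately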
Some more information. I modified one of the tests, TestSimpleStackTracker, by parallelizing it with MPI_Init/MPI_Finalize. I then rebuilt MALT using mpicc/mpic++ with Cray MPICH 8.1.25 and tested on one of our Cray clusters, which uses the Slurm resource manager. I ran with 'srun -n 1 /path/to/malt --mpi ./src/lib/tests/TestSimpleStackTracker' and got this error:
TestSimpleStackTracker: /users/rta/workspace/malt/src/lib/common/SimpleAllocator.cpp:179: void MALT::SimpleAllocator::free(void *): Assertion `unusedMemory <= totalMemory' failed.
/usr/projects/perfeng/utils/malt/ro/rta/bin/malt: line 458: 134989 Aborted (core dumped) LD_PRELOAD="${MPI_WRAPPER_DIR}/libmaltmpi.so:${MALT_LIB}:${LD_PRELOAD}" "$@"
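The change to the test was roughly of this shape (a sketch only, not the exact source of TestSimpleStackTracker; the allocation loop is a stand-in for the real test body):

    // Illustrative sketch: wrap the existing test body in MPI_Init/MPI_Finalize.
    #include <mpi.h>
    #include <cstdlib>

    int main(int argc, char ** argv)
    {
        MPI_Init(&argc, &argv);

        // original test body goes here; simple malloc/free traffic as a placeholder
        for (int i = 0; i < 1024; i++) {
            void * ptr = malloc(64 + i);
            free(ptr);
        }

        MPI_Finalize();
        return EXIT_SUCCESS;
    }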
Hello, thanks for reporting the issue.
I would ask two things to help with debugging:
- Can you extract the values of unusedMemory and totalMemory, to see whether one of them is 0 or a totally wrong value? (One way to do this is sketched below.)
- For the segfault, can you get a core dump so we can see where it happens, at least with a symbol name or, better, a source line?
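One way to print those counters (an illustrative sketch; only the counter names and the assertion come from the error message, the exact patch location and the printing code are assumptions):

    // Illustrative: print the counters just before the failing assertion in
    // MALT::SimpleAllocator::free() (SimpleAllocator.cpp, around line 179).
    // Assumes <cstdio> is available in this translation unit; the size_t casts
    // are only there in case the members use another integer type.
    if (unusedMemory > totalMemory)
        fprintf(stderr, "MALT SimpleAllocator: unusedMemory=%zu, totalMemory=%zu\n",
                (size_t)unusedMemory, (size_t)totalMemory);
    assert(unusedMemory <= totalMemory);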
For the first question, I see:
unusedMemory = 262328, totalMemory = 262144
So neither is zero; unusedMemory is 184 bytes larger than totalMemory. Let me work on answering the second one.
Here is the backtrace I get from a core dump:
#0 0x000014ceba351cbb in raise () from /lib64/libc.so.6
#1 0x000014ceba353355 in abort () from /lib64/libc.so.6
#2 0x000014ceba349cba in __assert_fail_base () from /lib64/libc.so.6
#3 0x000014ceba349d42 in __assert_fail () from /lib64/libc.so.6
#4 0x000014cebc2bad75 in MALT::SimpleAllocator::free(void*) () from /usr/projects/perfeng/utils/malt/ro/rta/lib64/libmalt.so
#5 0x000014cebc2b4cf0 in std::_Rb_tree<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key, std::pair<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key const, MALT::CallStackInfo>, std::_Select1st<std::pair<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key const, MALT::CallStackInfo> >, std::less<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key>, MALT::STLInternalAllocator<std::pair<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key const, MALT::CallStackInfo> > >::_M_erase(std::_Rb_tree_node<std::pair<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key const, MALT::CallStackInfo> >*) () from /usr/projects/perfeng/utils/malt/ro/rta/lib64/libmalt.so
#6 0x000014cebc2b4ce1 in std::_Rb_tree<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key, std::pair<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key const, MALT::CallStackInfo>, std::_Select1st<std::pair<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key const, MALT::CallStackInfo> >, std::less<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key>, MALT::STLInternalAllocator<std::pair<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key const, MALT::CallStackInfo> > >::_M_erase(std::_Rb_tree_node<std::pair<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key const, MALT::CallStackInfo> >*) () from /usr/projects/perfeng/utils/malt/ro/rta/lib64/libmalt.so
#7 0x000014cebc2b4ce1 in std::_Rb_tree<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key, std::pair<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key const, MALT::CallStackInfo>, std::_Select1st<std::pair<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key const, MALT::CallStackInfo> >, std::less<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key>, MALT::STLInternalAllocator<std::pair<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key const, MALT::CallStackInfo> > >::_M_erase(std::_Rb_tree_node<std::pair<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key const, MALT::CallStackInfo> >*) () from /usr/projects/perfeng/utils/malt/ro/rta/lib64/libmalt.so
#8 0x000014cebc2b4ce1 in std::_Rb_tree<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key, std::pair<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key const, MALT::CallStackInfo>, std::_Select1st<std::pair<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key const, MALT::CallStackInfo> >, std::less<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key>, MALT::STLInternalAllocator<std::pair<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key const, MALT::CallStackInfo> > >::_M_erase(std::_Rb_tree_node<std::pair<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key const, MALT::CallStackInfo> >*) () from /usr/projects/perfeng/utils/malt/ro/rta/lib64/libmalt.so
#9 0x000014cebc2b4ce1 in std::_Rb_tree<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key, std::pair<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key const, MALT::CallStackInfo>, std::_Select1st<std::pair<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key const, MALT::CallStackInfo> >, std::less<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key>, MALT::STLInternalAllocator<std::pair<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key const, MALT::CallStackInfo> > >::_M_erase(std::_Rb_tree_node<std::pair<MALT::StackSTLHashMap<MALT::CallStackInfo>::Key const, MALT::CallStackInfo> >*) () from /usr/projects/perfeng/utils/malt/ro/rta/lib64/libmalt.so
#10 0x000014cebc2c0972 in AllocWrapperGlobal::onExit() () from /usr/projects/perfeng/utils/malt/ro/rta/lib64/libmalt.so
#11 0x000014cebc5c4743 in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#12 0x000014ceba354ae9 in __run_exit_handlers () from /lib64/libc.so.6
#13 0x000014ceba354c7a in exit () from /lib64/libc.so.6
#14 0x000014ceba33c2a4 in __libc_start_main () from /lib64/libc.so.6
#15 0x00000000004175fa in _start () at ../sysdeps/x86_64/start.S:120
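(For reference, a backtrace like the one above can be pulled from the core file with gdb; the core file path below is a placeholder:)

    gdb ./src/lib/tests/TestSimpleStackTracker /path/to/core
    (gdb) bt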
Hi, I'm continuing to get this LD_PRELOAD segfault, but not on all apps that I run with malt. Is there anything else I can try?
I think I've found the problem. One of the apps we're profiling is built with the Intel compilers. When I build malt with icx/icpx from the Intel 2021 compilers, I don't get the segfault. For the other apps, I used gcc 10.
Hi, sorry, I haven't had time to investigate yet.
But as you pointed out, there could be a problem due to mixing C++ libraries (Intel / GNU).
Have you tried also compiling MALT with icpc, so that everything (MALT and the app) is built with Intel?
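A minimal sketch of that kind of matched rebuild (assuming the same CMake-based configure shown earlier; compiler names and paths are placeholders, and the ldd line is just one way to check which C++ runtime each library links against):

    CC=icx CXX=icpx cmake .. -DCMAKE_INSTALL_PREFIX=/path/to/install -DLIBUNWIND_PREFIX=/path/to/libunwind
    make && make install
    ldd /path/to/install/lib64/libmalt.so | grep -i 'libstdc++\|libc++'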
Yes, I compiled a version using Intel icpx/icx and that worked. Thank you!
Hmm, thanks very much for the report, that's good to know.
Up to now I had the impression that there was no issue in that case, but apparently there is.