Crash in __libdw_findcu when using OpenMP to parse symtab.
When processing the symtab for /lib64/libxmlrpc_util.so.4.51 (from xmlrpc-c-1.51.0-11.fc33.x86_64) while allowing OpenMP to use all the cores, I get the following crash:
0x00007ffff776f0b7 in __libdw_findcu (dbg=0x7fffff7fefb0, start=906, v4_debug_types=false) at libdw_findcu.c:236
236 struct Dwarf_CU fake = { .start = start, .end = 0 };
#0 0x00007ffff776f0b7 in __libdw_findcu (dbg=0x7fffff7fefb0, start=906, v4_debug_types=false) at libdw_findcu.c:236
#1 0x00007ffff775f9de in __libdw_offdie (dbg=<optimized out>, offset=<optimized out>, result=0x7fffff7ff1a0, debug_types=<optimized out>) at dwarf_offdie.c:61
#2 0x00007ffff7f4bea5 in Dyninst::SymtabAPI::DwarfParseActions::entry (this=0x7fffffffc760) at /usr/src/debug/dyninst-10.2.1-1.fc33.x86_64/dyninst-10.2.1/symtabAPI/src/dwarfWalker.h:45
#3 Dyninst::SymtabAPI::DwarfWalker::parseSubprogram (this=0x7fffffffc760, func_type=Dyninst::SymtabAPI::DwarfWalker::InlinedFunc) at /usr/src/debug/dyninst-10.2.1-1.fc33.x86_64/dyninst-10.2.1/symtabAPI/src/dwarfWalker.C:590
#4 0x00007ffff7f46f14 in Dyninst::SymtabAPI::DwarfWalker::parse_int (this=0x7fffffffc760, e=..., p=false) at /usr/src/debug/dyninst-10.2.1-1.fc33.x86_64/dyninst-10.2.1/symtabAPI/src/dwarfWalker.C:365
#5 0x00007ffff7f4c82a in Dyninst::SymtabAPI::DwarfWalker::parseSubprogram (this=0x7fffffffc760, func_type=<optimized out>) at /usr/src/debug/dyninst-10.2.1-1.fc33.x86_64/dyninst-10.2.1/symtabAPI/src/dwarfWalker.h:130
#6 0x00007ffff7f46f14 in Dyninst::SymtabAPI::DwarfWalker::parse_int (this=0x7fffffffc760, e=..., p=false) at /usr/src/debug/dyninst-10.2.1-1.fc33.x86_64/dyninst-10.2.1/symtabAPI/src/dwarfWalker.C:365
#7 0x00007ffff7f4c82a in Dyninst::SymtabAPI::DwarfWalker::parseSubprogram (this=0x7fffffffc760, func_type=<optimized out>) at /usr/src/debug/dyninst-10.2.1-1.fc33.x86_64/dyninst-10.2.1/symtabAPI/src/dwarfWalker.h:130
#8 0x00007ffff7f46f14 in Dyninst::SymtabAPI::DwarfWalker::parse_int (this=0x7fffffffc760, e=..., p=false) at /usr/src/debug/dyninst-10.2.1-1.fc33.x86_64/dyninst-10.2.1/symtabAPI/src/dwarfWalker.C:365
#9 0x00007ffff7f4c82a in Dyninst::SymtabAPI::DwarfWalker::parseSubprogram (this=0x7fffffffc760, func_type=<optimized out>) at /usr/src/debug/dyninst-10.2.1-1.fc33.x86_64/dyninst-10.2.1/symtabAPI/src/dwarfWalker.h:130
#10 0x00007ffff7f46f14 in Dyninst::SymtabAPI::DwarfWalker::parse_int (this=0x7fffffffc760, e=..., p=false) at /usr/src/debug/dyninst-10.2.1-1.fc33.x86_64/dyninst-10.2.1/symtabAPI/src/dwarfWalker.C:365
#11 0x00007ffff7f4c82a in Dyninst::SymtabAPI::DwarfWalker::parseSubprogram (this=0x7fffffffc760, func_type=<optimized out>) at /usr/src/debug/dyninst-10.2.1-1.fc33.x86_64/dyninst-10.2.1/symtabAPI/src/dwarfWalker.h:130
#12 0x00007ffff7f46f14 in Dyninst::SymtabAPI::DwarfWalker::parse_int (this=0x7fffffffc760, e=..., p=false) at /usr/src/debug/dyninst-10.2.1-1.fc33.x86_64/dyninst-10.2.1/symtabAPI/src/dwarfWalker.C:365
#13 0x00007ffff7f4c82a in Dyninst::SymtabAPI::DwarfWalker::parseSubprogram (this=0x7fffffffc760, func_type=<optimized out>) at /usr/src/debug/dyninst-10.2.1-1.fc33.x86_64/dyninst-10.2.1/symtabAPI/src/dwarfWalker.h:130
#14 0x00007ffff7f46f14 in Dyninst::SymtabAPI::DwarfWalker::parse_int (this=0x7fffffffc760, e=..., p=false) at /usr/src/debug/dyninst-10.2.1-1.fc33.x86_64/dyninst-10.2.1/symtabAPI/src/dwarfWalker.C:365
#15 0x00007ffff7f4c82a in Dyninst::SymtabAPI::DwarfWalker::parseSubprogram (this=0x7fffffffc760, func_type=<optimized out>) at /usr/src/debug/dyninst-10.2.1-1.fc33.x86_64/dyninst-10.2.1/symtabAPI/src/dwarfWalker.h:130
This crash only seems to happen when OMP_NUM_THREADS is not set to 1. I'm running on a 6-core, 12-thread machine. The crash is very common: it accounts for 150 of the 471 crashes seen on my machine.
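For reference, a minimal harness that exercises the same code path might look roughly like this (a hypothetical sketch, not the actual test driver; it just opens the library and forces type parsing, which is where the DWARF walker runs):

```cpp
// Hypothetical reproduction sketch -- not the actual test driver.
// Build against Dyninst SymtabAPI and run with OMP_NUM_THREADS unset
// so OpenMP can use all cores.
#include <Symtab.h>
#include <Function.h>
#include <iostream>
#include <vector>

using namespace Dyninst::SymtabAPI;

int main() {
    Symtab *obj = nullptr;
    if (!Symtab::openFile(obj, "/lib64/libxmlrpc_util.so.4.51")) {
        std::cerr << "openFile failed\n";
        return 1;
    }

    // Force full DWARF type parsing; this is where the parallel
    // dwarfWalker code runs and where the crash shows up.
    obj->parseTypesNow();

    std::vector<Function *> funcs;
    obj->getAllFunctions(funcs);
    std::cout << funcs.size() << " functions\n";
    return 0;
}
```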
@woodard Are you using OMP_NUM_THREADS=12, or leaving OMP_NUM_THREADS unset? There is a known issue where not setting OMP_NUM_THREADS causes substantial oversubscription in Dyninst due to nested parallelism, which can lead to crashes.
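To illustrate the concern: with nesting enabled and OMP_NUM_THREADS unset, each nested parallel region defaults to one thread per logical core, so the thread count multiplies. A standalone sketch (not Dyninst code):

```cpp
// Standalone illustration of nested-parallelism oversubscription,
// not Dyninst code. With OMP_NUM_THREADS unset, each level defaults
// to the number of logical cores, so a 12-thread machine can end up
// running on the order of 12 * 12 = 144 threads here.
#include <omp.h>
#include <cstdio>

int main() {
    omp_set_max_active_levels(2);   // allow one level of nesting

    #pragma omp parallel            // outer team: ~12 threads
    {
        #pragma omp parallel        // inner team per outer thread: ~12 each
        {
            #pragma omp single
            std::printf("inner team of %d inside outer thread %d\n",
                        omp_get_num_threads(),
                        omp_get_ancestor_thread_num(1));
        }
    }
    return 0;
}
```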
@hainest OMP_NUM_THREADS was unset. Your quick description doesn't quite make sense to me: when I was running these tests, I didn't see anything like oversubscription; the GOMP runtime queued things up reasonably well. Can you point me at that known issue? I'll read the analysis there and see if it makes more sense to me.
What I think I've been seeing with these bugs is more along the lines of two threads processing the same file: one thread is building a tree of types or some other symtab element and, for some reason, it pauses, perhaps for a disk read. Another part of the symtab parsing then finds its way into that part of the tree and accesses an element that hasn't been fully filled in yet because of the pause. So I thought the resolution to a whole bunch of these problems was likely going to be a condition variable that other threads wait on when they reach a symtab element that hasn't been fully processed yet. If that suspicion is correct, the issue isn't oversubscription of GOMP threads but synchronization between the threads with respect to the work they need to do. In that case, you may not see it when the number of threads is kept artificially low, because the OpenMP runtime may not make forward progress on threads that need data from other threads already in the queue. Anyway, that is just my intuition after looking at dozens of these crash backtraces; I certainly don't know the codebase as well as you guys.
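To make the idea concrete, the pattern I have in mind is roughly the following (just a sketch; the names are hypothetical, not actual Dyninst types):

```cpp
// Sketch of the "wait until the element is fully built" idea.
// SymtabElement and its members are hypothetical names, not Dyninst types.
#include <condition_variable>
#include <mutex>

struct SymtabElement {
    std::mutex m;
    std::condition_variable cv;
    bool complete = false;          // set once the element is fully parsed

    // Called by the thread that is building this element.
    void markComplete() {
        {
            std::lock_guard<std::mutex> lock(m);
            complete = true;
        }
        cv.notify_all();
    }

    // Called by any other thread that reaches this element while walking
    // the symtab tree; blocks until the builder has finished filling it in.
    void waitUntilComplete() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return complete; });
    }
};
```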
I can rerun the tests with OMP_NUM_THREADS=12, but I won't be able to get to that until next week. The tests themselves are very quick, but going through the results and trying to distill them into the minimum number of bug reports is time consuming. As it is, after looking at the many crashes I have, I started getting the impression that I was seeing different manifestations of the same problem.
Yeah, that's different from what we were seeing previously (cf. #693). I'll leave this for @mxz297 to investigate. He's doing some presentations at SC, so it'll be a couple of weeks.