dyninst is unable to parse libdw.so

Open stanfordcox opened this issue 3 years ago • 14 comments

The systemtap dyninst backend creates a shared object containing probe functions. The shared object currently cannot be loaded, and I traced the problem to dyninst being unable to parse libdw.so, which the stap shared object references. Using the CodeDump tool from https://github.com/mxz297/dyninst-tools I see:

    ../CodeDump/CodeDump /usr/lib64/libdw-0.185.so
    Segmentation fault (core dumped)

If I switch to the Fedora version of libdw, it works: the systemtap dyninst backend then runs okay. One possibility is that the RHEL version is built with link-time optimization. The libdw in question is located at: https://scox.fedorapeople.org/dyninst/libdw-0.185.x86_64.so

On ppc64le, CodeDump works okay with libdw-0.185.ppc64le.so, but the library load fails:

    ./mutator-ll -dynamic ./mutatee /usr/lib64/libdw-0.185.so
    mutator-ll: /builddir/build/BUILD/dyninst-11.0.0/dyninst-11.0.0/dataflowAPI/src/RoseInsnFactory.C:324: virtual bool Dyninst::DataflowAPI::RoseInsnPPCFactory::handleSpecialCases(entryID, SgAsmInstruction*, SgAsmOperandList*): Assertion `power_op_bclr == iapi_opcode' failed.
    Aborted (core dumped)

(mutator-ll.cpp is also on the fedorapeople site)

Also note that if I use a system command, e.g. echo (which is also on the fedorapeople site), then bpatch.processCreate fails:

    DYNINSTAPI_RT_LIB=/usr/lib64/dyninst/libdyninstAPI_RT.so LD_LIBRARY_PATH=/usr/lib64/dyninst ./mutator-ll -dynamic /usr/bin/echo /usr/lib64/libdw-0.185.so
    --FATAL-- #68: Dyninst was unable to create the specified process
    --FATAL-- #68: create process failed bootstrap
    Failed to create /usr/bin/echo

stanfordcox, Jun 23 '21 14:06

Above is with dyninst-11.0.0. Files on https://scox.fedorapeople.org/dyninst/:

    echo.ppc64le (fails processCreate)
    libdw-0.185.f33.x86_64.so (fails CodeDump)
    libdw-0.185.ppc64le.so (passes CodeDump, fails loadLibrary)
    libdw-0.185.x86_64.so (passes CodeDump, passes loadLibrary)
    mutator-ll.cpp (does a loadLibrary of a given so)

stanfordcox, Jun 23 '21 14:06

@stanfordcox I downloaded libdw-0.185.f33.x86_64.so and ran it through CodeDump compiled against the latest Dyninst HEAD. I do not see a segfault. I saw that you are using Dyninst 11.0.0. I have made several fixes to code parsing since 11.0.0, and these fixes are available in 11.0.1, which was released recently. Can you retry with 11.0.1 and report which problems remain?

mxz297, Jun 23 '21 15:06

@mxz297 Right you are, upstream does indeed work. I should have checked that first. For a RHEL release I am limited to patching only, not switching out the version. Do you think it will be a problem for me to extract a patch for that? I'll build an upstream ppc64le and try it there.

stanfordcox, Jun 23 '21 16:06

@stanfordcox Patching will work. There is only one commit past v11.0.1, but it just adds a missing instruction (xsave).

dyninst-11.0.1.patch.log

hainest, Jun 23 '21 16:06

@hainest Thanks for the patch. That fixes the x86_64 problem. I still see the ppc64le problem:

    % LD_LIBRARY_PATH=/home/scox/dyninst/install/lib DYNINSTAPI_RT_LIB=/home/scox/dyninst/install/lib/libdyninstAPI_RT.so ./mutator-ll -dynamic /usr/bin/echo /usr/lib64/libdw-0.185.so
    --FATAL-- #68: Dyninst was unable to create the specified process
    --FATAL-- #68: create process failed bootstrap
    Failed to create /usr/bin/echo
    Segmentation fault (core dumped)
    % LD_LIBRARY_PATH=/home/scox/dyninst/install/lib DYNINSTAPI_RT_LIB=/home/scox/dyninst/install/lib/libdyninstAPI_RT.so ./mutator-ll -dynamic ./mutatee /usr/lib64/libdw-0.185.so
    mutator-ll: /home/scox/dyninst/src/dataflowAPI/src/RoseInsnFactory.C:324: virtual bool Dyninst::DataflowAPI::RoseInsnPPCFactory::handleSpecialCases(entryID, SgAsmInstruction*, SgAsmOperandList*): Assertion `power_op_bclr == iapi_opcode' failed.
    Aborted (core dumped)

stanfordcox, Jun 23 '21 17:06

The problem is that iapi_opcode is power_op_bc. I have no idea if this is right, but it does get dyninst to handle libdw-0.185.so on ppc. power_op_bc is not handled; perhaps it was meant to be handled by the branch_target case?

    diff ~/dyninst/src/dataflowAPI/src/RoseInsnFactory.C~ ~/dyninst/src/dataflowAPI/src/RoseInsnFactory.C
    --- /home/scox/dyninst/src/dataflowAPI/src/RoseInsnFactory.C~ 2021-06-09 15:54:21.753883619 -0400
    +++ /home/scox/dyninst/src/dataflowAPI/src/RoseInsnFactory.C 2021-06-23 14:17:37.854933719 -0400
    @@ -319 +319 @@
    - if(branch_target) {
    + if(branch_target /* NEW */ || iapi_opcode == power_op_bc) {
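To make the effect of the one-line change concrete, here is a minimal, self-contained C++ model of the decision it alters. This is an illustrative sketch, not the Dyninst source: the `Opcode` and `Outcome` enums and the `classifyBranch` function are hypothetical names, only the two opcodes named in this thread are modeled, and the fall-through branch stands in for the code guarded by the `power_op_bclr == iapi_opcode` assertion.

```cpp
// Hypothetical stand-ins for the two Dyninst opcode IDs named in this thread.
enum class Opcode { Bc, Bclr };

// What the modeled special-case handler would do for one instruction.
enum class Outcome { TargetPath, BclrPath, AssertionFails };

// Simplified model (an assumption, not the real handleSpecialCases body).
// hasBranchTarget mirrors the `branch_target` check in the diff;
// patched selects the condition with the added `|| iapi_opcode == power_op_bc`.
Outcome classifyBranch(Opcode op, bool hasBranchTarget, bool patched) {
    if (hasBranchTarget || (patched && op == Opcode::Bc)) {
        return Outcome::TargetPath;
    }
    // Without a target, the original code asserts that the opcode is bclr.
    return op == Opcode::Bclr ? Outcome::BclrPath : Outcome::AssertionFails;
}
```

In this model, `classifyBranch(Opcode::Bc, false, false)` yields `AssertionFails`, which corresponds to the abort reported in the thread, while the same input with `patched` set to true takes the target path. Whether the target path is semantically right for a `bc` with no resolved target is exactly the open question raised in the comment.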

Not sure what the issue is with loading /usr/bin/echo (or most any other /usr/bin/*) into a process

stanfordcox, Jun 23 '21 18:06

@mxz297 I'm going to have to hand this off to you. I'm afraid I've run out of knowledge.

hainest, Jun 23 '21 19:06

@mxz297 @hainest BTW I narrowed down the 11.0.1 changes to find the relevant one, and it was this: "Skip parsing of blocks whose code buffer is null (#1033)". That build of libdw must have an odd code block.

stanfordcox, Jun 24 '21 14:06

@hainest @stanfordcox

> On ppc64le, CodeDump works okay with libdw-0.185.ppc64le.so, but the library load fails:
>
>     ./mutator-ll -dynamic ./mutatee /usr/lib64/libdw-0.185.so
>     mutator-ll: /builddir/build/BUILD/dyninst-11.0.0/dyninst-11.0.0/dataflowAPI/src/RoseInsnFactory.C:324: virtual bool Dyninst::DataflowAPI::RoseInsnPPCFactory::handleSpecialCases(entryID, SgAsmInstruction*, SgAsmOperandList*): Assertion `power_op_bclr == iapi_opcode' failed.
>     Aborted (core dumped)

I cannot reproduce this error on a local power machine because of missing dependencies for libdw-0.185.so. It looks like loadLibrary needs to load all of its dependencies.

@stanfordcox Can you post the stack trace for the assertion?

> Also note that if I use a system command, e.g. echo (which is also on the fedorapeople site), then bpatch.processCreate fails:
>
>     DYNINSTAPI_RT_LIB=/usr/lib64/dyninst/libdyninstAPI_RT.so LD_LIBRARY_PATH=/usr/lib64/dyninst ./mutator-ll -dynamic /usr/bin/echo /usr/lib64/libdw-0.185.so
>     --FATAL-- #68: Dyninst was unable to create the specified process
>     --FATAL-- #68: create process failed bootstrap
>     Failed to create /usr/bin/echo

For this case, setting the environment variable DYNINST_DEBUG_STARTUP shows that dyninst cannot find the main function in /usr/bin/echo. Binaries under /usr/bin are likely stripped, so there is no symbol for main. I know that Dyninst has some heuristics for finding main in stripped binaries:

https://github.com/dyninst/dyninst/blob/2e78f7c5c26c8ba5c9f9388d3ba447450528dcbf/dyninstAPI/src/image.C#L480

My guess is that the heuristics failed to identify the main function in stripped binaries on the latest Fedora.
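One quick way to check the "stripped binary" hypothesis is to see whether the file still has a static symbol table. The sketch below is an illustration under stated assumptions, not part of Dyninst: `hasStaticSymtab` and `readFile` are hypothetical helper names, it relies on the Linux `<elf.h>` header, it handles only ELF64 files whose byte order matches the host, and a missing `.symtab` only suggests that `main` has no symbol. It says nothing about how Dyninst's own heuristics behave.

```cpp
#include <cstdint>
#include <cstring>
#include <elf.h>      // Linux ELF structure definitions
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Returns true if the ELF64 image has a section of type SHT_SYMTAB.
// Assumes the file's byte order matches the host; returns false on any
// malformed or unsupported input rather than throwing.
bool hasStaticSymtab(const std::vector<unsigned char>& image) {
    Elf64_Ehdr ehdr;
    if (image.size() < sizeof(ehdr)) return false;
    std::memcpy(&ehdr, image.data(), sizeof(ehdr));
    if (std::memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0) return false;
    if (ehdr.e_ident[EI_CLASS] != ELFCLASS64) return false;
    if (ehdr.e_shentsize != sizeof(Elf64_Shdr)) return false;
    if (ehdr.e_shoff > image.size()) return false;

    // Walk the section header table looking for a static symbol table.
    for (std::uint64_t i = 0; i < ehdr.e_shnum; ++i) {
        const std::uint64_t off = ehdr.e_shoff + i * sizeof(Elf64_Shdr);
        if (off > image.size() || image.size() - off < sizeof(Elf64_Shdr)) {
            return false;  // header table runs past the end of the file
        }
        Elf64_Shdr shdr;
        std::memcpy(&shdr, image.data() + off, sizeof(shdr));
        if (shdr.sh_type == SHT_SYMTAB) return true;
    }
    return false;
}

// Reads a whole file into memory; yields an empty vector if it cannot be opened.
std::vector<unsigned char> readFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

For example, `hasStaticSymtab(readFile("/usr/bin/echo"))` returning false would be consistent with the binary being stripped, which is the situation where a tool has to fall back on heuristics to locate main.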

mxz297, Jun 24 '21 14:06

@mxz297 Ah you're right about echo; thought I had grabbed the debuginfo for that. It works okay with it of course.

The stapdyn testsuite seems to run okay with the RoseInsnFactory "patch", so that must at least be in the right neighborhood. Without the patch the stack trace is:

    #0 __GI_raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:49
    #1 0x00007ffff7659444 in __GI_abort () at abort.c:79
    #2 0x00007ffff767320c in __assert_fail_base (fmt=, assertion=assertion@entry=0x7ffff72acd38 "power_op_bclr == iapi_opcode", file=file@entry=0x7ffff72acce0 "/builddir/build/BUILD/dyninst-11.0.0/dyninst-11.0.0/dataflowAPI/src/RoseInsnFactory.C", line=line@entry=324, function=function@entry=0x7ffff72acc60 "virtual bool Dyninst::DataflowAPI::RoseInsnPPCFactory::handleSpecialCases(entryID, SgAsmInstruction*, SgAsmOperandList*)") at assert.c:92
    #3 0x00007ffff76732b4 in __GI___assert_fail (assertion=0x7ffff72acd38 "power_op_bclr == iapi_opcode", file=0x7ffff72acce0 "/builddir/build/BUILD/dyninst-11.0.0/dyninst-11.0.0/dataflowAPI/src/RoseInsnFactory.C", line=, function=0x7ffff72acc60 "virtual bool Dyninst::DataflowAPI::RoseInsnPPCFactory::handleSpecialCases(entryID, SgAsmInstruction*, SgAsmOperandList*)") at assert.c:101
    #4 0x00007ffff70b928c in Dyninst::DataflowAPI::RoseInsnPPCFactory::handleSpecialCases (this=, iapi_opcode=, insn=, rose_operands=0x7fffa00a66c0) at /usr/src/debug/dyninst-11.0.0-1.el9.ppc64le/dyninst-11.0.0/dataflowAPI/src/RoseInsnFactory.C:324
    #5 0x00007ffff70bc04c in Dyninst::DataflowAPI::RoseInsnFactory::convert (this=0x7fffefffa778, insn=..., addr=31492) at /usr/src/debug/dyninst-11.0.0-1.el9.ppc64le/dyninst-11.0.0/dataflowAPI/src/RoseInsnFactory.C:81
    #6 0x00007ffff71af464 in Dyninst::DataflowAPI::SymEval::expandInsn (insn=..., addr=31492, res=std::map with 1 element = {...}) at /usr/src/debug/dyninst-11.0.0-1.el9.ppc64le/dyninst-11.0.0/dataflowAPI/src/SymEval.C:485
    #7 0x00007ffff71b1104 in Dyninst::DataflowAPI::SymEval::expand (res=std::map with 1 element = {...}, failedInsns=std::set with 0 elements, applyVisitors=) at /usr/src/debug/dyninst-11.0.0-1.el9.ppc64le/dyninst-11.0.0/dataflowAPI/src/../h/Absloc.h:293
    #8 0x00007ffff71b1978 in Dyninst::DataflowAPI::SymEval::expand (assignment=..., applyVisitors=) at /usr/src/debug/dyninst-11.0.0-1.el9.ppc64le/dyninst-11.0.0/dataflowAPI/src/SymEval.C:75
    #9 0x00007ffff70795ac in SymbolicExpression::ExpandAssignment (this=0x7fffefffc640, assign=..., keepMultiOne=) at /usr/src/debug/dyninst-11.0.0-1.el9.ppc64le/dyninst-11.0.0/parseAPI/src/SymbolicExpression.C:379
    #10 0x00007ffff706d30c in JumpTableIndexPred::addNodeCallback (this=0x7fffefffc7d8, ap=..., visitedEdges=std::set with 1 element = {...}) at /usr/src/debug/dyninst-11.0.0-1.el9.ppc64le/dyninst-11.0.0/parseAPI/src/JumpTableIndexPred.C:187
    #11 0x00007ffff70c16d4 in Dyninst::Slicer::updateAndLink (this=this@entry=0x7fffefffccc8, g=..., dir=dir@entry=Dyninst::Slicer::backward, cand=..., cache=..., p=...) at /usr/src/debug/dyninst-11.0.0-1.el9.ppc64le/dyninst-11.0.0/dataflowAPI/src/slicing.C:394
    #12 0x00007ffff70c6b34 in Dyninst::Slicer::sliceInternalAux (this=this@entry=0x7fffefffccc8, g=..., dir=dir@entry=Dyninst::Slicer::backward, p=..., cand=..., skip=skip@entry=false, visited=std::map with 1 element = {...}, singleCache=std::unordered_map with 2 elements = {...}, cache=std::unordered_map with 0 elements) at /usr/src/debug/dyninst-11.0.0-1.el9.ppc64le/dyninst-11.0.0/dataflowAPI/src/slicing.C:230
    #13 0x00007ffff70c6fcc in Dyninst::Slicer::sliceInternalAux (this=this@entry=0x7fffefffccc8, g=..., dir=dir@entry=Dyninst::Slicer::backward, p=..., cand=..., skip=skip@entry=true, visited=std::map with 1 element = {...}, singleCache=std::unordered_map with 2 elements = {...}, cache=...) at /usr/src/debug/dyninst-11.0.0-1.el9.ppc64le/dyninst-11.0.0/dataflowAPI/src/slicing.C:304
    #14 0x00007ffff70caa78 in Dyninst::Slicer::sliceInternal (this=this@entry=0x7fffefffccc8, dir=dir@entry=Dyninst::Slicer::backward, p=...) at /usr/src/debug/dyninst-11.0.0-1.el9.ppc64le/dyninst-11.0.0/dataflowAPI/src/slicing.C:194
    #15 0x00007ffff70cb4b8 in Dyninst::Slicer::backwardSlice (this=0x7fffefffccc8, predicates=...) at /usr/src/debug/dyninst-11.0.0-1.el9.ppc64le/dyninst-11.0.0/dataflowAPI/src/slicing.C:1440
    #16 0x00007ffff7268d70 in IndirectControlFlowAnalyzer::NewJumpTableAnalysis(std::vector<std::pair<unsigned long, Dyninst::ParseAPI::EdgeTypeEnum>, std::allocator<std::pair<unsigned long, Dyninst::ParseAPI::EdgeTypeEnum> > >&) [clone .constprop.0] (this=0x7fffefffd2f8, outEdges=...) at /usr/src/debug/dyninst-11.0.0-1.el9.ppc64le/dyninst-11.0.0/parseAPI/src/IndirectAnalyzer.C:155
    #17 0x00007ffff704d4d0 in Dyninst::InsnAdapter::IA_IAPI::parseJumpTable (this=, currFunc=, currBlk=, outEdges=...) at /usr/src/debug/dyninst-11.0.0-1.el9.ppc64le/dyninst-11.0.0/parseAPI/src/IA_IAPI.C:918
    #18 0x00007ffff704cc9c in Dyninst::InsnAdapter::IA_IAPI::getNewEdges (this=0x7fffb40c1c10, outEdges=std::vector of length 0, capacity 0, context=0x7fffb40ea1f0, currBlk=0x7fffb40f13c8, num_insns=, plt_entries=0x1c570538, knownTargets=std::set with 59 elements = {...}) at /usr/src/debug/dyninst-11.0.0-1.el9.ppc64le/dyninst-11.0.0/parseAPI/src/IA_IAPI.C:716
    #19 0x00007ffff7265968 in Dyninst::ParseAPI::Parser::ProcessCFInsn(Dyninst::ParseAPI::ParseFrame&, Dyninst::ParseAPI::Block*, Dyninst::InsnAdapter::IA_IAPI*) [clone .constprop.0] (this=0x1c5701e0, frame=..., ah=0x7fffb40c1c10, cur=0x7fffb40f13c8) at /usr/src/debug/dyninst-11.0.0-1.el9.ppc64le/dyninst-11.0.0/parseAPI/src/ParserDetails.C:415
    #20 0x00007ffff700d814 in Dyninst::ParseAPI::Parser::parse_frame_one_iteration (this=0x1c5701e0, frame=..., recursive=) at /usr/src/debug/dyninst-11.0.0-1.el9.ppc64le/dyninst-11.0.0/parseAPI/src/ParserDetails.h:251
    #21 0x00007ffff702a218 in Dyninst::ParseAPI::Parser::LaunchWork(LockFreeQueueItemDyninst::ParseAPI::ParseFrame**, bool) [clone ._omp_fn.0] [clone .lto_priv.0] () at /usr/src/debug/dyninst-11.0.0-1.el9.ppc64le/dyninst-11.0.0/parseAPI/src/Parser.C:1373
    #22 0x00007ffff685af08 in gomp_barrier_handle_tasks () from /lib64/libgomp.so.1
    #23 0x00007ffff68654f0 in gomp_team_barrier_wait_end () from /lib64/libgomp.so.1
    #24 0x00007ffff6861850 in gomp_thread_start () from /lib64/libgomp.so.1
    #25 0x00007ffff6d99bc0 in start_thread (arg=0x7fffefffede0) at pthread_create.c:481
    #26 0x00007ffff7777eb0 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:103

stanfordcox, Jun 24 '21 17:06

@mxz297 @stanfordcox Have we confirmed that this is fixed in the HEAD of master?

hainest, Jun 30 '21 16:06

@hainest I believe the x86 problem is resolved in the HEAD of master. The two power problems are not, and they are two different problems.

One problem is an assertion, which @stanfordcox has a patch to work around. Based on the stack trace without the patch, it happens in jump table analysis during code parsing. Jump table analysis itself has some mechanisms to tolerate certain errors, so I am fine with working around it for the moment. @stanfordcox Can you create a PR with your patch? We can review and test the patch, then merge it.

The other power problem is caused by findMain failing to find main inside stripped binaries. @stanfordcox is able to work around this problem by supplying a debug file for the binary (so, essentially, this is still a problem). @hainest I think the strategy is to report this findMain failure in a separate issue and move on.

mxz297, Jul 01 '21 14:07

I will create two PRs for the ppc64le cases and we can close this one.

stanfordcox, Jul 01 '21 20:07

@stanfordcox @mxz297 I see we have an issue for the findMain problem (#1071), but I don't see a PR for the first item. Did that ever come to fruition?

hainest, Feb 05 '22 00:02