binexport
binexport copied to clipboard
Ghidra plugin sometimes generates broken BinExport files
Thanks to the release of the BinDiff source code, I was finally able to solve an issue I was having when diffing bare-metal binaries (this specific instance happened with an ARM binary, but I've seen this happen with 8051 binaries as well).
Problem
The issue I was troubleshooting was causing the bindiff
binary to exit with the following error:
...
I1003 22:48:33.948155 675035 call_graph.cc:196] DetachFlowGraph: coudn't find call graph node for flow graph 00000000
Error: AttachFlowGraph: couldn't find call graph node for flow graph 00000000
What's happening here is:
- If the
entry_basic_block_index
for aflow_graph
is not set, it defaults to zero. - In
FlowGraph::Read
, if theentry_basic_block_index
of theflow_graph
is zero,entry_point_address_
will be set to the address of the 0thbasic_block
(really of the 0thinstruction
of that 0thbasic_block
). - In
CallGraph::AttachFlowGraph
, if there is novertex
whose address is equal to the address of that 0thbasic_block
,CallGraph::GetVertex
will fail to find thevertex
and an exception will be thrown.
I think the reason entry_basic_block_index
wasn't getting set was because the function the flow_graph
came from was incompletely disassembled, which in this specific instance happened because Ghidra's disassembler had incorrectly tried disassembling Thumb-2 instructions as full-size ARM instructions. I used the following Python script to identify the starting addresses of the problem flow_graph
messages:
#!/usr/bin/env python3
import sys
import binexport2_pb2
be = binexport2_pb2.BinExport2()
be.ParseFromString(open(sys.argv[1], 'rb').read())
for flow_graph in be.flow_graph:
if flow_graph.entry_basic_block_index == 0:
instr = be.instruction[be.basic_block[flow_graph.basic_block_index[0]].instruction_index[0].begin_index]
print(hex(instr.address))
Some of the addresses listed were zero, which was strange because I knew the code at that address was already correctly disassembled. Regardless, I fixed up the disassembly at the non-zero addresses and ran bindiff
again. Unfortunately, I got the same error message. After some more debugging, I discovered the problem--the instructions whose addresses were set to zero were causing the following issue:
- In
FlowGraph::Read
, if the 0thinstruction
of thebasic_block
doesn't have an address set,entry_point_address_
will be set to zero. - Similar to the initial issue I was investigating, if there's no
vertex
with an address matching theentry_point_address_
, an exception will be thrown.
Fixing this issue required identifying those instructions that had no address set, searching for the mnemonic/bytes in the disassembly listing, and then fixing the disassembly (once again, Ghidra tried disassembling Thumb-2 instructions as ARM instructions). I used the following Python script to identify the problem instructions:
#!/usr/bin/env python3
import sys
import binexport2_pb2
be = binexport2_pb2.BinExport2()
be.ParseFromString(open(sys.argv[1], 'rb').read())
for flow_graph in be.flow_graph:
if flow_graph.entry_basic_block_index == 0:
instr = be.instruction[be.basic_block[flow_graph.basic_block_index[0]].instruction_index[0].begin_index]
if instr.address == 0:
print(be.mnemonic[instr.mnemonic_index].name, instr.raw_bytes.hex())
With all the improperly-disassembled instructions now properly-disassembled, the BinDiff exporter for Ghidra was able to generate a file that BinDiff was able to use. Yay!
Solution
Some suggestions for solutions to the described problems (better solutions may be available):
Near-term
- The check here should be inverted, and simply decline to add the
flow_graph
if it can't set theentry_basic_block_index
. - Something needs to be fixed somewhere in this function to ensure the
begin_index
is always set to the correct address.
Long-term
I think the root cause of all these problems is that certain fields are optional when they really shouldn't be, especially since (please correct me if I'm wrong) it isn't possible for a protobuf-consuming application to tell the difference between an optional field whose value is zero and an optional field that is missing. Because BinDiff can't tell if a field has been set or is missing, it can't detect the problem when it happens and either handle it properly or bail early--it can only assume the value is correct and hope for the best. So, either the optional fields need to be made mandatory and always included in files generated by BinExport, or BinDiff needs to be modified to properly handle situations where those fields are missing.