binaryninja-api
Duplicated xref when multiple disassembly instructions are folded into one HLIL instruction
Version and Platform (required):
- Binary Ninja Version: 3.5.4402-dev (c2f291a7) and also 3.4.4271
- OS: macOS, Linux
Bug Description: In a RISC-y architecture like MIPS, just about every function call generates two IL xrefs to the function, one where the address is loaded and one at the call instruction. Double-clicking the first one goes somewhere unhelpful, because that load doesn't exist / isn't mapped correctly at the HLIL level. The second xref goes to the right place, because it's the actual call.
Sample binary: https://segbrk.com/curl-mips32
If you look at the source of the xref in disassembly, you will know why it is happening:
This is indeed a bit confusing. I will see if there is a way to improve it.
Via the API, things seem to be OK. The xref from the li $t9, 0x402a60 doesn't exist above LLIL:
>>> print('\n'.join([f'0x{xref.address:X}: {xref.llil}' for xref in bv.get_code_refs(0x402a60)]))
0x412174: $t9 = 0x402a60
0x41217C: call($t9)
>>> print('\n'.join([f'0x{xref.address:X}: {xref.mlil}' for xref in bv.get_code_refs(0x402a60)]))
0x412174: None
0x41217C: $v0 = 0x402a60()
>>> print('\n'.join([f'0x{xref.address:X}: {xref.hlil}' for xref in bv.get_code_refs(0x402a60)]))
0x412174: None
0x41217C: sub_402a60()
So is this just a UI display issue? If xrefs were displayed only at the chosen level, then being at HLIL would show only the sub_402a60() xref.
@xusheng6 Thoughts?
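To make the "only show xrefs at the chosen level" idea concrete, here is a minimal sketch of such a filter. It assumes only what the transcript above shows: each reference has an `address` and `llil`/`mlil`/`hlil` attributes that are `None` when no instruction exists at that level. The helper name `refs_at_level` and the mock objects are hypothetical and are not part of the Binary Ninja API.

```python
from types import SimpleNamespace


def refs_at_level(refs, level="hlil"):
    """Return only the references that map to an instruction at `level`.

    `level` is the attribute name to inspect: "llil", "mlil" or "hlil".
    """
    return [ref for ref in refs if getattr(ref, level, None) is not None]


# Stand-ins mirroring the two references printed above
# (plain namespaces, not real Binary Ninja objects).
mock_refs = [
    SimpleNamespace(address=0x412174, llil="$t9 = 0x402a60",
                    mlil=None, hlil=None),
    SimpleNamespace(address=0x41217C, llil="call($t9)",
                    mlil="$v0 = 0x402a60()", hlil="sub_402a60()"),
]

# Only the reference that survives to HLIL is printed.
for ref in refs_at_level(mock_refs, "hlil"):
    print(f"0x{ref.address:X}: {ref.hlil}")
```

Inside Binary Ninja the same helper could presumably be called as `refs_at_level(bv.get_code_refs(0x402a60), "hlil")`, which for this binary should leave only the reference at 0x41217C.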
There are two aspects of the issue:
- There are two xrefs to 0x402a60. This is because xrefs are collected at the disassembly or LLIL level (I cannot remember which), so the code at 0x412174 and at 0x41217C each contribute one xref. Since their addresses differ, they are considered different xrefs, which is correct.
- The HLIL rendering of xref entries was added fairly recently. Previously, we always rendered xrefs at the disassembly level, and the two xrefs would have been rendered something like this:
0x412174: li $t9, 0x402a60
0x41217c: jalr $t9
That is also correct. However, when we added the HLIL xref rendering, we did not consider the case where multiple xrefs from a lower-level IL become the same instruction in HLIL. I think this is the problem we need to fix.
The HLIL xref rendering was not done by me, but I am happy to have a look at it.
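As an illustration of the folding case described above, the following sketch groups xref source addresses by the higher-level instruction they end up in, so that several low-level xrefs collapse into one entry. It is a hypothetical helper, not how Binary Ninja's xref model works: `fold_xrefs` and the `il_key` values are assumptions, and a real key would need to identify an instruction uniquely (for example a function start plus instruction index) rather than rely on its rendered text.

```python
def fold_xrefs(entries):
    """Group (source_address, il_key) pairs by il_key.

    Returns a dict mapping each il_key to the list of source addresses
    folded into it, in first-seen order.
    """
    folded = {}
    for address, il_key in entries:
        folded.setdefault(il_key, []).append(address)
    return folded


# Hypothetical key: (function start, HLIL instruction index).
# The concrete values below are made up for the example.
call_site = (0x412100, 7)

entries = [
    (0x412174, call_site),  # li $t9, 0x402a60
    (0x41217C, call_site),  # jalr $t9
]

for key, addresses in fold_xrefs(entries).items():
    sources = ", ".join(f"0x{a:x}" for a in addresses)
    print(f"{key}: folded from {sources}")
```

With both disassembly-level xrefs mapped to the same key, the result contains a single entry listing both source addresses, which is what a deduplicated HLIL view would show.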
Although duplicate xrefs still show up, as of a week ago they navigate somewhere sensible when clicked.
For a variety of reasons, eliminating the duplicates from the UI when they are folded into other instructions in the active IL view would require keeping full IL analysis objects around for every function that contributed xrefs, in order to update the Qt model correctly. That is not an excellent use of resources.