binaryninja-api
binaryninja-api copied to clipboard
Incorrect tokenization of HLIL_STRUCT_FIELD in PseudoC and PseudoRust
Version and Platform (required):
- Binary Ninja Version: 5.1.7598-test Ultimate (0e5168d7)
- Edition: Ultimate
- OS: Debian Linux
- OS Version: 13
- CPU Architecture: x86_64
Bug Description:
Lifter creates a dereference of a floating point value that was not in the disassembly. For some reason, the vcvt.s32.f32->vmov->sxth chain generates a pointer dereference.
Pseudo C (note the dereference; unfortunately not readable at all):
Pseudo Rust (note the deference):
Steps To Reproduce:
- Open attached BNDB
Expected Behavior: Binary Ninja to not make up dereferences of floating point values.
Binary: bug_vfp.zip
This is purely a Language Representation issue. The underlying instruction doesn't have a dereference in it but for some reason "HLIL_STRUCT_FIELD" is adding a * token. I've tracked this down to these particular lines of code:
Pseudo-C: https://github.com/Vector35/binaryninja-api/blob/5320c6967cd380fd73be079675a960f22ba53b40/lang/c/pseudoc.cpp#L1487 Pseudo-Rust: https://github.com/Vector35/binaryninja-api/blob/5320c6967cd380fd73be079675a960f22ba53b40/lang/rust/pseudorust.cpp#L1615
I don't know why this is being added. I really don't think it should be.
Possibly related, certainly also connected to HLIL_STRUCT_FIELD:
HLIL/Pseudo-C:
Pseudo-Rust:
In this case, it's the way the indexing is represented in a partial register write (more or less the same as a vector register write), although in my case turning on Show All Expression Types reveals that it is actually wrong (there is no cast to (char*) before the register):
Repro:
- Create a new binary with architecture MIPS r5900l
- Enter this into the console:
bv.write(here, struct.pack('<I', (0b011100<<26 | 0b10000<<21 | 0b00010<<16 | 0b00000<<11 | 0b01101<<6 | 0b101001<<0))) - Compare HLIL to Pseudo-C or Pseudo-Rust.
Given the HLIL that the Pseudo-C layer is working with and the approach it takes to displaying it, the rendering issue isn't quite as simple as an additional *.
The HLIL looks like so:
08028044 (HLIL_ASSIGN.d (HLIL_DEREF.d *0x24000274) =
08028044 (HLIL_SX.d sx.d((HLIL_STRUCT_FIELD.w (HLIL_VAR.q q0):4.w))))
The HLIL_STRUCT_FIELD ends up being rendered as a *(uint16_t*)((char*)q0)[4] which is bogus:
08028044 *(uint32_t*)0x24000274 = (int32_t)*(uint16_t*)((char*)q0)[4];
q0 is an int64_t, not a pointer, so it should be &q0. Then some additional restructuring is needed to make sure the types line up. It should end up with either *(int16_t*)&((char*)&q0)[4] or *(int16_t*)((char*)&q0 + 4). (int16_t)((int32_t*)&q0)[1] would be an option, too.[^1]
In this case since q0 is an integer this could be expressed more straightforwardly as a shift. Then again, ideally the lifting would preserve the independent use of s0 / s1 and the resulting HLIL wouldn't be using offsets from q0 at all.
[^1]: The existing Pseudo-C rendering shows uint16_t for the 16 bit type but that seems incorrect given the sign extension involved here?
Fixed in 5.2.8557
Thank you!