binaryninja-api icon indicating copy to clipboard operation
binaryninja-api copied to clipboard

Incorrect tokenization of HLIL_STRUCT_FIELD in PseudoC and PseudoRust

Open whitequark opened this issue 5 months ago • 1 comments

Version and Platform (required):

  • Binary Ninja Version: 5.1.7598-test Ultimate (0e5168d7)
  • Edition: Ultimate
  • OS: Debian Linux
  • OS Version: 13
  • CPU Architecture: x86_64

Bug Description: Lifter creates a dereference of a floating point value that was not in the disassembly. For some reason, the vcvt.s32.f32->vmov->sxth chain generates a pointer dereference.

Image

Image

Image

Pseudo C (note the dereference; unfortunately not readable at all):

Image

Pseudo Rust (note the deference):

Image

Steps To Reproduce:

  1. Open attached BNDB

Expected Behavior: Binary Ninja to not make up dereferences of floating point values.

Binary: bug_vfp.zip

whitequark avatar Jun 11 '25 23:06 whitequark

This is purely a Language Representation issue. The underlying instruction doesn't have a dereference in it but for some reason "HLIL_STRUCT_FIELD" is adding a * token. I've tracked this down to these particular lines of code:

Pseudo-C: https://github.com/Vector35/binaryninja-api/blob/5320c6967cd380fd73be079675a960f22ba53b40/lang/c/pseudoc.cpp#L1487 Pseudo-Rust: https://github.com/Vector35/binaryninja-api/blob/5320c6967cd380fd73be079675a960f22ba53b40/lang/rust/pseudorust.cpp#L1615

I don't know why this is being added. I really don't think it should be.

plafosse avatar Jun 17 '25 13:06 plafosse

Possibly related, certainly also connected to HLIL_STRUCT_FIELD:

HLIL/Pseudo-C: Image

Pseudo-Rust: Image

In this case, it's the way the indexing is represented in a partial register write (more or less the same as a vector register write), although in my case turning on Show All Expression Types reveals that it is actually wrong (there is no cast to (char*) before the register): Image

Repro:

  1. Create a new binary with architecture MIPS r5900l
  2. Enter this into the console: bv.write(here, struct.pack('<I', (0b011100<<26 | 0b10000<<21 | 0b00010<<16 | 0b00000<<11 | 0b01101<<6 | 0b101001<<0)))
  3. Compare HLIL to Pseudo-C or Pseudo-Rust.

galenbwill avatar Jul 04 '25 22:07 galenbwill

Given the HLIL that the Pseudo-C layer is working with and the approach it takes to displaying it, the rendering issue isn't quite as simple as an additional *.

The HLIL looks like so:

08028044            (HLIL_ASSIGN.d (HLIL_DEREF.d *0x24000274) =
08028044                (HLIL_SX.d sx.d((HLIL_STRUCT_FIELD.w (HLIL_VAR.q q0):4.w))))

The HLIL_STRUCT_FIELD ends up being rendered as a *(uint16_t*)((char*)q0)[4] which is bogus:

08028044        *(uint32_t*)0x24000274 = (int32_t)*(uint16_t*)((char*)q0)[4];

q0 is an int64_t, not a pointer, so it should be &q0. Then some additional restructuring is needed to make sure the types line up. It should end up with either *(int16_t*)&((char*)&q0)[4] or *(int16_t*)((char*)&q0 + 4). (int16_t)((int32_t*)&q0)[1] would be an option, too.[^1]

In this case since q0 is an integer this could be expressed more straightforwardly as a shift. Then again, ideally the lifting would preserve the independent use of s0 / s1 and the resulting HLIL wouldn't be using offsets from q0 at all.

[^1]: The existing Pseudo-C rendering shows uint16_t for the 16 bit type but that seems incorrect given the sign extension involved here?

bdash avatar Jul 16 '25 21:07 bdash

Fixed in 5.2.8557

D0ntPanic avatar Nov 04 '25 19:11 D0ntPanic

Thank you!

whitequark avatar Nov 04 '25 19:11 whitequark