
No effective way to handle structure access when the structure pointer is expressed as a sum of two registers

Open xusheng6 opened this issue 1 year ago • 3 comments

Screenshot 2024-08-28 at 4 07 15 PM

In this user-shared binary, the function does some PE parsing. We can see that r15 + arg2 is actually a pointer to the PE header, and the export directory table entry lies at offset 0x88. The user wishes to set the type of the expression r15 + arg2 so that the code becomes more readable.
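For context, the 0x88 offset can be checked against the PE layout. The sketch below assumes the binary parses a 64-bit (PE32+) image, which the issue does not state explicitly; the structure and field names follow the Windows SDK definitions, rebuilt here with ctypes purely for illustration.

```python
import ctypes

class IMAGE_DATA_DIRECTORY(ctypes.Structure):
    _fields_ = [("VirtualAddress", ctypes.c_uint32),
                ("Size", ctypes.c_uint32)]

class IMAGE_FILE_HEADER(ctypes.Structure):
    _fields_ = [("Machine", ctypes.c_uint16),
                ("NumberOfSections", ctypes.c_uint16),
                ("TimeDateStamp", ctypes.c_uint32),
                ("PointerToSymbolTable", ctypes.c_uint32),
                ("NumberOfSymbols", ctypes.c_uint32),
                ("SizeOfOptionalHeader", ctypes.c_uint16),
                ("Characteristics", ctypes.c_uint16)]

# Assumption: PE32+ (64-bit) optional header layout.
class IMAGE_OPTIONAL_HEADER64(ctypes.Structure):
    _fields_ = [("Magic", ctypes.c_uint16),
                ("MajorLinkerVersion", ctypes.c_uint8),
                ("MinorLinkerVersion", ctypes.c_uint8),
                ("SizeOfCode", ctypes.c_uint32),
                ("SizeOfInitializedData", ctypes.c_uint32),
                ("SizeOfUninitializedData", ctypes.c_uint32),
                ("AddressOfEntryPoint", ctypes.c_uint32),
                ("BaseOfCode", ctypes.c_uint32),
                ("ImageBase", ctypes.c_uint64),
                ("SectionAlignment", ctypes.c_uint32),
                ("FileAlignment", ctypes.c_uint32),
                ("MajorOperatingSystemVersion", ctypes.c_uint16),
                ("MinorOperatingSystemVersion", ctypes.c_uint16),
                ("MajorImageVersion", ctypes.c_uint16),
                ("MinorImageVersion", ctypes.c_uint16),
                ("MajorSubsystemVersion", ctypes.c_uint16),
                ("MinorSubsystemVersion", ctypes.c_uint16),
                ("Win32VersionValue", ctypes.c_uint32),
                ("SizeOfImage", ctypes.c_uint32),
                ("SizeOfHeaders", ctypes.c_uint32),
                ("CheckSum", ctypes.c_uint32),
                ("Subsystem", ctypes.c_uint16),
                ("DllCharacteristics", ctypes.c_uint16),
                ("SizeOfStackReserve", ctypes.c_uint64),
                ("SizeOfStackCommit", ctypes.c_uint64),
                ("SizeOfHeapReserve", ctypes.c_uint64),
                ("SizeOfHeapCommit", ctypes.c_uint64),
                ("LoaderFlags", ctypes.c_uint32),
                ("NumberOfRvaAndSizes", ctypes.c_uint32),
                ("DataDirectory", IMAGE_DATA_DIRECTORY * 16)]

class IMAGE_NT_HEADERS64(ctypes.Structure):
    _fields_ = [("Signature", ctypes.c_uint32),
                ("FileHeader", IMAGE_FILE_HEADER),
                ("OptionalHeader", IMAGE_OPTIONAL_HEADER64)]

# DataDirectory[0] is the export directory entry.
EXPORT_DIRECTORY_OFFSET = (IMAGE_NT_HEADERS64.OptionalHeader.offset
                           + IMAGE_OPTIONAL_HEADER64.DataDirectory.offset)
print(hex(EXPORT_DIRECTORY_OFFSET))  # 0x88
```

Under that assumption, typing r15 + arg2 as a pointer to the NT headers structure would let the 0x88 access render as a named field rather than a raw offset.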

A naive approach would be to let the user set the type of an arbitrary expression in the IL. This may appear to solve the problem at first glance. However, because of the way we generate the ILs, there is no guarantee that the index of an IL expression will stay the same in the future, so there would be no way to reliably serialize the user-provided type.

Coincidentally, for the two instructions that immediately follow, i.e.,

   3 @ 000089d4  void* rdi_1 = arg2 + rax
   4 @ 000089d8  uint64_t rsi_1 = zx.q(*(rdi_1 + 0x18))

we do not run into the same problem, because there is an intermediate variable rdi_1 whose type can easily be set to obtain better decompilation output. Offering a way to create intermediate variables might therefore be a viable solution, but it would take a large amount of effort to support.

On the other hand, it is possible to develop a workflow that inserts one instruction to create the intermediate variable, then rewrites the IL and replaces all occurrences of r15 + arg2 with that variable. The problem is that scanning the code to do such a replacement might be too expensive.
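The rewrite idea can be sketched on a toy IL. Everything below (the `Var`, `Add`, `Load`, `Assign` classes, the `hoist` helper, and the `pe_hdr` name) is hypothetical and is not the Binary Ninja workflow API; it only illustrates inserting one defining instruction and substituting every matching sub-expression. It also ignores the check a real pass would need, namely that r15 and arg2 are not redefined between the inserted definition and each use.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Load:
    addr: "Expr"

Expr = Union[Var, Const, Add, Load]

@dataclass(frozen=True)
class Assign:
    dest: Var
    src: Expr

def replace(expr: Expr, target: Expr, temp: Var) -> Expr:
    """Return expr with every occurrence of target replaced by temp."""
    if expr == target:
        return temp
    if isinstance(expr, Add):
        return Add(replace(expr.left, target, temp),
                   replace(expr.right, target, temp))
    if isinstance(expr, Load):
        return Load(replace(expr.addr, target, temp))
    return expr

def hoist(instrs: List[Assign], target: Expr, temp: Var) -> List[Assign]:
    """Define temp = target before its first use, then rewrite all uses."""
    out: List[Assign] = []
    defined = False
    for ins in instrs:
        new_src = replace(ins.src, target, temp)
        if new_src != ins.src and not defined:
            out.append(Assign(temp, target))
            defined = True
        out.append(Assign(ins.dest, new_src))
    return out

# Toy version of the access in the issue: rax = *(r15 + arg2 + 0x88)
base = Add(Var("r15"), Var("arg2"))
original = [Assign(Var("rax"), Load(Add(base, Const(0x88))))]
rewritten = hoist(original, base, Var("pe_hdr"))
for ins in rewritten:
    print(ins)
```

Note that `hoist` visits every sub-expression of every instruction, which is the scanning cost mentioned above.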

xusheng6 avatar Aug 28 '24 08:08 xusheng6

This issue was first reported in https://github.com/Vector35/binaryninja-api/discussions/5629

xusheng6 avatar Aug 28 '24 08:08 xusheng6

The binary was shared privately. V35 folks should search for "answer sad mate solid lunch" to find the binary.

xusheng6 avatar Aug 28 '24 08:08 xusheng6

In IDA, you can select the 0x88 token, hit T, and then input a structure to force the display as a structure offset. However, this is purely a display feature: the decompilation is unaware of it and still produces output similar to binja's. Therefore, I do not think IDA's approach is ideal for this situation.

Screenshot 2024-08-28 at 4 39 55 PM

xusheng6 avatar Aug 28 '24 08:08 xusheng6