angr
angr copied to clipboard
tracer: resync logic needed for conditional instructions
This may be I don't understand ARM.
In the following snippet, the block should end at 0x3DB38
text:0003DB14 CMN R2, #0x10
.text:0003DB18 LDMGE R1!, {R3,R4,R12,LR}
.text:0003DB1C STMGE R0!, {R3,R4,R12,LR}
.text:0003DB20 SUBGE R2, R2, #0x10
.text:0003DB24 POP {R4}
.text:0003DB28
.text:0003DB28 loc_3DB28 ; CODE XREF: memcpy_0+34↑j
.text:0003DB28 ADDS R2, R2, #0x14
.text:0003DB2C
.text:0003DB2C loc_3DB2C ; CODE XREF: memcpy_0+78↓j
.text:0003DB2C LDMGE R1!, {R3,R12,LR}
.text:0003DB30 STMGE R0!, {R3,R12,LR}
.text:0003DB34 SUBSGE R2, R2, #0xC
.text:0003DB38 BGE loc_3DB2C
angr generates the correct basic block. However, when trying to execute the snippet, somehow angr can stop execution prematurely at 0x0003DB1C which confuses me.
Everything to replicate the issue is attached as follow
https://gist.github.com/Kyle-Kyle/92371cad52f772a38e9a0eed9cf0551b
the result of state.step()
from 0x3db14 is:
[<SimState @ 0x43db1c>, <SimState @ 0x43db20>, <SimState @ 0x43db30>, <SimState @ 0x43db34>, <SimState @ 0x43db2c>, <SimState @ 0x43db3c>]
Before looking at what VEX lifts the blocks to: These instructions are all conditional, right? So strictly speaking, it's not incorrect for VEX to break the block at 0x43DB1C.
VEX does not break the block at 0x43DB1C. VEX and QEMU agree on this: the block is from 0x0003DB14 to 0x003DB3C
QEMU only shows [0x0003DB14, 0x003DB3C] in the trace, but angr shows much more than that, something like [0x0003DB14, 0x0003DB1C, ...., 0x003DB3C]. so tracer freaks out.
Yeah I am correct. Take a look at the VEX IR output:
00 | ------ IMark(0x43db14, 4, 0) ------
01 | t0 = GET:I32(r2)
02 | PUT(cc_op) = 0x00000001
03 | PUT(cc_dep1) = t0
04 | PUT(cc_dep2) = 0x00000010
05 | PUT(cc_ndep) = 0x00000000
06 | PUT(pc) = 0x0043db18
07 | ------ IMark(0x43db18, 4, 0) ------
08 | t40 = armg_calculate_condition(0x000000a1,t0,0x00000010,0x00000000):Ity_I32
09 | t42 = 32to1(t40)
10 | t41 = Not1(t42)
11 | if (t41) { PUT(pc) = 0x43db1c; Ijk_Boring }
12 | t4 = GET:I32(r1)
13 | t43 = LDle:I32(t4)
14 | PUT(r3) = t43
15 | t46 = Add32(t4,0x00000004)
16 | t45 = LDle:I32(t46)
17 | PUT(r4) = t45
18 | t48 = Add32(t4,0x00000008)
19 | t47 = LDle:I32(t48)
20 | PUT(r12) = t47
21 | t50 = Add32(t4,0x0000000c)
22 | t49 = LDle:I32(t50)
23 | PUT(lr) = t49
24 | t51 = Add32(t4,0x00000010)
25 | PUT(r1) = t51
26 | PUT(pc) = 0x0043db1c
27 | ------ IMark(0x43db1c, 4, 0) ------
28 | t58 = t41
29 | if (t58) { PUT(pc) = 0x43db20; Ijk_Boring }
30 | t7 = GET:I32(r0)
31 | STle(t7) = t43
32 | t62 = Add32(t7,0x00000004)
33 | STle(t62) = t45
34 | t64 = Add32(t7,0x00000008)
35 | STle(t64) = t47
36 | t66 = Add32(t7,0x0000000c)
37 | STle(t66) = t49
38 | t68 = Add32(t7,0x00000010)
39 | PUT(r0) = t68
40 | ------ IMark(0x43db20, 4, 0) ------
41 | t12 = Sub32(t0,0x00000010)
42 | t77 = CmpNE32(t40,0x00000000)
43 | t75 = ITE(t77,t12,t0)
44 | PUT(pc) = 0x0043db24
... snip ...
Statement 29 introduces a new target, 0x43db20, when the condition of that instruction is met. During tracing, the condition of that instruction is met, and then 0x43db20 becomes the next instruction to execute.
OK. So it sounds like the root cause if the definition of "basic block" is different in VEX because it does not have the semantic information that conditional execution is not a jump. And this is causing troubles in tracing
This means the assumption we have in tracer that angr executes on "basic block"(in the sense of ARM instructions) is wrong
I'm not an expert in pyvex engine. But is it possible to make pyvex specifically not mark conditional execution as the end of a basic block? So that the definitions of "basic block" are consistent in both pyvex and tracer
So we have a feature which is sort of the opposite of what you want - there's the strict_block_end
lifting options, which causes the lifter to cut blocks short after any conditional jump, making the definition of basic block consistent (technically vex doesn't give you basic blocks - IRSB stands for IR super-blocks). Trying to do what you actually want would require rewriting an awful lot of the vex lifter C code to implement conditional instructions as if-then-else stores instead of conditional branches. This might not actually be super impossible - a lot of arm instructions are already implemented this way. If you suspect that it's only this one instruction where the problem is, you can try going in and doing surgery on vex/priv/guest_arm_toIR.c.
ut is it possible to make pyvex specifically not mark conditional execution as the end of a basic block? So that the definitions of "basic block" are consistent in both pyvex and tracer
@Kyle-Kyle In this specific case, are you sure it's not because the Qemu trace did not take that branch (0x43db1c -> 0x43db20), but angr did take it? This might be a discrepancy of the two traces. Changing VEX's behavior in this case will hide the actual problem.
@ltfish I'm sure because continuing executing the same state eventually reaches the address that in QEMU's trace. And the execution trace is basically the basic block
@rhelmot Instead of changing how it is lifted which is hard for all instructions, can we change how it generates successors? Like: if it stops at a "jump" that is actually a conditional execution, continue executing.
if it stops at a "jump" that is actually a conditional execution, continue executing.
I hope we can solve it in an architecture-agnostic manner, instead of explicitly making "conditional execution" something requires a special treatment in how angr lifts things.
Can we introduce a "re-sync" logic in such cases in tracer? I feel there is already code in place that does this.