[bug] [question] instruction enhancement caching weirdness
@OBarronCS
I'm confused about this: https://github.com/pwndbg/pwndbg/blob/e83dee14c24f5f4a7bd66d2e2d214f04debb4be8/pwndbg/aglib/disasm/disassembly.py#L55-L57
Does it makes sense? What if someone is in a loop and puts a conditional breakpoint a few instructions forward to let the loop iterate a few times wouldn't the annotation be incorrect?
However, I tried to create a reproducer ~~and didn't succeed :(~~ See comment.
source:
int main (){
int awa = 0;
awa += 20;
awa += 20;
awa += 20;
awa += 20;
awa += 20;
awa += 20;
awa += 20;
awa += 20;
awa += 20;
awa += 20;
for (int i = 0; i < 100; ++i) {
int x = i;
int y = x + 4;
y += 0x44;
x += 0x1337;
}
return 0;
}
Compiled with gcc -g main.c -o main
Expected:
pwndbg> start
pwndbg> b *0x0000555555555158 if i == 50
pwndbg> continue
► 0x555555555158 <main+63> mov dword ptr [rbp - 8], eax [0x7fffffffe188] <= 0x32
0x55555555515b <main+66> mov eax, dword ptr [rbp - 8] EAX, [0x7fffffffe188] => 0x32
0x55555555515e <main+69> add eax, 4 EAX => 54 (0x32 + 0x4)
0x555555555161 <main+72> mov dword ptr [rbp - 4], eax [0x7fffffffe18c] <= 0x36
0x555555555164 <main+75> add dword ptr [rbp - 4], 0x44 [0x7fffffffe18c] <= 122 (0x36 + 0x44)
0x555555555168 <main+79> add dword ptr [rbp - 8], 0x1337 [0x7fffffffe188] <= 0x1369 (0x32 + 0x1337)
0x55555555516f <main+86> add dword ptr [rbp - 0x10], 1 [0x7fffffffe180] <= 51 (0x32 + 0x1)
With cond bp:
pwndbg> start
pwndbg> b *0x0000555555555155
pwndbg> continue
► 0x555555555155 <main+60> mov eax, dword ptr [rbp - 0x10] EAX, [0x7fffffffe180] => 0
0x555555555158 <main+63> mov dword ptr [rbp - 8], eax [0x7fffffffe188] <= 0
0x55555555515b <main+66> mov eax, dword ptr [rbp - 8] EAX, [0x7fffffffe188] => 0
0x55555555515e <main+69> add eax, 4 EAX => 4 (0 + 4)
0x555555555161 <main+72> mov dword ptr [rbp - 4], eax [0x7fffffffe18c] <= 4
0x555555555164 <main+75> add dword ptr [rbp - 4], 0x44 [0x7fffffffe18c] <= 72 (0x4 + 0x44)
0x555555555168 <main+79> add dword ptr [rbp - 8], 0x1337 [0x7fffffffe188] <= 0x1337 (0x0 + 0x1337)
0x55555555516f <main+86> add dword ptr [rbp - 0x10], 1 [0x7fffffffe180] <= 1 (0 + 1)
pwndbg> delete 2
pwndbg> b *0x0000555555555158 if i == 50
pwndbg> continue
0x555555555155 <main+60> mov eax, dword ptr [rbp - 0x10] EAX, [0x7fffffffe180] => 1
► 0x555555555158 <main+63> mov dword ptr [rbp - 8], eax [0x7fffffffe188] <= 0x32
0x55555555515b <main+66> mov eax, dword ptr [rbp - 8] EAX, [0x7fffffffe188] => 0x32
0x55555555515e <main+69> add eax, 4 EAX => 54 (0x32 + 0x4)
0x555555555161 <main+72> mov dword ptr [rbp - 4], eax [0x7fffffffe18c] <= 0x36
0x555555555164 <main+75> add dword ptr [rbp - 4], 0x44 [0x7fffffffe18c] <= 122 (0x36 + 0x44)
0x555555555168 <main+79> add dword ptr [rbp - 8], 0x1337 [0x7fffffffe188] <= 0x1369 (0x32 + 0x1337)
It is correct, but I expected it to be wrong, for instance I expected
0x555555555164 <main+75> add dword ptr [rbp - 4], 0x44
to be wrongly annotated with [0x7fffffffe18c] <= 72 (0x4 + 0x44) because it is cached.
Doing a bit of print debugging I can see the cache is not invalidated, but get_one_instruction() is called twice on 0x555555555164 the first time it has from_cache=True and fetches the stale instruction, the second time it has from_cache=False and I guess it overwrites it with the correct values.
Why is the same address being fetched twice? Is the cache working correctly?
I might be misunderstanding something, but I was reading https://pwndbg.re/pwndbg/dev/contributing/improving-annotations/#caching-annotations and got a bit confused .
Oh the instruction cache is only for instructions behind the PC?
Yeah it seems to be bugged:
pwndbg> start
pwndbg> b *0x0000555555555155
pwndbg> continue
► 0x555555555155 <main+60> mov eax, dword ptr [rbp - 0x10] EAX, [0x7fffffffe180] => 0
0x555555555158 <main+63> mov dword ptr [rbp - 8], eax [0x7fffffffe188] <= 0
0x55555555515b <main+66> mov eax, dword ptr [rbp - 8] EAX, [0x7fffffffe188] => 0
0x55555555515e <main+69> add eax, 4 EAX => 4 (0 + 4)
0x555555555161 <main+72> mov dword ptr [rbp - 4], eax [0x7fffffffe18c] <= 4
0x555555555164 <main+75> add dword ptr [rbp - 4], 0x44 [0x7fffffffe18c] <= 72 (0x4 + 0x44)
0x555555555168 <main+79> add dword ptr [rbp - 8], 0x1337 [0x7fffffffe188] <= 0x1337 (0x0 + 0x1337)
0x55555555516f <main+86> add dword ptr [rbp - 0x10], 1 [0x7fffffffe180] <= 1 (0 + 1)
pwndbg> delete 2
pwndbg> b *0x55555555516f if i == 50
pwndbg> continue
0x55555555515b <main+66> mov eax, dword ptr [rbp - 8] EAX, [0x7fffffffe148] => 0
0x55555555515e <main+69> add eax, 4 EAX => 4 (0 + 4)
0x555555555161 <main+72> mov dword ptr [rbp - 4], eax [0x7fffffffe14c] <= 4
0x555555555164 <main+75> add dword ptr [rbp - 4], 0x44 [0x7fffffffe14c] <= 72 (0x4 + 0x44)
0x555555555168 <main+79> add dword ptr [rbp - 8], 0x1337 [0x7fffffffe148] <= 0x1337 (0x0 + 0x1337)
► 0x55555555516f <main+86> add dword ptr [rbp - 0x10], 1 [0x7fffffffe140] <= 51 (0x32 + 0x1)
0x555555555173 <main+90> cmp dword ptr [rbp - 0x10], 0x63 0x33 - 0x63 EFLAGS => 0x283 [ CF pf af zf SF IF df of ac ]
0x555555555177 <main+94> ✔ jle main+60 <main+60>
↓
0x555555555155 <main+60> mov eax, dword ptr [rbp - 0x10] EAX, [0x7fffffffe140] => 0x33
0x555555555158 <main+63> mov dword ptr [rbp - 8], eax [0x7fffffffe148] <= 0x33
0x55555555515b <main+66> mov eax, dword ptr [rbp - 8] EAX, [0x7fffffffe148] => 0x33
We can see the
0x555555555164 <main+75> add dword ptr [rbp - 4], 0x44 [0x7fffffffe14c] <= 72 (0x4 + 0x44)
line is stale.
Does it makes sense? What if someone is in a loop and puts a conditional breakpoint a few instructions forward to let the loop iterate a few times wouldn't the annotation be incorrect?
Good catch, this behavior only works for unconditional breakpoints. If the breakpoint you are going to is within the few of the context screen, and you continue to it, the instruction in-between will indeed use the "cached" annotations. However, since we used emulation to create these in the first place, and we only step over them once to the breakpoint (they were the only instructions that were executed), then they still should be up to date, since no other mutations should have taken effect to make them stale.
Conditional breakpoints do unfortunately break this, since we don't guarantee we only executed those instructions once. I can look into a way to handle this behavior.
Oh the instruction cache is only for instructions behind the PC?
This is in order to make sure we handle self-modifying code, however given the edge case behavior found in #2979, I've been rethinking this, seeing if there is a better approach. EDIT: on second thought, self-modifying code could equally mutate previous instructions that are in the cache...