rizin disassembler confused by function which re-uses frame pointer (EBP) [verified for x86-32-bit PE/Windows executable]
Work environment
cc @XVilka from IRC
I'm using https://github.com/rizinorg/cutter/releases/tag/v2.3.4 (AppImage) to work
| Questions | Answers |
|---|---|
| OS/arch/bits (mandatory) | Debian Trixie x86-64 |
| File format of the file you reverse (mandatory) | PE (Windows) |
| Architecture/bits of the file (mandatory) | x86/32 |
rizin -v full output, not truncated (mandatory) |
https://github.com/rizinorg/cutter/releases/tag/v2.3.4 (AppImage) |
Notes:
- I'm a beginner at using Rizin, I don't exclude that I made a mistake. I'm using Cutter because it's more intuitive than the Rizin text interface, and I only have limited vacation time to try and RE anything at all.
- The application in question is a Windows binary from around the year 2000, likely compiled by MSVC. It's meant to extract archive files (called SDF files) for a specific game (Ground Control).
Expected behavior
I expected Rizin/Cutter to resolve references to the stack and non-stack correctly, and not confuse the two, or assign multiple variables for the same address.
Actual behavior
The function that starts at 0x004014a0 re-uses the frame pointer (mov ebp, ecx). We can see by the fact that it uses ecx directly, that this is very likely a MSVC thiscall style function.
Important: I used the edit function option to change the calling convention to cdecl-thiscall-ms.
$ objdump -M intel -d SDFextract.exe --start-address=0x004014a0 --stop-address=0x004016d6 | head -20
SDFextract.exe: file format pei-i386
004014a0 <.text+0x4a0>:
4014a0: 83 ec 18 sub esp,0x18
4014a3: 53 push ebx
4014a4: 55 push ebp
4014a5: 56 push esi
4014a6: 57 push edi
4014a7: 8b e9 mov ebp,ecx ; Here!
4014a9: e8 72 fe ff ff call 0x401320
4014ae: 8b 44 24 2c mov eax,DWORD PTR [esp+0x2c]
4014b2: 33 db xor ebx,ebx
4014b4: 53 push ebx
4014b5: 53 push ebx
4014b6: 6a 03 push 0x3
4014b8: 53 push ebx
...
Rizin is confused by this, and thinks references to EBP are references to the stack. One example from this function:
undefined4::OpenFile(LPCSTR lpFileName, LPVOID content, int32_t arg_ch, int32_t arg_10h, int32_t arg_14h, int32_t arg_15h, int32_t arg_18h);
; ...
; var uint32_t var_20h @ stack - 0x20
; arg LPCSTR lpFileName @ stack + 0x4
; arg LPVOID content @ stack + 0x8
; arg int32_t arg_ch @ stack + 0xc
; arg int32_t arg_10h @ stack + 0x10
; arg int32_t arg_14h @ stack + 0x14
; ...
0x00401616 call dword [CreateFileMappingA] ; 0x411010 ; calls a subroutine, push eip into the stack (esp)
0x0040161c mov esi, eax
0x0040161e cmp esi, ebx
0x00401620 jne 0x401628
0x00401622 mov byte [arg_14h], 1 ; HERE
0x00401626 jmp 0x401657
I was confused a lot by this, until I saw the real objdump:
$ objdump -M intel -d SDFextract.exe --start-address=0x004014a0 --stop-address=0x004016d6
...
401616: ff 15 10 10 41 00 call DWORD PTR ds:0x411010
40161c: 8b f0 mov esi,eax
40161e: 3b f3 cmp esi,ebx
401620: 75 06 jne 0x401628
401622: c6 45 14 01 mov BYTE PTR [ebp+0x14],0x1 ; HERE
401626: eb 2f jmp 0x401657
This is just the latest one I encountered in this function before throwing my hands up and filing an issue. There's more stuff.
Steps to reproduce the behavior
- Binary: SDFextract.exe.zip
- Project: SDFextract.exe.rzdb.zip (this contains my annotations, the function I'm referenced I called
undefined4::OpenFile.
I think I used the normal analysis (aaa) and pointer depth = 5. Then just navigate to the function. I've been reversing this thing for a week now so I don't exactly remember. Perhaps the rzdb contains this information?
It's not just EBP references either, here's a pure ESP example for the same function. Sometimes rizin gets it right, like here.
Dissasembly (good):
0x00401a5b lea ecx, [upperFileName] ; load effective address
0x00401a62 push ecx ; push word, doubleword or quadword onto the stack; LPCSTR lpFileName
0x00401a63 call win32_private_toupper ; win32_private_toupper ; calls a subroutine, push eip into the stack (esp) ; void win32_private_toupper(LPCSTR lpFileName)
0x00401a68 mov cl, byte [upperFileName] ; moves data from src to dst
0x00401a6f add esp, 4 ; adds src and dst, stores result on dst
Raw (objdump):
401a5b: 8d 8c 24 b4 00 00 00 lea ecx,[esp+0xb4] ; &upperFileName[0]
401a62: 51 push ecx
401a63: e8 d7 6d 00 00 call 0x40883f
401a68: 8a 8c 24 b8 00 00 00 mov cl,BYTE PTR [esp+0xb8] ; &upperFileName[0] again (stack isn't restored).
401a6f: 83 c4 04 add esp,0x4
In the case above, Rizin correctly detected that win32_private_toupper has not restored the stack, so it's offset by +0x4 versus the previous reference.
But sometimes it's wrong. Here's an example close to the start of some function. Rizin dissassembly, from the start until the instruction with the "wrong" variable, where I'd expect to see upperfileName but instead see arg_a8h:
undefined4::MoreReadFile(LPCSTR lpFileName, int32_t arg_8h, int32_t arg_a8h);
; var int32_t var_2dch @ stack - 0x2dc
; ...
; var LPCSTR upperFileName @ stack - 0x200
; ...
; var LPVOID var_4h @ stack - 0x4
; arg LPCSTR lpFileName @ stack + 0x4
; arg char **outFileName @ stack + 0x8 // Aside: why is `arg_8h` not updated? It should be the same thing, I renamed/retyped it.
; arg int32_t arg_a8h @ stack + 0xa8
0x00401a20 sub esp, 0x2a4
0x00401a26 push ebx
0x00401a27 push ebp
0x00401a28 push esi
0x00401a29 mov ebx, ecx
0x00401a2b push edi
0x00401a2c mov edi, dword [lpFileName]
0x00401a33 or ecx, 0xffffffff
0x00401a36 xor eax, eax
0x00401a38 repne scasb al, byte es:[edi]
0x00401a3a not ecx ; uVar9 = strlen(argv_4h). This reverse loop is a very strange way to write strlen, but I verified that it matches.
0x00401a3c sub edi, ecx
0x00401a3e lea edx, [upperFileName]
0x00401a45 mov eax, ecx
0x00401a47 mov esi, edi
0x00401a49 mov edi, edx
0x00401a4b mov dword [var_298h], ebx
0x00401a4f shr ecx, 2
0x00401a52 rep movsd dword es:[edi], dword ptr [esi]
0x00401a54 mov ecx, eax
0x00401a56 and ecx, 3
0x00401a59 rep movsb byte es:[edi], byte ptr [esi] ;
0x00401a5b lea ecx, [upperFileName]
0x00401a62 push ecx ; LPCSTR lpFileName
0x00401a63 call win32_private_toupper ; void win32_private_toupper(LPCSTR lpFileName)
0x00401a68 mov cl, byte [upperFileName]
0x00401a6f add esp, 4
0x00401a72 xor edi, edi
0x00401a74 xor eax, eax
0x00401a76 xor edx, edx
0x00401a78 mov dword [var_2a4h], eax
0x00401a7c test cl, cl
0x00401a7e je 0x401aea
0x00401a80 lea ecx, [arg_a8h] ; Rizin thinks this is a new variable (arg_a8h), but in reality it's upperFileName
So I resorted to objdump, and manually annotated all stack-manipulating instruction that I could find with the current position. The last LEA should then be: stack-0x2b4+0xb4 = stack-0x200, which is the position of upperFileName.
; undefined4::MoreReadFile
; calling convention: thiscall-ms-cdecl
; arguments:
; stack+0x4 = (const char *) lpFileName
; stack+0x8 = (char **) outFilename
; ecx = undefined4 (this is a thiscall, the this pointer is stored in ecx)
00401a20 <.text+0xa20>:
401a20: 81 ec a4 02 00 00 sub esp,0x2a4 ; esp = stack-0x2a4
401a26: 53 push ebx ; esp = stack-0x2a8
401a27: 55 push ebp ; esp = stack-0x2ac
401a28: 56 push esi ; esp = stack-0x2b0
401a29: 8b d9 mov ebx,ecx ; Reuse ebx...
401a2b: 57 push edi ; esp = stack-0x2b4
401a2c: 8b bc 24 b8 02 00 00 mov edi,DWORD PTR [esp+0x2b8] ; lpFileName
401a33: 83 c9 ff or ecx,0xffffffff
401a36: 33 c0 xor eax,eax
401a38: f2 ae repnz scas al,BYTE PTR es:[edi]
401a3a: f7 d1 not ecx
401a3c: 2b f9 sub edi,ecx
401a3e: 8d 94 24 b4 00 00 00 lea edx,[esp+0xb4] ; esp+0x0b4 = stack-0x200, &upperFileName[0]
401a45: 8b c1 mov eax,ecx
401a47: 8b f7 mov esi,edi
401a49: 8b fa mov edi,edx
401a4b: 89 5c 24 1c mov DWORD PTR [esp+0x1c],ebx ; esp+0x1c = stack-0x298; this
401a4f: c1 e9 02 shr ecx,0x2
401a52: f3 a5 rep movs DWORD PTR es:[edi],DWORD PTR ds:[esi]
401a54: 8b c8 mov ecx,eax
401a56: 83 e1 03 and ecx,0x3
401a59: f3 a4 rep movs BYTE PTR es:[edi],BYTE PTR ds:[esi]
401a5b: 8d 8c 24 b4 00 00 00 lea ecx,[esp+0xb4] ; esp = stack-0x2b4 = stack-0x200, &upperFileName[0]
401a62: 51 push ecx ; esp = stack-0x2b8
401a63: e8 d7 6d 00 00 call 0x40883f ; win32_private_toupper
401a68: 8a 8c 24 b8 00 00 00 mov cl,BYTE PTR [esp+0xb8] ; stack-0x2b8+0xb8 = stack-0x200, upperFileName[0]
401a6f: 83 c4 04 add esp,0x4 ; esp = stack-0x2b4
401a72: 33 ff xor edi,edi
401a74: 33 c0 xor eax,eax
401a76: 33 d2 xor edx,edx
401a78: 89 44 24 10 mov DWORD PTR [esp+0x10],eax ; stack-0x2b4+0x10 = stack-0x2a4 = 0x0
401a7c: 84 c9 test cl,cl
401a7e: 74 6a je 0x401aea
401a80: 8d 8c 24 b4 00 00 00 lea ecx,[esp+0xb4] ; stack-0x2b4+0xb4 = stack-0x200 = &upperFileName[0]
Given the above, it seems like both ESP and EBP based references can be wrong. Should I pull out the latter case into a separate bug? Something tells me the problems are related.
Both of these cases reproduce at current HEAD (266fe6b) too.
Is there any way I can debug the way Rizin calculates the variable? Why does it believe at 0x401a80 that esp+0xb4 == stack + 0xa8, despite it knowing (correctly) at 0x401a68 that esp+0xb8 == stack + 0x200?
Is there any sort of analysis I can force Rizin to do, to think harder, or try again? Changing/correcting the calling convention of the function does not appear to make a difference.
I have an abstract RzIL interpreter in a private repo which tracks those references and resolves them correctly. This could fix the problem by simply implementing a better method to track references.
But I have to deal with other stuff currently. So couldn't find the time to implement the plugin interface for the interpreter and integrate it into the function analysis loop. If someone wants to work on it ping me please.
Is it in a state where it can be tried out?
I don't know much about rizin development, but found it really easy to build (kudos on that, it's even easier than Neovim) so I'm willing to experiment a little.
Is there a reason the repo can't be opened for some experimentation? At least I could check whether the problem can be reproduced with it.
Separately: what's going wrong here seems so basic in nature that I'm surprised the issue exists at all. Is it perhaps specific to x86-32 PE binaries without frame pointers?
Is it in a state where it can be tried out?
Not plug and play like unfortunately. It tracks the memory references in an abstract state. BUt not more yet
Meaning:
- It executes a given path (vector of addresses).
- Tracks the value types (stack/heap/global) during execution.
- The base of the each value: For Heap, it is the address it was allocated, for stack it is the address of the procedure, for global it is a constant
- And, for each instruction which uses such an abstract value, the offset from base.
So the stack values of procedure A, are tracked as 〈𝑺 ⌊<procedure_addr_A>⌋, <offset>〉. And so forth. But those values are not yet transferred to Rizin or anything. And not added as references.
Also replacing operands in the asm-text (with arg_a8h or similar is not handled yet).
So it would be a task for someone interested in contributing to Rizin. And get rid of the general problem (see below). I think it might be 5-14 days of work maybe (highly dependent on previous experience).
The interpreter is written in Rust so Rust knowledge is useful (but not necessary if one knows other languages).
The interpreter is based on the one described in this paper (chapter 5). But quite a lot modified for RzIL.
Is there a reason the repo can't be opened for some experimentation?
Just didn't opened it yet because the whole algorithm I use it for is not done yet. But in case anyone wants to take a look, I am happy to add them. Let me know if you want to take a look.
Separately: what's going wrong here seems so basic in nature that I'm surprised the issue exists at all.
The fundamental problem is the old IL Rizin still uses for these tasks (ESIL) and the logic built on top of it. I had to fix similar issues as yours in the past and was not happy with the pretty old code I saw. The whole problem essentially needs to be implemented again. In a well defined manner and using our new IL RzIL. Which is more exhaustive and well defined compared to ESIL. But we haven't started yet to implement analysis algos on top of RzIL yet. Because other tasks have to be done before (not many left though).
The alternative to all this is to do a hotfix. Debugging this specific case, figure where the offset is incorrectly calculated and fix it.
But we currently push more for finishing RzIL and starting as quickly as possible with implementing new algorithms. Because this code will be removed anyways.
Thanks for the context. That's helpful.
And get rid of the general problem (see below). I think it might be 5-14 days of work maybe (highly dependent on previous experience).
This is unfortunately too much for me.
I assume the hotfix approach (I understand it will be thrown away) is less work. Further I assume the ESIL approach that exists now is shared with radare2. Perhaps it's been fixed there already and I can see about porting it, if that's desired (I'll check if the bug is present there).
In general, I'm looking forward to the new RzIL analysis. I generally like the improvements Rizin has made, especially w.r.t. projects. I used to lose my progress all the time. No longer.
Just didn't opened it yet because the whole algorithm I use it for is not done yet. But in case anyone wants to take a look, I am happy to add them.
I would love to take a look, if possible. I'm currently seeking an alternative to Ghidra, as it is extremely slow. However, I don't think the current approach of r2/rizin, which bases its analysis directly on native asm, is the best solution either. Probably RZIL is the solution?
@chf0x
Can you join our Mattermost and ping me there please? I send you an invite and give you an update what is done so far.
The basic analysis will be refactored as well (after https://github.com/rizinorg/rizin/issues/4738 is done). The relevant tracking issues are https://github.com/rizinorg/rizin/issues/2080 and https://github.com/rizinorg/rizin/issues/4736.
Lots of changes in Rizin since I reported this: #4738, the x86 parts of #2080 seem done, #4736 seems early. I thought I'd try again, by compiling rizin from HEAD and running.
The problem still reproduces:
$ ./rizin -A ./SDFextract.exe
> s fcn.00401a20
┌ fcn.00401a20(int32_t arg_4h, int32_t arg_8h, int32_t arg_a8h);
│ ; var int32_t var_2dch @ stack - 0x2dc
...
│ ; var HANDLE *hFile @ stack - 0x2c0
...
│ ; var int32_t var_200h @ stack - 0x200
...
│ ; arg int32_t arg_a8h @ stack + 0xa8
[0x00401a20]> pd 34
; CALL XREF from fcn.00401000 @ 0x401076
┌ fcn.00401a20(int32_t arg_4h, int32_t arg_8h, int32_t arg_a8h);
│ stack: 28 (vars 25, args 3)
│ rg: 0 (vars 0, args 0)
│ 0x00401a20 sub esp, 0x2a4
│ 0x00401a26 push ebx
│ 0x00401a27 push ebp
│ 0x00401a28 push esi
...
│ 0x00401a74 xor eax, eax
│ 0x00401a76 xor edx, edx
│ 0x00401a78 mov dword [var_2a4h], eax
│ 0x00401a7c test cl, cl
│ ┌─< 0x00401a7e je 0x401aea
│ │ 0x00401a80 lea ecx, [arg_a8h] ; <--- should be var_200h
We will do the analysis for the 0.9 release. This issue will definitely be added to the tests. We will close it with the corresponding PR when it is done. I assigned it to myself so I don't miss it.
Sounds great, thanks a lot. I'm excited to start (learning how to) RE again when 0.9 comes and will wait patiently until then.
I know the issue itself isn't resolved, but I recently recompiled rizin from head and looked around in a different part of this binary. Then I noticed that something appears to have regressed when it comes to deriving common/known types:
$ ./rizin/rizin -A ~/gc/SDFextract.exe
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze function calls
[x] Analyze len bytes of instructions for references
[x] Analyze local variables and arguments
[x] Type matching analysis for all functions
[x] Applied 0 FLIRT signatures via sigdb
[x] Propagate noreturn information
[x] Check for classes
[x] Integrate dwarf function information.
[x] Resolve pointers to data sections
[x] Use -AA or aaaa to perform additional experimental analysis.
[0x00408706]> s fcn.00401a20
[0x00401a20]> pd
; CALL XREF from fcn.00401000 @ 0x401076
/ fcn.00401a20(unknown_t arg_4h, unknown_t arg_8h, unknown_t arg_a8h);
| ; var unknown_t var_2dch @ stack - 0x2dc
| ; var unknown_t var_2d5h @ stack - 0x2d5
| ; var unknown_t var_2d4h @ stack - 0x2d4
...
| ; var unknown_t var_4h @ stack - 0x4
| ; arg unknown_t arg_4h @ stack + 0x4
| ; arg unknown_t arg_8h @ stack + 0x8
| ; arg unknown_t arg_a8h @ stack + 0xa8
| 0x00401a20 sub esp, 0x2a4
| 0x00401a26 push ebx
| 0x00401a27 push ebp
| 0x00401a28 push esi
| 0x00401a29 mov ebx, ecx
| 0x00401a2b push edi
| 0x00401a2c mov edi, dword [arg_4h]
| 0x00401a33 or ecx, 0xffffffff ; 255
| 0x00401a36 xor eax, eax
| 0x00401a38 repne scasb
Previously, rizin would add deduced types like int32_t or HANDLE* to the stack variables. Now everything appears to be unknown_t.
I thought that maybe some analysis that previously happened at A is now happening only at higher levels. But printing the same after aaaa doesn't appear to change anything.
I looked around and found two possibly relevant issues: #3728 and #4618. But both of those predate the last known "working" state. Should I file a separate issue about this, or is it PEBCAK?
@aktau please open a separate issue. Also, if you are willing to do some testing, using git bisect to find the problematic commit or at least range of them would be extremely helpful.