binee icon indicating copy to clipboard operation
binee copied to clipboard

incorrect execution of GetLastError, wrong set of assembly instructions executed

Open mewmew opened this issue 4 years ago • 11 comments

I was running binee last night, and on one of my samples the invocation of GetLastError would result in incorrect execution (eventually resulting in an invalid read).

I used IDA to verify the implementation of GetLastError, and for some reason, binee is executing another set of assembly instructions, rather than the actual assembly instructions of the GetLastError function (as present in kernel32.dll).

To help troubleshoot, I've created a minimal test case (b.exe). Find source code, build instructions and binary attached below.

extern _SetLastError@4
extern _GetLastError@0
extern _ExitProcess@4

global _main

[section .text]

_main:
	push 42
	call _SetLastError@4
	call _GetLastError@0
	push eax
	call _ExitProcess@4
# assemble b.asm into 32-bit PE object file.
$ nasm -f win32 -o b.obj b.asm

# link object file with kernel32.lib to create executable
$ wine ~/VS6/VC98/Bin/LINK.EXE /OUT:b.exe /ENTRY:main /subsystem:console /machine:i386 /LIBPATH:${HOME}/VS6/VC98/LIB /nologo b.obj kernel32.lib

# run binary to check result of GetLastError
$ wine b.exe ; echo $?
42

Results of running binee on b.exe:

$ ~/Desktop/binee/binee -v -d -c win.yaml b.exe
[1] 0x00401000: push 0x2a
[1] 0x00401002: call 0x10
[1] 0x00401012: jmp dword ptr [0x402008]
[1] 0x2025e2d3: F kernel32.dll:SetLastError(dwErrCode = 0x2a) = 0xb0010000
[1] 0x00401007: call 0x11
[1] 0x00401018: jmp dword ptr [0x402000]
[1] 0x2025e0bf: P kernel32.dll:GetLastError() = 0xb0010000
[1] 0x2025e0bf: dec esi
[1] 0x2025e0c0: push esp
[1] 0x2025e0c1: inc esp
[1] 0x2025e0c2: dec esp
[1] 0x2025e0c3: dec esp
[1] 0x2025e0c4: push edx
[1] 0x2025e0c6: je 0x6e
[1] 0x2025e0c8: inc edi
[1] 0x2025e0c9: je 0x4f
[1] 0x2025e0cc: popal 
[1] 0x2025e0cd: jae 0x76
[1] 0x2025e143: je 0x4a
[1] 0x2025e145: popal 
[1] 0x2025e147: add byte ptr fs:[esi + 0x54], cl
[1] 0x2025e14b: inc esp
[1] 0x2025e14c: dec esp
[1] 0x2025e14d: dec esp
[1] 0x2025e14e: push edx
[1] 0x2025e150: je 0x6e
[1] 0x2025e152: dec ecx
[1] 0x2025e153: outsb dx, byte ptr [esi]
[1] 0x2025e154: je 0x67
[1] 0x2025e156: jb 0x6e
[1] 0x2025e158: outsd dx, dword ptr [esi]
[1] 0x2025e159: arpl word ptr [ebx + 0x65], bp
[1] 0x2025e15c: inc esi
[1] 0x2025e15e: insb byte ptr es:[edi], dx
[1] 0x2025e15f: jne 0x75
[1] 0x2025e1d4: push ecx
[1] 0x2025e1d5: jne 0x67
[1] 0x2025e23c: outsd dx, dword ptr [esi]
[1] 0x2025e23d: outsb dx, byte ptr [esi]
[1] 0x2025e23e: je 0x67
[1] 0x2025e240: js 0x76
[1] 0x2025e242: add byte ptr [esi + 0x54], cl
[1] 0x2025e245: inc esp
[1] 0x2025e246: dec esp
[1] 0x2025e247: dec esp
[1] 0x2025e248: push edx
[1] 0x2025e24a: je 0x6e
[1] 0x2025e24c: inc ebx
[1] 0x2025e24d: popal 
[1] 0x2025e24e: jo 0x76
[1] 0x2025e250: jne 0x74
[1] 0x2025e2c4: arpl word ptr [ecx + ebp*2 + 0x6f], si
[1] 0x2025e2c8: outsb dx, byte ptr [esi]
Invalid Read: address = 0xffffff00, size = 0x1, value = 0x0

Contents of win.yaml:

$ cat win.yaml
root: "/home/u/_share_/xp/

b.exe attachment: b.tar.gz

Reference disassembly of GetLastError from IDA:

.text:7C830759 ; DWORD __stdcall GetLastError()
.text:7C830759 _GetLastError@0 proc near               ; CODE XREF: GetComputerNameExW(x,x,x):loc_7C820174↑p
.text:7C830759                                         ; GetComputerNameExW(x,x,x):loc_7C82018B↑p ...
.text:7C830759                 mov     eax, large fs:18h
.text:7C83075F                 mov     eax, [eax+34h]
.text:7C830762                 retn
.text:7C830762 _GetLastError@0 endp

Edit: note that the return value of GetLastError reported by binee is incorrect ([1] 0x2025e0bf: P kernel32.dll:GetLastError() = 0xb0010000), it should be 0x2a not 0xb0010000). Furthermore, not that the first assembly instruction of GetLastError is reported by binee as [1] 0x2025e0bf: dec esi, but should be .text:7C830759 mov eax, large fs:18h.

mewmew avatar Nov 22 '19 14:11 mewmew

I've implemented a full hook for GetLastError in commit https://github.com/mewpull/binee/commit/5e00ab97e4e5e7f361bbe2ec2c7d017fe29370e0.

However, before submitting a PR for this, we should try to assess what the deeper root cause of this issue is, as the incorrect set of assembly instructions are indeed execution when invoking GetLastError (where do these assembly instructions even come from? I did a search for popal in kernel32.dll but could not find any such instruction). This is a serious problem, as it would lower the confidence that the results we arrive at through analysis are correct.

mewmew avatar Nov 22 '19 15:11 mewmew

I agree that a full hook is not necessary for this function in the long term and the root cause should be found.

Actually, I have a sample that depends on a full hook; where the sample sets an error and then later expects it to be set. Otherwise, it exists early without any of the interesting code being executed. Since SetLastError is already implemented, providing the mirror code of GetLastError makes sense.

However, we do need to figure out the root cause of this issue, as it's unnerving with inaccurate emulation (or disassembly).

I can hook up another disassembler to verify if this is a Capstone issue. (I don't think it is, but definitely worth to verify.)

mewmew avatar Nov 23 '19 16:11 mewmew

I don't think it is capstone because return is incorrect, it was just the first thing that came to mind. Although, the gapstone lib needs to be upgraded to support capstone4 which I just submitted a PR for. I can't test it against my local data set until Monday.

kgwinnup avatar Nov 23 '19 16:11 kgwinnup

Ok, I just verified, this is not an issue with Capstone.

Debug output of binee -v b.exe using Capstone:

u@x220 ~/D/foo> ~/Desktop/binee/binee -v -d -c win.yaml b.exe
[1] 0x00401000: push 0x2a
[1] 0x00401002: call 0x10
[1] 0x00401012: jmp dword ptr [0x402008]
[1] 0x2025e2d3: F kernel32.dll:SetLastError(dwErrCode = 0x2a) = 0xb0010000
[1] 0x00401007: call 0x11
[1] 0x00401018: jmp dword ptr [0x402000]
[1] 0x2025e0bf: P kernel32.dll:GetLastError() = 0xb0010000
[1] 0x2025e0c0: push esp
[1] 0x2025e0c1: inc esp
[1] 0x2025e0c2: dec esp
[1] 0x2025e0c3: dec esp
[1] 0x2025e0c4: push edx
[1] 0x2025e0c6: je 0x6e
[1] 0x2025e0c8: inc edi
[1] 0x2025e0c9: je 0x4f
[1] 0x2025e0cc: popal 
[1] 0x2025e0cd: jae 0x76
[1] 0x2025e143: je 0x4a
[1] 0x2025e145: popal 
[1] 0x2025e147: add byte ptr fs:[esi + 0x54], cl
[1] 0x2025e14b: inc esp
[1] 0x2025e14c: dec esp
[1] 0x2025e14d: dec esp
[1] 0x2025e14e: push edx
[1] 0x2025e150: je 0x6e
[1] 0x2025e152: dec ecx
[1] 0x2025e153: outsb dx, byte ptr [esi]
Invalid Read: address = 0x0, size = 0x1, value = 0x0

Debug output of binee -v b.exe using the native Go disassembler of golang.org/x/arch/x86/x86asm:

u@x220 ~/D/foo> ~/Desktop/binee/binee -v -d -c win.yaml b.exe
[1] 0x00401000: PUSH 0x2a
[1] 0x00401002: CALL .+11
[1] 0x00401012: JMP [+0x402008]
[1] 0x2025e2d3: F kernel32.dll:SetLastError(dwErrCode = 0x2a) = 0xb0010000
[1] 0x00401007: CALL .+12
[1] 0x00401018: JMP [+0x402000]
[1] 0x2025e0bf: P kernel32.dll:GetLastError() = 0xb0010000
[1] 0x2025e0c0: PUSH ESP
[1] 0x2025e0c1: INC ESP
[1] 0x2025e0c2: DEC ESP
[1] 0x2025e0c3: DEC ESP
[1] 0x2025e0c4: CS PUSH EDX
[1] 0x2025e0c6: JE .+108
[1] 0x2025e0c8: INC EDI
[1] 0x2025e0c9: GS JE .+76
[1] 0x2025e0cc: POPAD
[1] 0x2025e0cd: JAE .+116
[1] 0x2025e143: JE .+72
[1] 0x2025e145: GS POPAD
[1] 0x2025e147: ADD [ESI+0x54], CL
[1] 0x2025e14b: INC ESP
[1] 0x2025e14c: DEC ESP
[1] 0x2025e14d: DEC ESP
[1] 0x2025e14e: CS PUSH EDX
[1] 0x2025e150: JE .+108
[1] 0x2025e152: DEC ECX
[1] 0x2025e153: OUTSB DX, [ESI]
Invalid Read: address = 0x0, size = 0x1, value = 0x0

The following patch was applied to rev d33dd0fad70eaad50fe839107386f07a78fa76ed to use the native Go disassembler for x86 instead of Capstone.

diff --git a/windows/hooks.go b/windows/hooks.go
index 3881552..862a0ac 100644
--- a/windows/hooks.go
+++ b/windows/hooks.go
@@ -11,6 +11,7 @@ import (
        "github.com/carbonblack/binee/util"
 
        uc "github.com/unicorn-engine/unicorn/bindings/go/unicorn"
+       "golang.org/x/arch/x86/x86asm"
 )
 
 func (emu *WinEmulator) LoadHooks() {
@@ -267,8 +268,9 @@ func (self *Instruction) Address() string {
 
 func (self *Instruction) Disassemble() string {
        buf, _ := self.emu.Uc.MemRead(self.Addr, uint64(self.Size))
-       if inst, err := self.emu.Cs.Disasm(buf, 0, uint64(self.Size)); err == nil {
-               return fmt.Sprintf("%s %s", inst[0].Mnemonic, inst[0].OpStr)
+       mode := int(8*self.emu.PtrSize)
+       if inst, err := x86asm.Decode(buf, mode); err == nil {
+               return inst.String()
        }
        return ""
 }

mewmew avatar Nov 23 '19 16:11 mewmew

I don't think it is capstone because return is incorrect, it was just the first thing that came to mind. Although, the gapstone lib needs to be upgraded to support capstone4 which I just submitted a PR for. I can't test it against my local data set until Monday.

It may be worth considering to use the native Go disassembler for x86, instead of Gapstone, or at least to evaluate pros/cons. This is the disassembler used by the Go compiler and toolchain.

https://godoc.org/golang.org/x/arch/x86/x86asm

For a patch to use x/arch/x86 instead of Gapstone, see the diff in the above comment.

mewmew avatar Nov 23 '19 16:11 mewmew

@kgwinnup, one quick question. How does binee translate between addresses of dynamically linked libraries and the address used at runtime (e.g. 0x2025e0bf in the example below)? Essentially, how is the address of where to load DLLs determined by binee and where is this mapping kept? I would like to translate the address back into the "original" address of the DLL, so I can see where this code is located in IDA.

[1] 0x00401018: jmp dword ptr [0x402000]
[1] 0x2025e0bf: P kernel32.dll:GetLastError() = 0xb0010000
[1] 0x2025e0c0: push esp
[1] 0x2025e0c1: inc esp

mewmew avatar Nov 23 '19 16:11 mewmew

It may be worth considering to use the native Go disassembler for x86, instead of Gapstone, or at least to evaluate pros/cons. This is the disassembler used by the Go compiler and toolchain.

I had no idea this even existed. I agree, this is definitely worth using over gapstone. I would like to keep dependencies to a minimum.

kgwinnup avatar Nov 23 '19 16:11 kgwinnup

one quick question. How does binee translate between addresses of dynamically linked libraries and the address used at runtime

I am curious of your thoughts on this.

This all starts at the location below, and i'll try and give a short explanation of the process.

https://github.com/carbonblack/binee/blob/master/windows/loader.go#L787

First the PE file's imports are looped over and each DLL is loaded recursively and stored into a map in WinEmulator. There are set of maps that have mappings from Name->Hook, Addr->Hook, and Dll->RealDllName

https://github.com/carbonblack/binee/blob/master/windows/winemulator.go#L82

At this time all the imports names are resolved into a map (see link above) which is later used for Hook lookups. Before each DLL is loaded into unicorn memory, the image address needs to be updated. Before this starts, there is a "counter" called NextLibAddress (located as a property of WinEmulator) that always starts at a static location based on whether the PE is 32 or 64 bit. This is in the WinEmulator.Load (previously New) function.

After each DLL is loaded from host disk, and before mapped into memory, the image address needs to be updated (this also updates other properties of the PE file). This is always the NextLibAddress and after mapping into unicorn, the NextLibAddress is incremented to point to the next available location for a DLL to be mapped in.

https://github.com/carbonblack/binee/blob/master/windows/loader.go#L227 https://github.com/carbonblack/binee/blob/master/windows/winemulator.go#L87

kgwinnup avatar Nov 23 '19 17:11 kgwinnup

I would like to translate the address back into the "original" address of the DLL, so I can see where this code is located in IDA.

I don't know if this is going to be possible without implementing something specific to mapping a function->original address.

My workflow for this, while not the best, is to lookup the function manually in whatever RE tool (I use radare2) and then jumping to the real implementation.

I am wondering if it would be useful to have a map as your looking for, and displaying that address instead of the temp emulator address. The temp address 0x2XXXXXXXX is not particular useful in anyway right now. Maybe later if debugger support is added.

Possibly a good place to capture the original function and DLL base address is:

https://github.com/carbonblack/binee/blob/master/windows/loader.go#L227

kgwinnup avatar Nov 23 '19 17:11 kgwinnup

I don't know if this is going to be possible without implementing something specific to mapping a function->original address.

I see. Well, it should be relatively easy for users to simply binary search for the address to determine the original library, and then check the delta against the base address and thus compute the "original" address using this information.

My workflow for this, while not the best, is to lookup the function manually in whatever RE tool (I use radare2) and then jumping to the real implementation.

That's my workflow as well. And it works great, except for the case of GetLastError, as the code emulated by binee for this function is not the one listed in IDA.

I am wondering if it would be useful to have a map as your looking for, and displaying that address instead of the temp emulator address. The temp address 0x2XXXXXXXX is not particular useful in anyway right now. Maybe later if debugger support is added.

Displaying the original address would indeed be very useful! Then you can just check what DLL the code is running in and jump directly to the address of any given instruction in say IDA/radare2.

mewmew avatar Nov 23 '19 21:11 mewmew

Oh, and thanks for the write-up. I appreciate it.

mewmew avatar Nov 23 '19 21:11 mewmew