winafl
Cannot kill child process
While fuzzing on a 24-core machine, the afl-fuzz process crashes every couple of hours with the following message:
[-] PROGRAM ABORT : Cannot kill child process
Location : destroy_target_process(), C:\work\fuzzing\winafl\afl-fuzz.c:2385
I have an open WinDbg window showing the following crash:
Microsoft (R) Windows Debugger Version 10.0.10586.567 X86
Copyright (c) Microsoft Corporation. All rights reserved.
*** wait with pending attach
************* Symbol Path validation summary **************
Response Time (ms) Location
Deferred symsrv*symsrv.dll*C:\WINDOWS\Symbols*http://msdl.microsoft.com/download/symbols
Symbol search path is: symsrv*symsrv.dll*C:\WINDOWS\Symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
ModLoad: 013c0000 013c7000 R:\crash_test.exe
ModLoad: 770c0000 7724d000 C:\WINDOWS\SYSTEM32\ntdll.dll
ModLoad: 76cd0000 76da0000 C:\WINDOWS\System32\KERNEL32.DLL
ModLoad: 758f0000 75ac7000 C:\WINDOWS\System32\KERNELBASE.dll
ModLoad: 75b80000 75c97000 C:\WINDOWS\System32\ucrtbase.dll
ModLoad: 73ab0000 73ac5000 C:\WINDOWS\SYSTEM32\VCRUNTIME140.dll
Break-in sent, waiting 30 seconds...
WARNING: Break-in timed out, suspending.
This is usually caused by another thread holding the loader lock
(1f7c.17c8): Wake debugger - code 80000007 (first chance)
eax=00000000 ebx=00c90000 ecx=00000000 edx=00000000 esi=0055f534 edi=00c90000
eip=73808c66 esp=0055f4d4 ebp=00000002 iopl=0 nv up ei pl nz na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000206
73808c66 8b0c24 mov ecx,dword ptr [esp] ss:002b:0055f4d4=73821c69
0:000> kb
# ChildEBP RetAddr Args to Child
WARNING: Frame IP not in any known module. Following frames may be wrong.
00 0055f4d0 73821c69 ffffffff 00c97000 00000000 0x73808c66
01 0055f4f0 73822db8 ffffffff 00c97000 0055f534 0x73821c69
02 0055f50c 7381133d 00c97000 0055f534 0000001c 0x73822db8
03 0055f54c 7380bf25 00c9b000 0055f568 00000001 0x7381133d
04 00000000 00000000 00000000 00000000 00000000 0x7380bf25
0:000> u eip
73808c66 8b0c24 mov ecx,dword ptr [esp]
73808c69 894c24fc mov dword ptr [esp-4],ecx
73808c6d 8d6424fc lea esp,[esp-4]
73808c71 c3 ret
73808c72 8da424e8feffff lea esp,[esp-118h]
73808c79 6a00 push 0
73808c7b 9c pushfd
73808c7c 60 pushad
From what I can tell, the debugger is having a hard time attaching ("waiting 30 seconds..."), which suggests that a thread in the process is holding the loader lock, so the debugger cannot inject its break-in thread into the process. I'm not sure why this is happening.
I'm using DynamoRIO 7.0.17595-0 to fuzz a 32-bit process on Windows 10 1709 (16299).
I've seen this happen before but not with such frequency (it was a matter of days and not hours for me). Possibly it depends on the target, but I don't really know the cause.
Can you see if it's any better with DynamoRIO 6.2.0-2?
TL;DR: upgrading to cronbuild-7.0.17605 seems to solve the issue.
I tested DynamoRIO 6.2.0-2 and the problem reproduced, along with another one that is clearly a bug in the instrumentation:
Copyright (c) Microsoft Corporation. All rights reserved.
*** wait with pending attach
************* Path validation summary **************
Response Time (ms) Location
Deferred srv*C:\Symbols*http://msdl.microsoft.com/download/symbols
Symbol search path is: srv*C:\Symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
ModLoad: 00c10000 00c17000 R:\crash_test.exe
ModLoad: 77030000 771bd000 C:\WINDOWS\SYSTEM32\ntdll.dll
ModLoad: 75b00000 75bd0000 C:\WINDOWS\System32\KERNEL32.DLL
ModLoad: 74470000 74647000 C:\WINDOWS\System32\KERNELBASE.dll
ModLoad: 74890000 749a7000 C:\WINDOWS\System32\ucrtbase.dll
ModLoad: 6f100000 6f115000 C:\WINDOWS\SYSTEM32\VCRUNTIME140.dll
(4fc.850): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=770da650 ebx=00000000 ecx=00000000 edx=00000000 esi=00000000 edi=00000000
eip=6efdd835 esp=0307f704 ebp=00000000 iopl=0 nv up ei pl nz na po nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010202
6efdd835 54 push esp
0:003> u eip
6efdd835 54 push esp
6efdd836 6803000000 push 3
6efdd83b 8da424e8feffff lea esp,[esp-118h]
6efdd842 c5fe7f442418 vmovdqu ymmword ptr [esp+18h],ymm0
6efdd848 c5fe7f4c2438 vmovdqu ymmword ptr [esp+38h],ymm1
6efdd84e c5fe7f542458 vmovdqu ymmword ptr [esp+58h],ymm2
6efdd854 c5fe7f5c2478 vmovdqu ymmword ptr [esp+78h],ymm3
6efdd85a c5fe7fa42498000000 vmovdqu ymmword ptr [esp+98h],ymm4
0:003> !address eip
Building memory map: 00000000
Mapping file section regions...
Mapping module regions...
Mapping PEB regions...
Mapping TEB and stack regions...
Mapping heap regions...
Mapping page heap regions...
Mapping other regions...
Mapping stack trace database regions...
Mapping activation context regions...
Usage: <unknown>
Base Address: 6efad000
End Address: 6efea000
Region Size: 0003d000 ( 244.000 kB)
State: 00001000 MEM_COMMIT
Protect: 00000004 PAGE_READWRITE
Type: 01000000 MEM_IMAGE
Allocation Base: 6eea0000
Allocation Protect: 00000080 PAGE_EXECUTE_WRITECOPY
Content source: 1 (target), length: c7cb
As you can see, the page we jump to (which contains instrumentation code) is currently PAGE_READWRITE, so it doesn't have execute permission.
I tried running WinAFL with the latest DynamoRIO build, cronbuild-7.0.17605, and it seems to solve the problem. There are 695 commits in release_6_2_0..origin/master; I skimmed through the commit messages but couldn't find anything that obviously accounts for fixing both crashes.
Thanks for the info, that's good to know!
Not sure if it's related, but I ran into this as well. It turned out that get_test_case was missing a close() on its file descriptor:
$ git diff afl-fuzz.c
diff --git a/afl-fuzz.c b/afl-fuzz.c
index 28ec379..7b4e195 100644
--- a/afl-fuzz.c
+++ b/afl-fuzz.c
@@ -2539,6 +2539,9 @@ char *get_test_case(long *fsize)
char *buf = malloc(*fsize);
ck_read(fd, buf, *fsize, "input file");
+ if(out_file != NULL)
+ close(fd);
+
return buf;
}
This fixed the issue for me.
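For illustration, the general pattern behind the patch (every opened descriptor is closed on every return path) can be sketched as below. This is not WinAFL's actual `get_test_case`: the function `read_test_case` and its `path` parameter are hypothetical, and the sketch assumes POSIX `open`/`read`/`close`; on MSVC the corresponding calls are `_open`, `_read`, and `_close` from `<io.h>`.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not WinAFL code): reads a whole file into a freshly
 * allocated buffer and stores its length in *fsize. Returns NULL on failure.
 * The descriptor is closed on every path, so repeated calls cannot leak it. */
char *read_test_case(const char *path, long *fsize)
{
    assert(path != NULL && fsize != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size < 0) {
        close(fd); /* early-exit path still releases the descriptor */
        return NULL;
    }

    size_t len = (size_t)st.st_size;
    char *buf = malloc(len ? len : 1); /* avoid a zero-sized allocation */
    size_t got = 0;

    /* read() may return fewer bytes than requested, so loop until done. */
    while (buf != NULL && got < len) {
        ssize_t n = read(fd, buf + got, len - got);
        if (n <= 0) {
            free(buf);
            buf = NULL;
        } else {
            got += (size_t)n;
        }
    }

    close(fd); /* the step the patch above adds */

    if (buf != NULL)
        *fsize = (long)len;
    return buf;
}
```

Without the final `close(fd)`, a fuzzer that calls such a function once per execution would eventually run out of descriptors, after which later file operations start failing.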
Hi @hatRiot, thank you very much for the heads-up. That does indeed look like a bug in get_test_case. I applied your patch.
However, note that get_test_case is only called from process_test_case_into_dll, which is only used when a custom sample-processing DLL is in use. So this can only be the root cause if you are using both a custom DLL (-l flag) and a custom output file (-f flag).