syzkaller executor: detect user-space memory/context corruptions

We could detect cases where the kernel corrupts user-space memory/context.

For memory, a basic scheme could be:

1. mmap a set of pages.
2. Check that they are all zeros.
3. Write some non-zero pattern.
4. Sleep.
5. Check that the pages still contain the pattern.
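The basic scheme above can be sketched in C roughly as follows. This is a minimal sketch, not syzkaller code: the function name `run_canary_check`, the page count, and the pattern formula are illustrative assumptions. The pattern mixes in the page index so that one canary page copied over another is also noticed, and the pages are accessed through a `volatile` pointer so the compiler cannot assume their contents are unchanged across `sleep()`.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define CANARY_PAGES 16 /* illustrative size */

/* Non-zero pattern that mixes in the page index, so that one canary page
   copied over another is noticed as well. */
static uint8_t canary_byte(size_t page, size_t off) {
    uint8_t b = (uint8_t)(0xA5 ^ page ^ (off * 31));
    return b != 0 ? b : 0x5A;
}

/* Returns 0 if the canary survived, 1 if it was corrupted, -1 on setup error. */
static int run_canary_check(unsigned sleep_sec) {
    size_t psz = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = psz * CANARY_PAGES;
    void *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return -1;
    /* volatile: the compiler must not assume the bytes survive sleep() */
    volatile uint8_t *mem = raw;
    int corrupted = 0;

    /* Fresh anonymous pages must read as zeros. */
    for (size_t i = 0; i < len && !corrupted; i++) {
        if (mem[i] != 0) {
            fprintf(stderr, "page %zu not zeroed at offset %zu\n", i / psz, i % psz);
            corrupted = 1;
        }
    }
    /* Write the non-zero pattern. */
    for (size_t i = 0; i < len; i++)
        mem[i] = canary_byte(i / psz, i % psz);

    /* Give stray or delayed kernel writes a chance to land. */
    sleep(sleep_sec);

    /* The pages must still hold exactly the pattern. */
    for (size_t i = 0; i < len && !corrupted; i++) {
        if (mem[i] != canary_byte(i / psz, i % psz)) {
            fprintf(stderr, "page %zu corrupted at offset %zu\n", i / psz, i % psz);
            corrupted = 1;
        }
    }
    munmap(raw, len);
    return corrupted;
}
```

A caller would treat a return value of 1 as a detected corruption and -1 as an environment problem rather than a finding.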

This would catch stray writes in the kernel. To catch more MM-related bugs, we could also apply mprotect/mremap/remap_file_pages/etc. to these pages, and also allocate anonymous pages or memfd pages.
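One possible way to exercise these MM paths is to move the canary pages around and re-verify them after each step. In the sketch below (all helper names are hypothetical), anonymous pages go through an mprotect round trip and a growing `mremap` with `MREMAP_MAYMOVE`, and memfd pages are written through one mapping and re-checked through a fresh one. It assumes Linux with glibc 2.27 or later for `memfd_create`; `remap_file_pages` is left out because it is deprecated in current kernels.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Non-zero byte pattern derived from an offset and a per-region seed. */
static uint8_t pat(size_t i, uint8_t seed) {
    return (uint8_t)((seed + i * 7) | 1);
}

static void fill(uint8_t *p, size_t len, uint8_t seed) {
    for (size_t i = 0; i < len; i++)
        p[i] = pat(i, seed);
}

static int holds_pattern(const uint8_t *p, size_t len, uint8_t seed) {
    for (size_t i = 0; i < len; i++)
        if (p[i] != pat(i, seed))
            return 0;
    return 1;
}

static int is_zero(const uint8_t *p, size_t len) {
    for (size_t i = 0; i < len; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Returns 0 if every check passed, 1 on a detected corruption, -1 on setup error. */
static int mm_ops_check(void) {
    size_t len = 4 * (size_t)sysconf(_SC_PAGESIZE);

    /* Anonymous pages: mprotect round trip, then grow with mremap (may move). */
    uint8_t *anon = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (anon == MAP_FAILED)
        return -1;
    fill(anon, len, 0x11);
    if (mprotect(anon, len, PROT_READ) != 0 ||
        mprotect(anon, len, PROT_READ | PROT_WRITE) != 0) {
        munmap(anon, len);
        return -1;
    }
    uint8_t *grown = mremap(anon, len, 2 * len, MREMAP_MAYMOVE);
    if (grown == MAP_FAILED) {
        munmap(anon, len);
        return -1;
    }
    /* The old half must keep the pattern; the new half must be zero-filled. */
    int ok = holds_pattern(grown, len, 0x11) && is_zero(grown + len, len);
    munmap(grown, 2 * len);

    /* memfd pages: write through one mapping, unmap it, re-check via a new one. */
    int fd = memfd_create("canary", 0);
    if (fd < 0)
        return -1;
    int rc = -1;
    if (ftruncate(fd, (off_t)len) == 0) {
        uint8_t *w = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (w != MAP_FAILED) {
            fill(w, len, 0x22);
            munmap(w, len);
            uint8_t *r = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
            if (r != MAP_FAILED) {
                ok = ok && holds_pattern(r, len, 0x22);
                munmap(r, len);
                rc = ok ? 0 : 1;
            }
        }
    }
    close(fd);
    return rc;
}
```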

Similarly, to catch context corruptions, we could set up a particular non-trivial context for a thread and check that it's preserved. The context should involve FPU/SSE/AVX state and any special registers we can reach from user space (fs/debug/etc., e.g. via the fldcw/fxrstor instructions). We can also throw in some exceptions like page faults, invalid instructions, etc.
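A small piece of the context idea can be sketched for x86-64: put the x87 control word (via `fldcw`) and MXCSR (via `_mm_setcsr`) into non-default states, trigger a page fault, a signal delivery and a sleep, then check that both registers still hold the values we set. This is an assumption-laden sketch rather than a complete context check: it covers only two control registers and none of the XMM/YMM data registers, and `fp_context_check` is a hypothetical name. The signal handler deliberately resets both registers; because the kernel saves the interrupted FPU state in the signal frame and restores it on sigreturn, a correct kernel hides that change from the main code.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>
#include <xmmintrin.h> /* _mm_getcsr, _mm_setcsr */

#if !defined(__x86_64__)
#error "this sketch assumes x86-64"
#endif

#define MXCSR_FLAGS 0x3Fu /* sticky exception flags, excluded from the comparison */

static unsigned short x87_get_cw(void) {
    unsigned short cw;
    __asm__ volatile("fnstcw %0" : "=m"(cw) : : "memory");
    return cw;
}

static void x87_set_cw(unsigned short cw) {
    __asm__ volatile("fldcw %0" : : "m"(cw) : "memory");
}

static volatile sig_atomic_t handler_ran;

/* Resets both control registers to their default values. The kernel should
   restore the interrupted FPU state on sigreturn, so this must not leak out. */
static void clobber_fp_context(int sig) {
    (void)sig;
    _mm_setcsr(0x1F80);
    x87_set_cw(0x037F);
    handler_ran = 1;
}

/* Returns 0 if the non-default control state survived, 1 if it changed, -1 on setup error. */
static int fp_context_check(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = clobber_fp_context;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    size_t psz = (size_t)sysconf(_SC_PAGESIZE);
    char *page = mmap(NULL, psz, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    unsigned int saved_csr = _mm_getcsr();
    unsigned short saved_cw = x87_get_cw();
    /* Non-default state: SSE round-up + flush-to-zero, x87 round-up + 53-bit precision. */
    _mm_setcsr((saved_csr & ~0x6000u) | 0x4000u | 0x8000u);
    x87_set_cw((unsigned short)((saved_cw & ~0x0F00u) | 0x0A00u));
    unsigned int want_csr = _mm_getcsr() & ~MXCSR_FLAGS;
    unsigned short want_cw = x87_get_cw();

    *(volatile char *)page = 1; /* page fault on first touch */
    munmap(page, psz);
    handler_ran = 0;
    raise(SIGUSR1); /* signal delivery and sigreturn */
    struct timespec ts = {0, 10 * 1000 * 1000};
    nanosleep(&ts, NULL); /* lets the task be scheduled out and back in */

    int ok = handler_ran && (_mm_getcsr() & ~MXCSR_FLAGS) == want_csr &&
             x87_get_cw() == want_cw;
    _mm_setcsr(saved_csr);
    x87_set_cw(saved_cw);
    return ok ? 0 : 1;
}
```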

Two interesting aspects:

1. How do we make this reproducible?
2. Should it run in the test executor process, in a separately forked executor process, or in the fuzzer? Probably not in the fuzzer because of #1541. Running in the test executor process can produce more interesting interactions with the random programs, but then it looks almost impossible to protect the checker from self-corruptions.
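The separately forked option could look roughly like the sketch below: the check runs in a child process and the parent only interprets the exit status, so a checker that gets corrupted (or crashes) cannot take the test executor down with it. The function names and exit codes are hypothetical, and the child runs a trivial canary check only as a placeholder. A checker killed by a signal is reported separately, since that may itself be a symptom of corruption.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical exit codes of the checker child. */
enum { CHECK_OK = 0, CHECK_CORRUPTED = 1, CHECK_SETUP_ERROR = 3 };
/* Hypothetical results reported to the executor. */
enum { RUN_ERROR = -1, RUN_CLEAN = 0, RUN_CORRUPTED = 1, RUN_CHECKER_CRASHED = 2 };

static uint8_t pat(size_t i) {
    return (uint8_t)((i * 13) | 0x80); /* never zero */
}

/* Placeholder canary check; runs only inside the child and never unmaps,
   because the child exits right after it. */
static int child_canary_check(void) {
    size_t len = 8 * (size_t)sysconf(_SC_PAGESIZE);
    void *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return CHECK_SETUP_ERROR;
    volatile uint8_t *mem = raw;
    for (size_t i = 0; i < len; i++)
        mem[i] = pat(i);
    struct timespec ts = {0, 20 * 1000 * 1000};
    nanosleep(&ts, NULL);
    for (size_t i = 0; i < len; i++)
        if (mem[i] != pat(i))
            return CHECK_CORRUPTED;
    return CHECK_OK;
}

static int run_check_in_child(void) {
    pid_t pid = fork();
    if (pid < 0)
        return RUN_ERROR;
    if (pid == 0)
        _exit(child_canary_check());
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return RUN_ERROR;
    if (WIFSIGNALED(status))
        return RUN_CHECKER_CRASHED;
    if (!WIFEXITED(status))
        return RUN_ERROR;
    switch (WEXITSTATUS(status)) {
    case CHECK_OK:
        return RUN_CLEAN;
    case CHECK_CORRUPTED:
        return RUN_CORRUPTED;
    default:
        return RUN_ERROR;
    }
}
```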

Related to #200 ("executor: detect logical bugs").

Nov 21 '20 09:11 dvyukov

A recent example of FPU state corruption: https://lore.kernel.org/all/[email protected]/

May 10 '22 14:05 dvyukov

Do we know where these stray writes come from?

I guess the kernel cannot just make addresses up; it probably corrupts the pages that are actively used by syscalls. If so, writing canaries to adjacent pages may not detect anything.

Detecting corruptions of pages that the kernel is supposed to modify is a bit trickier, because we need to outline the expected working set of every syscall and only poison the unused parts of the page. But this is doable, as we are allocating the syscall arguments ourselves.

E.g. consider a program that reads a buffer from one file and writes it to another:

int bufsize = 16;
void *buf = malloc(bufsize);
read(fd1, buf, bufsize);
write(fd2, buf, bufsize);

To check for potential corruptions, syzkaller will need to do something along the lines of:

int bufsize = 16;
void *page = mmap(0, ...);
// placement_info stores information about used/unused holes within a page.
void *buf = allocate_within_page(page, bufsize, &placement_info);
poison_unused_memory(placement_info);
read(fd1, buf, bufsize);
check_poisoned_memory(placement_info);
write(fd2, buf, bufsize);
check_poisoned_memory(placement_info);
// optional sleep to detect delayed writes:
sleep(1);
check_poisoned_memory(placement_info);
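The hypothetical helpers in the pseudocode above could be implemented roughly like this. It is a sketch under simplifying assumptions: one buffer per page, a fixed poison byte `0xAA`, the buffer centered in the page so that both underflows and overflows hit poison, `placement_info` passed by pointer, and pipes standing in for `fd1` and `fd2`. The final delay is shortened from `sleep(1)` to keep the example quick.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define POISON 0xAA

/* One buffer inside one page; every other byte of the page is poison. */
struct placement_info {
    uint8_t *page;
    size_t page_size;
    size_t off; /* start of the buffer within the page */
    size_t len; /* length of the buffer */
};

/* Centers the buffer so that both under- and overflows land on poison. */
static void *allocate_within_page(void *page, size_t len, struct placement_info *pi) {
    pi->page = page;
    pi->page_size = (size_t)sysconf(_SC_PAGESIZE);
    if (len > pi->page_size)
        return NULL;
    pi->len = len;
    pi->off = (pi->page_size - len) / 2;
    return pi->page + pi->off;
}

static void poison_unused_memory(const struct placement_info *pi) {
    memset(pi->page, POISON, pi->off);
    memset(pi->page + pi->off + pi->len, POISON, pi->page_size - pi->off - pi->len);
}

/* Returns the page offset of the first clobbered poison byte, or -1 if intact. */
static long check_poisoned_memory(const struct placement_info *pi) {
    const volatile uint8_t *p = pi->page;
    for (size_t i = 0; i < pi->page_size; i++) {
        if (i >= pi->off && i < pi->off + pi->len)
            continue; /* the syscall's legitimate working set */
        if (p[i] != POISON)
            return (long)i;
    }
    return -1;
}

/* Mirrors the pseudocode step by step: 0 = clean, 1 = poison hit, -1 = error. */
static int run_checked_io(void *page, int fd1, int fd2) {
    size_t bufsize = 16;
    struct placement_info pi;
    void *buf = allocate_within_page(page, bufsize, &pi);
    if (buf == NULL)
        return -1;
    poison_unused_memory(&pi);
    if (read(fd1, buf, bufsize) != (ssize_t)bufsize)
        return -1;
    if (check_poisoned_memory(&pi) >= 0)
        return 1;
    if (write(fd2, buf, bufsize) != (ssize_t)bufsize)
        return -1;
    if (check_poisoned_memory(&pi) >= 0)
        return 1;
    struct timespec ts = {0, 50 * 1000 * 1000}; /* optional delayed-write check */
    nanosleep(&ts, NULL);
    return check_poisoned_memory(&pi) >= 0 ? 1 : 0;
}

static int checked_read_write(void) {
    int in[2], out[2];
    if (pipe(in) != 0)
        return -1;
    if (pipe(out) != 0) {
        close(in[0]);
        close(in[1]);
        return -1;
    }
    int rc = -1;
    const char msg[16] = "0123456789abcde"; /* 15 chars + NUL */
    size_t psz = (size_t)sysconf(_SC_PAGESIZE);
    void *page = mmap(NULL, psz, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page != MAP_FAILED) {
        if (write(in[1], msg, sizeof(msg)) == (ssize_t)sizeof(msg))
            rc = run_checked_io(page, in[0], out[1]);
        munmap(page, psz);
    }
    close(in[0]);
    close(in[1]);
    close(out[0]);
    close(out[1]);
    return rc;
}
```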

Nov 16 '23 12:11 ramosian-glider

Precise detection of over-reads/over-writes falls more under the EFAULT part of #200. Even over-reads beyond what the kernel is supposed to read are bad. Yes, we could detect these.

I see this issue as one level up: detect all corruptions that we did not detect earlier in a more precise way, regardless of their origin. For example, also cover stray writes by background kernel threads, page table setup bugs that corrupt user memory, or register corruptions.

Nov 16 '23 12:11 dvyukov

Related kernel KASAN feature request that may help to detect some user-space corruptions: https://bugzilla.kernel.org/show_bug.cgi?id=218153

Nov 16 '23 12:11 dvyukov

More ideas:

- Share the canary pages with an external process that scans them and reports an error when their contents change. (If made simple enough, such a process can be spawned for every executed program.) It is unclear, though, whether using MAP_SHARED would prevent certain types of corruptions from happening.
- Use an external tracing program that traces the executor and scans the canary pages (the executor can mark them with PROT_NONE or some unused protection flag to simplify the communication). That program can also snapshot the contents of the executor's memory before and after a syscall and compare them. We'll still need to somehow deduce the expected working set of the syscall from its description, but that is doable.
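The first idea (an external process that watches shared canary pages) might be sketched as follows. The watcher is a forked child that rescans a `MAP_SHARED | MAP_ANONYMOUS` canary region a fixed number of times while the parent runs a placeholder workload; any mismatch makes the watcher exit with status 1. Scan counts, intervals, and all function names are illustrative assumptions, and, as noted above, the use of MAP_SHARED might itself keep some kinds of corruption from showing up.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define WATCH_PAGES 4  /* illustrative */
#define WATCH_SCANS 20 /* illustrative */

static uint8_t expected_byte(size_t i) {
    return (uint8_t)((i * 29 + 7) | 1); /* never zero */
}

/* Watcher process: rescans the shared canary pages, exits 1 on the first mismatch. */
static void watcher(const volatile uint8_t *canary, size_t len) {
    struct timespec ts = {0, 5 * 1000 * 1000}; /* 5 ms between scans */
    for (int s = 0; s < WATCH_SCANS; s++) {
        for (size_t i = 0; i < len; i++)
            if (canary[i] != expected_byte(i))
                _exit(1);
        nanosleep(&ts, NULL);
    }
    _exit(0);
}

/* Placeholder for the executed test program: a few syscalls and short sleeps. */
static void sample_workload(void) {
    struct timespec ts = {0, 20 * 1000 * 1000};
    for (int i = 0; i < 3; i++) {
        (void)getppid();
        nanosleep(&ts, NULL);
    }
}

/* Returns 0 if the watcher saw no change, 1 if it did, -1 on error. */
static int run_with_watcher(void (*workload)(void)) {
    size_t len = WATCH_PAGES * (size_t)sysconf(_SC_PAGESIZE);
    uint8_t *canary = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (canary == MAP_FAILED)
        return -1;
    for (size_t i = 0; i < len; i++)
        canary[i] = expected_byte(i);

    pid_t pid = fork();
    if (pid < 0) {
        munmap(canary, len);
        return -1;
    }
    if (pid == 0)
        watcher(canary, len); /* never returns */

    workload(); /* the executor's own work runs concurrently with the scans */

    int status;
    pid_t w = waitpid(pid, &status, 0);
    munmap(canary, len);
    if (w != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 0 : 1;
}
```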

Nov 17 '23 09:11 ramosian-glider

FTR, this may be a case of user-space corruption due to a kernel bug in OpenBSD: https://syzkaller.appspot.com/bug?extid=0292611d290be27409bb https://groups.google.com/g/syzkaller-openbsd-bugs/c/tzNH3_Aa7fM/m/WVu1fvj_AwAJ

panic: time: Stop called on uninitialized Timer

goroutine 8605 [running]:
time.(*Timer).Stop(...)
	/usr/local/go/src/time/sleep.go:79
github.com/google/syzkaller/pkg/ipc.(*command).exec.func1()
	/syzkaller/gopath/src/github.com/google/syzkaller/pkg/ipc/ipc.go:796 +0xf5
created by github.com/google/syzkaller/pkg/ipc.(*command).exec in goroutine 44
	/syzkaller/gopath/src/github.com/google/syzkaller/pkg/ipc/ipc.go:789 +0x22b

It happened just once, and it's unclear how we could have failed to initialize the timer here (without noticing it before): https://github.com/google/syzkaller/blob/f323435486123f331122c97cd8bd4183c89d4f05/pkg/ipc/ipc.go#L790-L796

Apr 09 '24 06:04 dvyukov
