syzkaller
pkg/repro: don't replace the mmap at 0x20000000 with a different mmap
syzkaller repros seem to always create a mmap at address 0x20000000 as temporary space. That is not great since it makes the repros difficult to read, but the more specific issue I want to report is that it's apparently possible for syzkaller to replace the mmap at address 0x20000000 with a different one, e.g. the mmap of a file that is mounted as a filesystem via loopback. Later writes to the temporary space by the repro then actually corrupt this filesystem, creating a very strange repro. For an example of this, see the C reproducer of https://lore.kernel.org/linux-fsdevel/[email protected]/T/#u.
I'd assume it's possible to fix this by making syzkaller not generate mmap calls that use a fixed address range that overlaps with the fixed address range of the temporary space.
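A rough sketch of the kind of overlap check this would need, written in C for illustration only. The base address comes from the reproducers; the 16MB size is the figure given for the executor's range in this thread. The names `SCRATCH_BASE`, `SCRATCH_SIZE`, `ranges_overlap` and `overlaps_scratch` are hypothetical and not taken from syzkaller's sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: base address as seen in the reproducers, size as quoted
 * in this thread. Neither is read from syzkaller's code. */
#define SCRATCH_BASE UINT64_C(0x20000000)
#define SCRATCH_SIZE (UINT64_C(16) << 20)

/* True if the half-open ranges [a, a+a_len) and [b, b+b_len) share a byte.
 * Assumes neither range wraps around the end of the address space. */
static bool ranges_overlap(uint64_t a, uint64_t a_len,
                           uint64_t b, uint64_t b_len)
{
    if (a_len == 0 || b_len == 0)
        return false;
    return a < b + b_len && b < a + a_len;
}

/* Hypothetical predicate: would a fixed-address mmap of [addr, addr+len)
 * land on top of the temporary space? */
bool overlaps_scratch(uint64_t addr, uint64_t len)
{
    return ranges_overlap(addr, len, SCRATCH_BASE, SCRATCH_SIZE);
}
```

A generator could consult something like `overlaps_scratch()` before emitting a fixed-address mmap and pick a different address when it returns true.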
@a-nogikh FYI
(@dvyukov please correct me if I'm wrong)
I'd say it's actually not a single exceptional case that we can fix in target.Neutralize(), but rather a deliberate design decision in syzkaller.
syz-executor dedicates a single 16MB address range for the executed program to operate on. When syzkaller decides where to e.g. place call arguments or what address to pass to mmap, it will all be inside that same memory range. In this construct, mmaps inside the already mmapped memory are normal.
The C reproducer has to emulate that same environment as well as it can to increase the chances of reproducing the bug. For this very reason we also force all mmaps to be MAP_FIXED during fuzzing.
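To make the consequence concrete, here is a minimal standalone sketch (not syzkaller code) of what MAP_FIXED does to an existing mapping on Linux. It lets the kernel choose the address rather than using 0x20000000, and it assumes `/tmp` is writable. The function name `scratch_write_reaches_file` is made up for this example.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a write through the old "scratch" pointer ended up in the
 * file, 0 if it did not, and -1 on any setup error. */
int scratch_write_reaches_file(void)
{
    char path[] = "/tmp/scratch-demo-XXXXXX";
    char buf[16] = {0};
    char *scratch, *replaced;
    int result = -1;
    long page = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);

    if (fd < 0 || page <= 0)
        return -1;
    unlink(path); /* the file lives on only through fd */
    if (ftruncate(fd, page) != 0)
        goto out;

    /* Step 1: anonymous memory standing in for the temporary space. */
    scratch = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (scratch == MAP_FAILED)
        goto out;

    /* Step 2: a MAP_FIXED file mapping at the same address silently
     * replaces the anonymous one. */
    replaced = mmap(scratch, page, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_FIXED, fd, 0);
    if (replaced == scratch) {
        /* Step 3: "preparing syscall arguments" now writes into the file. */
        memcpy(scratch, "syscall-arg", 11);

        /* Step 4: read the file back through the descriptor. */
        if (pread(fd, buf, 11, 0) == 11)
            result = memcmp(buf, "syscall-arg", 11) == 0;
    }
    munmap(scratch, page);
out:
    close(fd);
    return result;
}
```

On Linux this is expected to return 1: the pointer is unchanged, but the memory behind it now belongs to the file.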
Yes, this is intentional to get better reproducibility and get more interesting interactions in this range between different mm syscalls and various uses of that memory.
Potentially, program minimization/simplification during the repro process could try to simplify this aspect, at least to some degree (e.g. don't use the same page for a non-anon mmap and syscall arguments if that is not necessary to reproduce the bug). But I am not sure how many iterations it would require or how useful it would be.
If it's a deliberate design decision, it's wrong. This makes reproducers far too hard to understand. Once the mapping gets replaced, syzkaller will add "syscalls" just to cause itself to copy data to the 0x20000000 mapping (and thus into some random underlying file, filesystem, device, etc.), and not for the actual syscalls. This results in really bizarre reproducers. I was eventually able to figure out that this was happening in https://syzkaller.appspot.com/x/repro.c?x=12b12928680000, but the vast majority of developers will just ignore or abandon reports like this as it's a waste of time.
Do I read the reproducer correctly that if we didn't write to the mmap-ed range, we also wouldn't trigger the bug?
In this case, the underlying kernel design flaw (userspace can write to pagecache of mounted block device) has already been reported many times by syzbot. So the report provides no value, regardless of the reproducer.
But pretending it was a unique bug, yes the reproducer that syzbot found happens to rely on the scratch space mapping being replaced in order to reproduce the bug.
I'd expect that syzbot would find the same bug later with a simpler reproducer that doesn't rely on this weird quirk. It's certainly possible. If syzbot can't actually do it, then something needs to be fixed in that area.
Note that syzbot only sends the first reproducer to the mailing list, so that's the one that people look at. If that reproducer is incomprehensible, that is going to delay the bug being fixed.
To be super clear: writing to files is fine. The specific issue is how syzkaller maps a file over the scratch space, and then writes to that file as a side effect of preparing arguments for unrelated syscalls. This results in syzkaller generating a seemingly random syscall sequence just to cause itself to memcpy some data into the file's mapping. It could just copy data directly, skipping the unrelated system calls. Or simply write() to the file...
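As an illustration of the last suggestion, a reproducer could state the file modification explicitly instead of relying on argument setup for an unrelated syscall. This is only a sketch; `write_all_at` is a hypothetical helper, not something syzkaller emits today.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to fd at offset off, retrying on
 * short writes and EINTR. Returns 0 on success, -1 on error. */
int write_all_at(int fd, const void *data, size_t len, off_t off)
{
    const char *p = data;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted before writing anything, retry */
            return -1;
        }
        if (n == 0)
            return -1; /* no progress, give up rather than spin */
        p += n;
        len -= (size_t)n;
        off += n;
    }
    return 0;
}
```

A reader of such a reproducer would see one call that names the file descriptor, the offset and the bytes written, rather than having to infer them from where the temporary space happens to be mapped.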