executor: detect logical bugs
The executor could detect a set of logical bugs in the kernel, on top of the basic safety bugs detected by grepping console output. To the best of my knowledge, logical kernel bugs are not detected by any other automated testing systems, so this could give us a whole new class of bugs. Examples of such checks:
- process unexpectedly gets uid=0
- an unexpected errno value returned from syscalls
- unexpected side-effects (e.g. changing/non-changing mm/fs state)
- unexpected syscall failure/success
- bogus EFAULT result when kernel overreads/writes arguments
- detect user-space memory/context corruptions (#2260)
- detect unintentional kernel pointer leaks
- execute `prctl(PR_MDWE_REFUSE_EXEC_GAIN)` and check that the process doesn't get any `WX` mappings (a sketch of this check follows the list)
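For illustration, a minimal sketch of the last check above; the constants, the `/proc/self/maps` scan and the reporting format are assumptions for illustration, not the actual executor code:

```c
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>

// Defined here in case libc headers are older than the kernel feature.
#ifndef PR_SET_MDWE
#define PR_SET_MDWE 65
#define PR_MDWE_REFUSE_EXEC_GAIN (1UL << 0)
#endif

// Returns 1 if /proc/self/maps contains a mapping that is both writable
// and executable, which MDWE should have prevented.
static int have_wx_mapping(void)
{
	FILE* f = fopen("/proc/self/maps", "r");
	if (!f)
		return 0;
	char line[512];
	int found = 0;
	while (fgets(line, sizeof(line), f)) {
		// Permissions are the second whitespace-separated field, e.g. "rw-p".
		char perms[8] = {0};
		if (sscanf(line, "%*s %7s", perms) == 1 &&
		    strchr(perms, 'w') && strchr(perms, 'x'))
			found = 1;
	}
	fclose(f);
	return found;
}

int main(void)
{
	if (prctl(PR_SET_MDWE, PR_MDWE_REFUSE_EXEC_GAIN, 0, 0, 0))
		return 0; // old kernel: nothing to check
	// ... execute the generated test program here ...
	if (have_wx_mapping())
		printf("LOGICAL BUG: WX mapping present despite MDWE\n");
	return 0;
}
```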
Taking into account the complexity of a full kernel model, these checks should probably be very conservative (give up predicting the outcome when in doubt). But it would still be interesting to see if we can detect at least some bugs with conservative checks.
A complete model is close to impossible, so we need to aggressively limit the scope initially and then incrementally extend it. This includes:
- whitelisting syscalls, e.g. initially starting with just open/read/write/close (see the sketch after this list)
- aggressively bailing out on anything unexpected, e.g. if one syscall returned ENOSPC, not checking this program at all
- initially no concurrency at all and each syscall is given enough time to finish
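A minimal sketch of what such a conservative oracle could look like for the whitelisted open/write/read/close sequence; the file name, the expected outcomes and the bail-out policy are illustrative assumptions, not syzkaller code:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char data[] = "hello";
	char buf[sizeof(data)] = {0};
	int fd = open("/tmp/oracle_test", O_CREAT | O_RDWR | O_TRUNC, 0600);
	if (fd == -1)
		return 0; // anything unexpected: give up instead of guessing
	// A short write to a fresh regular file must not fail and must not be
	// partial; anything else is reported as a logical bug.
	ssize_t n = write(fd, data, sizeof(data));
	if (n != (ssize_t)sizeof(data))
		printf("LOGICAL BUG: write returned %zd (errno=%d)\n", n, errno);
	// Reading the data back must return exactly what was written.
	if (lseek(fd, 0, SEEK_SET) == 0 &&
	    read(fd, buf, sizeof(buf)) == (ssize_t)sizeof(data) &&
	    memcmp(buf, data, sizeof(data)))
		printf("LOGICAL BUG: read back different data\n");
	close(fd);
	unlink("/tmp/oracle_test");
	return 0;
}
```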
Once we have some initial working base, we can start extending it in all directions. Obviously, more syscalls. But also maybe some limited concurrency, e.g. a white/black-list of syscalls that can run concurrently (e.g. no close/write). Ultimately, it may be extremely interesting to test that 2 concurrent syscalls are atomic, i.e. the result is equal either to one syscall executed first and then the other, or vice versa (potential example bugs: 1, 2). This will probably need some blacklist too. But on the other hand, it does not require a second implementation: e.g. concurrent read/write on a UDP socket should always be atomic (a sketch of such a check follows).
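A sketch of such an atomicity check, using an AF_UNIX datagram socketpair as a stand-in for the UDP case (the message size and thread structure are arbitrary): two threads send distinct datagrams concurrently, and every received datagram must be one of them in its entirety, never a mix.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int socks[2];

// Each sender thread transmits one datagram filled with a single byte value.
static void* sender(void* arg)
{
	char msg[64];
	memset(msg, *(char*)arg, sizeof(msg));
	send(socks[0], msg, sizeof(msg), 0);
	return NULL;
}

int main(void)
{
	char a = 'A', b = 'B';
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, socks))
		return 1;
	pthread_t t1, t2;
	pthread_create(&t1, NULL, sender, &a);
	pthread_create(&t2, NULL, sender, &b);
	for (int i = 0; i < 2; i++) {
		char buf[64] = {0};
		ssize_t n = recv(socks[1], buf, sizeof(buf), 0);
		// A datagram must consist entirely of 'A's or entirely of 'B's;
		// mixed bytes would mean the concurrent sends were not atomic.
		for (ssize_t j = 1; j < n; j++)
			if (buf[j] != buf[0])
				printf("LOGICAL BUG: datagram %d is torn\n", i);
	}
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}
```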
Related work on checking filesystems against a POSIX model: SibylFS: formal specification and oracle-based testing for POSIX and real-world file systems.
A related idea that may be simpler to implement is to arrange some honeypots for the test process and then check if the process is caught red-handed at these honeypots. Examples of honeypots:
- a file in the working directory that the test process does not have permissions for (enforced via file permissions, or an LSM policy that denies access to the file), then checking if the file data or metadata was altered in any way (see the sketch after this list)
- a network port that the test process must not be able to reach, then checking if the port has received any packets
- a hash stored at a known address in kernel memory, then checking if the hash shows up anywhere in the test process memory
- if W^X protection is enabled (requires corresponding SELinux policy), check that the process does not get any WX mappings (previous vulnerabilities: 1, 2).
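A hypothetical sketch of the first honeypot (the path, the mode and the set of compared metadata fields are illustrative): snapshot the honeypot file's stat data before the test process runs and verify nothing changed afterwards.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

#define HONEYPOT "./honeypot_no_access"

int main(void)
{
	// Set up the honeypot: a file the test process must not be able to
	// touch (here simply mode 0; it could also be an LSM policy instead).
	int fd = open(HONEYPOT, O_CREAT | O_WRONLY | O_TRUNC, 0600);
	if (fd == -1)
		return 1;
	write(fd, "secret", 6);
	close(fd);
	chmod(HONEYPOT, 0);

	struct stat before, after;
	stat(HONEYPOT, &before);

	// ... run the (unprivileged) test process here ...

	// Any change to the file's size, timestamps, ownership or mode means
	// the test process bypassed access control.
	if (stat(HONEYPOT, &after))
		printf("LOGICAL BUG: honeypot file disappeared\n");
	else if (after.st_size != before.st_size ||
	         after.st_mtime != before.st_mtime ||
	         after.st_mode != before.st_mode ||
	         after.st_uid != before.st_uid)
		printf("LOGICAL BUG: honeypot file was modified\n");
	unlink(HONEYPOT);
	return 0;
}
```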
Another idea, from the Kit: Testing OS-level Virtualization for Functional Interference Bugs paper, is to detect "functional interference bugs in OS-virtualization mechanisms, such as Linux namespaces. The key idea of Kit is to detect inter-container functional interference by comparing the system call traces of a container across two executions, where it runs with and without the preceding execution of another container".
Related issues: #5382
One other idea is testing the behaviour of a new kernel against the behaviour of an old kernel.
There would need to be a way to identify or mark behaviour that is expected to change. Alternatively, we could restrict the checking to the behaviour of kernel APIs that are guaranteed to be stable and should not change.
There are still interesting problems related to how to minimize non-determinism in the result of a randomly generated test program. Some ideas:
- Limit test program generation to those where non-deterministic behaviour is excluded.
- Run the test program several times and collect the sets of outputs; then, if the sets of outputs from the old and new kernel match, it would also be possible to conclude that behaviour matches (see the sketch after this list). There might still be differences if the output depends on hidden system state, such as hardware clocks, or other timing-dependent state that is impossible to control in a normal system; these would need to be avoided.
- Run the kernels in a full-system simulator (e.g. Bochs), where the full-system state can be controlled completely.
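A minimal sketch of the outcome-set idea, assuming a run's observable result can be serialized into one line (here just the return value and errno of a single open(), standing in for the generated program). Running it on the old and the new kernel and diffing the printed, sorted outcome sets approximates the "sets of outputs match" check; all names are illustrative.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define RUNS 16

static int cmp(const void* a, const void* b)
{
	return strcmp((const char*)a, (const char*)b);
}

int main(void)
{
	char outcomes[RUNS][64];
	int noutcomes = 0;
	for (int i = 0; i < RUNS; i++) {
		// Stand-in for one execution of the generated test program.
		errno = 0;
		int res = open("/nonexistent", O_RDONLY);
		if (res != -1)
			close(res);
		char line[64];
		snprintf(line, sizeof(line), "open=%d errno=%d", res, errno);
		// Deduplicate: keep only distinct outcomes.
		int seen = 0;
		for (int j = 0; j < noutcomes; j++)
			if (!strcmp(outcomes[j], line))
				seen = 1;
		if (!seen)
			strcpy(outcomes[noutcomes++], line);
	}
	// Print the sorted set of distinct outcomes; equal sets on both kernels
	// mean the (possibly non-deterministic) behaviour matches.
	qsort(outcomes, noutcomes, sizeof(outcomes[0]), cmp);
	for (int j = 0; j < noutcomes; j++)
		printf("%s\n", outcomes[j]);
	return 0;
}
```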
FWIW Dirty Pipe could have been found by file honeypots.