syzkaller
syzkaller copied to clipboard
sys/linux: syz-describe: auto generate syzlang
Sorry that I wrongly close the PR #3129.
I have made some modifications based on your comments. Now just push one example of KVM for more comments. The modifications are below:
- copyright headers and the kernel version at the beginning.
- debug info about where the tool finds drivers, where the tool finds the device file names and syscall handler structures
- a prefix
syz_describe_
for syscalls - add
meta arches["amd64"]
based ontarget triple
field in LLVM bitcode - change the file name to
syz_describe_ + "module init function name" + "_arche"(e.g., _amd64) + .txt
, I think the module init functions should rarely change, which should be ok for the file name. - fix bug that it should use
syz_open_dev
instead ofopenat
And I am trying to add some debug info for the structure definition.
In fact, I have trouble with the name of structure.
For example, the KVM in svm_init
(amd) and vmx_init
(intel) share many same structures.
If we want to keep them in different syscall description, e.g., syz_describe_svm_init_amd64.txt
and syz_describe_vmx_init_amd64.txt
, I have to add some other strings to those structures.
Or we can only keep one record of them.
Which one do you think is better?
Codecov Report
Merging #3143 (e281a2f) into master (0042f2b) will decrease coverage by
0.0%
. The diff coverage isn/a
.
Impacted Files | Coverage Δ | |
---|---|---|
prog/mutation.go | 88.1% <0.0%> (-1.2%) |
:arrow_down: |
prog/any.go | 84.8% <0.0%> (+0.8%) |
:arrow_up: |
prog/size.go | 92.4% <0.0%> (+1.9%) |
:arrow_up: |
For example, the KVM in svm_init(amd) and vmx_init(intel) share many same structures. If we want to keep them in different syscall description, e.g., syz_describe_svm_init_amd64.txt and syz_describe_vmx_init_amd64.txt, I have to add some other strings to those structures.
Are these different structures? Or the same struct reached from both svm_init and vmx_init? If it's the same struct then it's definitely better to keep just one of them. But I am not sure what file should it go into then. Does syz-describe do global analysis and know about the duplicate?
Note that init functions can have duplicate names as well. Since they are static, they all can be called just init().
Are these different structures? Or the same struct reached from both svm_init and vmx_init? If it's the same struct then it's definitely better to keep just one of them. But I am not sure what file should it go into then. Does syz-describe do global analysis and know about the duplicate?
They are the same structures reached from both svm_init and vmx_init. And I rewrote the related code to generate syzlang format syscall descriptions last week and now syz-describe could know the duplicate. So we can put it anywhere we want. I guess it is also reasonable to put them in different file. In the future, we can generate values or some other information for the arguments, and in that case, the same structures with different values would also be different. (I remember I saw this feature in syzkaller, but I cannot find this, please point out my mistakes if this does not exist)
Note that init functions can have duplicate names as well. Since they are static, they all can be called just init().
Currently, I have not found any module init functions with duplicate name. But I agree that it is possible to have duplicate name. How about using the path and init function name together? But for external modules not from the linux kernel (*.ko), I am not sure whether this name would be ok. But I guess we do not need to consider the name of external modules now.
Currently, I have not found any module init functions with duplicate name. But I agree that it is possible to have duplicate name. How about using the path and init function name together?
I would use just file name. Or can a single file contain 2 init functions?
the same structures with different values would also be different. (I remember I saw this feature in syzkaller, but I cannot find this, please point out my mistakes if this does not exist)
Yes, it exists, one can totally have 2+ versions of the same struct.
One can just have 2 versions with completely arbitrary names. Say, if original struct is called foo
, one can have 2 versions of it with different consts or anything else called bar
and baz
.
However, the official recommended way to name them is foo$bar
and foo$baz
. This allows them to automatically understand that these both names actually refer to the same kernel struct foo
. E.g. syz-check uses this to do static checking of descriptions:
https://github.com/google/syzkaller/blob/master/tools/syz-check/check.go
Actually a single file may need several versions of the same struct as wel.
For example, there is 2 ioctls, they accept the same struct, but the struct contains type
field which needs to have constant value1 for the first ioctl and value2 for the second ioctl.
If it's the same struct then it's definitely better to keep just one of them.
Actually I take this back. Since this is auto-generated descriptions we can have multiple copies if it's better for other reasons.
make two changes:
- use prefix + path of source code as the file name of generated syscall description
- use hash to avoid the repeat names of structures or syscalls, and the hash value is from the debug info about where the tool finds drivers, which is existing in the file.
I run make presubmit
locally with no error. However, there is one in ci
.
I run make presubmit locally with no error. However, there is one in ci.
That's a OOM flake, I've restarted CI.
We've discussed this internally and we think the first reasonable and uncontroversial step would be to commit automatic descriptions for interfaces for which we don't have manual descriptions.
Can you somehow share the full list of interfaces the tool can generate? Not sure what's the best way to do this. Either upload all files somewhere, or maybe just the list openat/syz_open_dev calls.
I think this may start uncovering some scalability issues in syzkaller, which we can address early on. Tighter integration will probably require open-sourcing the tool itself, since we will need to be able to reproduce the results and be able to make some adjustments.
Btw, the tool compiles the kernel and thus the generated descriptions depend on the kernel config used, right? What config did you use? If you used very small config (defconfig), then you may not get all interfaces. There is syzbot config which is reasonably beefy and also that's what syzbot tests: https://github.com/google/syzkaller/blob/master/dashboard/config/linux/upstream-apparmor-kasan.config However, it does not have some things enabled simply because we did not have descriptions for them. So if we can auto-generate we could enable more configs there.
Btw, the tool compiles the kernel and thus the generated descriptions depend on the kernel config used, right?
Yes, the results depend on the kernel config used. Currently we separate the phase of build LLVM bitcode and the phase of analysis. So we can evaluate our tool with the bitcode including some modules, not the bitcode of the whole kernel. However, let me try to run the tool directly on the kernel with allyesconfig.
I created a new repo that includes the results generated by syzdescribe (https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe). There are some performance issues for the whole kernel of allyesconfig and I am trying to solve them. So I uploaded some finished results for allyesconfig. Notice, there are still some FP & FN for device file names.
Started looking. I assumed linux-v5.12-def (defconfig?) is a subset of linux-v5.12-allyes, but first 3 files I see in linux-v5.12-def are not present in linux-v5.12-allyes: syz_describe_arch_x86_kernel_cpuid_c_amd64.txt syz_describe_arch_x86_kernel_msr_c_amd64.txt syz_describe_block_bsg_c_amd64.txt
Also matching files show some strange diffs: https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe/blob/master/linux-v5.12-def/syz_describe_drivers_block_loop_c_amd64.txt https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe/blob/master/linux-v5.12-allyes/syz_describe_drivers_block_loop_c_amd64.txt
E.g. lots of unrelated open calls in linux-v5.12-def, and no struct definitions in linux-v5.12-allyes (ioctl args are just intptr's).
Looking at: https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe/blob/master/linux-v5.12-allyes/syz_describe_drivers_xen_evtchn_c_amd64.txt all ioctl's have intptr arguments. I understand there may be some false negatives, but both ioctl code looks pretty idiomatic and does typed copy_from_user in each case: https://elixir.bootlin.com/linux/v5.12/source/drivers/xen/evtchn.c#L449 and const definitions include types: https://elixir.bootlin.com/linux/v5.12/source/include/uapi/xen/evtchn.h#L42 Do you know what happened there? This one may be reachable in virtual machines, but w/o types it may be not too useful.
Btw, I would suggest to generate ptr[in, array[int8]]
instead of intptr
when we can't extract the type b/c in most cases it's a pointer, while for intptr the fuzzer will pass not dereferencable values in most cases.
I continued looking for some descriptions that are reachable in VMs and that we could commit.
I noticed lots don't have any ioctls, and just open's won't give lots of coverage: https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe/blob/master/linux-v5.12-allyes/syz_describe_drivers_block_floppy_c_amd64.txt#L31 https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe/blob/master/linux-v5.12-allyes/syz_describe_drivers_block_virtio_blk_c_amd64.txt#L23 https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe/blob/master/linux-v5.12-allyes/syz_describe_drivers_char_virtio_console_c_amd64.txt#L53 https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe/blob/master/linux-v5.12-allyes/syz_describe_drivers_block_umem_c_amd64.txt#L24
Also I noticed lots have some strange strings in open's, like "/dev/#", "/dev/%s", "/dev/": https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe/blob/master/linux-v5.12-allyes/syz_describe_drivers_block_floppy_c_amd64.txt#L19 If the tool tends to produce them, they probably can be just filtered out.
Here is an interesting name "/dev/inputu": https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe/blob/master/linux-v5.12-allyes/syz_describe_drivers_input_evdev_c_amd64.txt#L21 In the source code it's "input%lu": https://elixir.bootlin.com/linux/v5.12/source/drivers/input/input.c#L1924 The tools probably needs to handle %lu as %d.
Also I noticed some devices are placed into the root /dev dir, while they are actually located in sub-dirs. E.g. "/dev/event#" is actually "/dev/input/event#"; "/dev/mouse#" should be "/dev/input/mouse#" and "/dev/tun" should be "/dev/net/tun".
In some cases direction is mis-detected. E.g. ptr[in, struct_hidraw_devinfo_13102764037666289071]
:
https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe/blob/master/linux-v5.12-allyes/syz_describe_drivers_hid_hid-core_c_amd64.txt#L38
but it's ptr[out]
:
https://elixir.bootlin.com/linux/v5.12/source/include/uapi/linux/hidraw.h#L36
https://elixir.bootlin.com/linux/v5.12/source/drivers/hid/hidraw.c#L408
Also in this file "/dev/hiddev#" and "/dev/hidraw#" return the same resource type fd_4641101649349984127_fd
https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe/blob/master/linux-v5.12-allyes/syz_describe_drivers_hid_usbhid_hid-core_c_amd64.txt
While in the kernel sources they seem to be unrelated (different interfaces).
Maybe you can suggest some suitable description file that is reachable in VMs and that is missing in our current descriptions?
I am still working on the performance issues for allyesconfig and also trying to solve the issues you mentioned.
But for the suitable description files, there are some in below, although some of they also miss the ioctls for some unknown reasons in this version(they exist in previous version). Syscall_Description_by_SyzDescribe/linux-v5.12-allyes/syz_describe_drivers_isdn_capi_capi_c_amd64.txt Syscall_Description_by_SyzDescribe/linux-v5.12-allyes/syz_describe_drivers_isdn_mISDN_core_c_amd64.txt Syscall_Description_by_SyzDescribe/linux-v5.12-allyes/syz_describe_drivers_vhost_net_c_amd64.txt Syscall_Description_by_SyzDescribe/linux-v5.12-allyes/syz_describe_drivers_vhost_vsock_c_amd64.txt Syscall_Description_by_SyzDescribe/linux-v5.12-allyes/syz_describe_drivers_misc_vmw_vmci_vmci_driver_c_amd64.txt Syscall_Description_by_SyzDescribe/linux-v5.12-allyes/syz_describe_drivers_char_hpet_c_amd64.txt Syscall_Description_by_SyzDescribe/linux-v5.12-allyes/syz_describe_drivers_char_nvram_c_amd64.txt
These seem to have the same issue: no ioctls or only inptr's as arguments and unrelated devices merged into the same resource type. I think ioctl argument types are critical, w/o them the fuzzer will just pass random numbers which will fail very early. The fuzzer can extract ioctl command constants using comparison arguments interception already, but it does not know arguments to pass.
In fact, this issue appears after the recent code which try to use some hash to avoid repeat name in syscall descriptions. I am working on it, and I will definitely fix this. But I guess I need some time.
Ah, I see. One common approach for such issues is checking in auto-generated results after any changes, then diffs highlight any unintentional changes. We use this for kernel configs, descriptions warnings, etc.
Some update based on your comments:
- Fix the bugs of missing ioctl and type info in this version.
- Also add comments about whether there are ioctl handler in syscall handler structure
- Generate "ptr[in, array[int8]]" instead of "intptr"
- Only add some comments instead of "open()" if the SyzDescribe can not generate correct device file name
- Differentiate in/out based on "copy_from_user" and "copy_to_user"
I see there are lots of improvements! Nice!
I've tried one file and it does not compile:
syz_describe_drivers_block_mtip32xx_mtip32xx_c_amd64.txt:19:1: resource fd_11113595003055145101_fd can't be created (never mentioned as a syscall return value or output argument/field)
syz_describe_drivers_block_mtip32xx_mtip32xx_c_amd64.txt:27:98: array can't be syscall argument
syz_describe_drivers_block_mtip32xx_mtip32xx_c_amd64.txt:28:98: array can't be syscall argument
syz_describe_drivers_block_mtip32xx_mtip32xx_c_amd64.txt:29:98: array can't be syscall argument
syz_describe_drivers_block_mtip32xx_mtip32xx_c_amd64.txt:31:98: array can't be syscall argument
make a lot of minor format updates, now the results of allyesconfig can pass the make generate
in syzkaller, for example:
- annotate unused resources
- add
opt
key word for recursive structure - add a fake
open("incorrect_device_file_name")
if syzdescribe does not find correct device file name - use
int64
instead ofint8
for arguments - fix some bugs for hash value to avoid some repeat declarations