syzkaller icon indicating copy to clipboard operation
syzkaller copied to clipboard

sys/linux: syz-describe: auto generate syzlang

Open ZHYfeng opened this issue 2 years ago • 22 comments

Sorry that I wrongly close the PR #3129.

I have made some modifications based on your comments. Now just push one example of KVM for more comments. The modifications are below:

  • copyright headers and the kernel version at the beginning.
  • debug info about where the tool finds drivers, where the tool finds the device file names and syscall handler structures
  • a prefix syz_describe_ for syscalls
  • add meta arches["amd64"] based on target triple field in LLVM bitcode
  • change the file name to syz_describe_ + "module init function name" + "_arche"(e.g., _amd64) + .txt, I think the module init functions should rarely change, which should be ok for the file name.
  • fix bug that it should use syz_open_dev instead of openat

And I am trying to add some debug info for the structure definition.

In fact, I have trouble with the name of structure. For example, the KVM in svm_init(amd) and vmx_init(intel) share many same structures. If we want to keep them in different syscall description, e.g., syz_describe_svm_init_amd64.txt and syz_describe_vmx_init_amd64.txt, I have to add some other strings to those structures. Or we can only keep one record of them. Which one do you think is better?

ZHYfeng avatar May 18 '22 04:05 ZHYfeng

Codecov Report

Merging #3143 (e281a2f) into master (0042f2b) will decrease coverage by 0.0%. The diff coverage is n/a.

Impacted Files Coverage Δ
prog/mutation.go 88.1% <0.0%> (-1.2%) :arrow_down:
prog/any.go 84.8% <0.0%> (+0.8%) :arrow_up:
prog/size.go 92.4% <0.0%> (+1.9%) :arrow_up:

codecov[bot] avatar May 18 '22 04:05 codecov[bot]

For example, the KVM in svm_init(amd) and vmx_init(intel) share many same structures. If we want to keep them in different syscall description, e.g., syz_describe_svm_init_amd64.txt and syz_describe_vmx_init_amd64.txt, I have to add some other strings to those structures.

Are these different structures? Or the same struct reached from both svm_init and vmx_init? If it's the same struct then it's definitely better to keep just one of them. But I am not sure what file should it go into then. Does syz-describe do global analysis and know about the duplicate?

dvyukov avatar May 18 '22 13:05 dvyukov

Note that init functions can have duplicate names as well. Since they are static, they all can be called just init().

dvyukov avatar May 18 '22 13:05 dvyukov

Are these different structures? Or the same struct reached from both svm_init and vmx_init? If it's the same struct then it's definitely better to keep just one of them. But I am not sure what file should it go into then. Does syz-describe do global analysis and know about the duplicate?

They are the same structures reached from both svm_init and vmx_init. And I rewrote the related code to generate syzlang format syscall descriptions last week and now syz-describe could know the duplicate. So we can put it anywhere we want. I guess it is also reasonable to put them in different file. In the future, we can generate values or some other information for the arguments, and in that case, the same structures with different values would also be different. (I remember I saw this feature in syzkaller, but I cannot find this, please point out my mistakes if this does not exist)

Note that init functions can have duplicate names as well. Since they are static, they all can be called just init().

Currently, I have not found any module init functions with duplicate name. But I agree that it is possible to have duplicate name. How about using the path and init function name together? But for external modules not from the linux kernel (*.ko), I am not sure whether this name would be ok. But I guess we do not need to consider the name of external modules now.

ZHYfeng avatar May 19 '22 01:05 ZHYfeng

Currently, I have not found any module init functions with duplicate name. But I agree that it is possible to have duplicate name. How about using the path and init function name together?

I would use just file name. Or can a single file contain 2 init functions?

dvyukov avatar May 19 '22 05:05 dvyukov

the same structures with different values would also be different. (I remember I saw this feature in syzkaller, but I cannot find this, please point out my mistakes if this does not exist)

Yes, it exists, one can totally have 2+ versions of the same struct. One can just have 2 versions with completely arbitrary names. Say, if original struct is called foo, one can have 2 versions of it with different consts or anything else called bar and baz. However, the official recommended way to name them is foo$bar and foo$baz. This allows them to automatically understand that these both names actually refer to the same kernel struct foo. E.g. syz-check uses this to do static checking of descriptions: https://github.com/google/syzkaller/blob/master/tools/syz-check/check.go

Actually a single file may need several versions of the same struct as wel. For example, there is 2 ioctls, they accept the same struct, but the struct contains type field which needs to have constant value1 for the first ioctl and value2 for the second ioctl.

If it's the same struct then it's definitely better to keep just one of them.

Actually I take this back. Since this is auto-generated descriptions we can have multiple copies if it's better for other reasons.

dvyukov avatar May 19 '22 05:05 dvyukov

make two changes:

  1. use prefix + path of source code as the file name of generated syscall description
  2. use hash to avoid the repeat names of structures or syscalls, and the hash value is from the debug info about where the tool finds drivers, which is existing in the file.

I run make presubmit locally with no error. However, there is one in ci.

ZHYfeng avatar May 21 '22 04:05 ZHYfeng

I run make presubmit locally with no error. However, there is one in ci.

That's a OOM flake, I've restarted CI.

dvyukov avatar May 21 '22 09:05 dvyukov

We've discussed this internally and we think the first reasonable and uncontroversial step would be to commit automatic descriptions for interfaces for which we don't have manual descriptions.

Can you somehow share the full list of interfaces the tool can generate? Not sure what's the best way to do this. Either upload all files somewhere, or maybe just the list openat/syz_open_dev calls.

I think this may start uncovering some scalability issues in syzkaller, which we can address early on. Tighter integration will probably require open-sourcing the tool itself, since we will need to be able to reproduce the results and be able to make some adjustments.

Btw, the tool compiles the kernel and thus the generated descriptions depend on the kernel config used, right? What config did you use? If you used very small config (defconfig), then you may not get all interfaces. There is syzbot config which is reasonably beefy and also that's what syzbot tests: https://github.com/google/syzkaller/blob/master/dashboard/config/linux/upstream-apparmor-kasan.config However, it does not have some things enabled simply because we did not have descriptions for them. So if we can auto-generate we could enable more configs there.

dvyukov avatar May 21 '22 09:05 dvyukov

Btw, the tool compiles the kernel and thus the generated descriptions depend on the kernel config used, right?

Yes, the results depend on the kernel config used. Currently we separate the phase of build LLVM bitcode and the phase of analysis. So we can evaluate our tool with the bitcode including some modules, not the bitcode of the whole kernel. However, let me try to run the tool directly on the kernel with allyesconfig.

ZHYfeng avatar May 21 '22 21:05 ZHYfeng

I created a new repo that includes the results generated by syzdescribe (https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe). There are some performance issues for the whole kernel of allyesconfig and I am trying to solve them. So I uploaded some finished results for allyesconfig. Notice, there are still some FP & FN for device file names.

ZHYfeng avatar May 25 '22 16:05 ZHYfeng

Started looking. I assumed linux-v5.12-def (defconfig?) is a subset of linux-v5.12-allyes, but first 3 files I see in linux-v5.12-def are not present in linux-v5.12-allyes: syz_describe_arch_x86_kernel_cpuid_c_amd64.txt syz_describe_arch_x86_kernel_msr_c_amd64.txt syz_describe_block_bsg_c_amd64.txt

Also matching files show some strange diffs: https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe/blob/master/linux-v5.12-def/syz_describe_drivers_block_loop_c_amd64.txt https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe/blob/master/linux-v5.12-allyes/syz_describe_drivers_block_loop_c_amd64.txt

E.g. lots of unrelated open calls in linux-v5.12-def, and no struct definitions in linux-v5.12-allyes (ioctl args are just intptr's).

dvyukov avatar May 27 '22 13:05 dvyukov

Looking at: https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe/blob/master/linux-v5.12-allyes/syz_describe_drivers_xen_evtchn_c_amd64.txt all ioctl's have intptr arguments. I understand there may be some false negatives, but both ioctl code looks pretty idiomatic and does typed copy_from_user in each case: https://elixir.bootlin.com/linux/v5.12/source/drivers/xen/evtchn.c#L449 and const definitions include types: https://elixir.bootlin.com/linux/v5.12/source/include/uapi/xen/evtchn.h#L42 Do you know what happened there? This one may be reachable in virtual machines, but w/o types it may be not too useful.

Btw, I would suggest to generate ptr[in, array[int8]] instead of intptr when we can't extract the type b/c in most cases it's a pointer, while for intptr the fuzzer will pass not dereferencable values in most cases.

dvyukov avatar May 27 '22 13:05 dvyukov

I continued looking for some descriptions that are reachable in VMs and that we could commit.

I noticed lots don't have any ioctls, and just open's won't give lots of coverage: https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe/blob/master/linux-v5.12-allyes/syz_describe_drivers_block_floppy_c_amd64.txt#L31 https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe/blob/master/linux-v5.12-allyes/syz_describe_drivers_block_virtio_blk_c_amd64.txt#L23 https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe/blob/master/linux-v5.12-allyes/syz_describe_drivers_char_virtio_console_c_amd64.txt#L53 https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe/blob/master/linux-v5.12-allyes/syz_describe_drivers_block_umem_c_amd64.txt#L24

Also I noticed lots have some strange strings in open's, like "/dev/#", "/dev/%s", "/dev/": https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe/blob/master/linux-v5.12-allyes/syz_describe_drivers_block_floppy_c_amd64.txt#L19 If the tool tends to produce them, they probably can be just filtered out.

Here is an interesting name "/dev/inputu": https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe/blob/master/linux-v5.12-allyes/syz_describe_drivers_input_evdev_c_amd64.txt#L21 In the source code it's "input%lu": https://elixir.bootlin.com/linux/v5.12/source/drivers/input/input.c#L1924 The tools probably needs to handle %lu as %d.

Also I noticed some devices are placed into the root /dev dir, while they are actually located in sub-dirs. E.g. "/dev/event#" is actually "/dev/input/event#"; "/dev/mouse#" should be "/dev/input/mouse#" and "/dev/tun" should be "/dev/net/tun".

In some cases direction is mis-detected. E.g. ptr[in, struct_hidraw_devinfo_13102764037666289071]: https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe/blob/master/linux-v5.12-allyes/syz_describe_drivers_hid_hid-core_c_amd64.txt#L38 but it's ptr[out]: https://elixir.bootlin.com/linux/v5.12/source/include/uapi/linux/hidraw.h#L36 https://elixir.bootlin.com/linux/v5.12/source/drivers/hid/hidraw.c#L408

Also in this file "/dev/hiddev#" and "/dev/hidraw#" return the same resource type fd_4641101649349984127_fd https://github.com/ZHYfeng/Syscall_Description_by_SyzDescribe/blob/master/linux-v5.12-allyes/syz_describe_drivers_hid_usbhid_hid-core_c_amd64.txt While in the kernel sources they seem to be unrelated (different interfaces).

dvyukov avatar May 30 '22 07:05 dvyukov

Maybe you can suggest some suitable description file that is reachable in VMs and that is missing in our current descriptions?

dvyukov avatar May 30 '22 07:05 dvyukov

I am still working on the performance issues for allyesconfig and also trying to solve the issues you mentioned.

But for the suitable description files, there are some in below, although some of they also miss the ioctls for some unknown reasons in this version(they exist in previous version). Syscall_Description_by_SyzDescribe/linux-v5.12-allyes/syz_describe_drivers_isdn_capi_capi_c_amd64.txt Syscall_Description_by_SyzDescribe/linux-v5.12-allyes/syz_describe_drivers_isdn_mISDN_core_c_amd64.txt Syscall_Description_by_SyzDescribe/linux-v5.12-allyes/syz_describe_drivers_vhost_net_c_amd64.txt Syscall_Description_by_SyzDescribe/linux-v5.12-allyes/syz_describe_drivers_vhost_vsock_c_amd64.txt Syscall_Description_by_SyzDescribe/linux-v5.12-allyes/syz_describe_drivers_misc_vmw_vmci_vmci_driver_c_amd64.txt Syscall_Description_by_SyzDescribe/linux-v5.12-allyes/syz_describe_drivers_char_hpet_c_amd64.txt Syscall_Description_by_SyzDescribe/linux-v5.12-allyes/syz_describe_drivers_char_nvram_c_amd64.txt

ZHYfeng avatar May 31 '22 00:05 ZHYfeng

These seem to have the same issue: no ioctls or only inptr's as arguments and unrelated devices merged into the same resource type. I think ioctl argument types are critical, w/o them the fuzzer will just pass random numbers which will fail very early. The fuzzer can extract ioctl command constants using comparison arguments interception already, but it does not know arguments to pass.

dvyukov avatar May 31 '22 06:05 dvyukov

In fact, this issue appears after the recent code which try to use some hash to avoid repeat name in syscall descriptions. I am working on it, and I will definitely fix this. But I guess I need some time.

ZHYfeng avatar May 31 '22 19:05 ZHYfeng

Ah, I see. One common approach for such issues is checking in auto-generated results after any changes, then diffs highlight any unintentional changes. We use this for kernel configs, descriptions warnings, etc.

dvyukov avatar Jun 01 '22 06:06 dvyukov

Some update based on your comments:

  • Fix the bugs of missing ioctl and type info in this version.
  • Also add comments about whether there are ioctl handler in syscall handler structure
  • Generate "ptr[in, array[int8]]" instead of "intptr"
  • Only add some comments instead of "open()" if the SyzDescribe can not generate correct device file name
  • Differentiate in/out based on "copy_from_user" and "copy_to_user"

ZHYfeng avatar Jun 06 '22 00:06 ZHYfeng

I see there are lots of improvements! Nice!

I've tried one file and it does not compile:

syz_describe_drivers_block_mtip32xx_mtip32xx_c_amd64.txt:19:1: resource fd_11113595003055145101_fd can't be created (never mentioned as a syscall return value or output argument/field)
syz_describe_drivers_block_mtip32xx_mtip32xx_c_amd64.txt:27:98: array can't be syscall argument
syz_describe_drivers_block_mtip32xx_mtip32xx_c_amd64.txt:28:98: array can't be syscall argument
syz_describe_drivers_block_mtip32xx_mtip32xx_c_amd64.txt:29:98: array can't be syscall argument
syz_describe_drivers_block_mtip32xx_mtip32xx_c_amd64.txt:31:98: array can't be syscall argument

dvyukov avatar Jun 07 '22 11:06 dvyukov

make a lot of minor format updates, now the results of allyesconfig can pass the make generate in syzkaller, for example:

  • annotate unused resources
  • add opt key word for recursive structure
  • add a fake open("incorrect_device_file_name") if syzdescribe does not find correct device file name
  • use int64 instead of int8 for arguments
  • fix some bugs for hash value to avoid some repeat declarations

ZHYfeng avatar Jun 11 '22 21:06 ZHYfeng