sys/linux: enhance usb driver fuzzing
TL;DR: syzkaller needs to be tuned in a few ways to expand coverage of select usb drivers.
There are a few usb drivers that either have not been properly covered from the start or have lost coverage over time (judging from 8-month long stats derived from syzbot - I used it to identify flaws listed below). I've done a quick, incomplete audit to determine the possible reasons and their solutions. Some issues are already mentioned in https://github.com/google/syzkaller/issues/533.
1. Failed usb interface/endpoint checks (or other settings like USB speed) Examples: rtl8150 - lost coverage; ironically, syzkaller can't pass a check made by me... au0828 - lost coverage; mostly breaks due to incorrectly set USB speed. +others like radio-shark, radio-shark2, redrat3 etc.
Q: Ideally, we need to tweak generic syz* calls that deal with connecting to increase chances of correct ennumeration (better mutation and whatnot). But in the meantime, should we describe syz_usb_connect$ variants (plus others), similar to ath9k but without modified pseudo-calls, to gain coverage?
2. Failed due to a number of early CONTROL requests in both directions during probe. Examples: cx231xx - probe can't finish; syzkaller tries to react to READ requests but doesn't seem to succeed fully. lan78xx - syzbot shows failure at ep check time; however, the main issue is during probe() there are tens of ctrl requests, in and out. Having experimented a bit, I've come to a conclusion that the sheer number of requests is so high that fuzzer may not want to generate essentially the same syscalls in a row, thus stoping probe() half way. It will generate a program with ~15 syz_usb_control_io$, when 30 is necessary. I've tried manually crafting a program to push it further but read requests are tricky. And that's on top of connect$ and control_io$ call variants tailored for lan78xx. +some others.
Q: Describing syscall variants$ for drivers with such specific issues should help somewhat. Is it a good idea? As for lan78xx, I wonder if a single pseudo-call that loops for ~30 secs and sends/receives requests until probe is finished is in order?
3. r8152 device needs to handle a READ request early. See usb_control_msg here. It's similar to ath9k case, just a different direction. In my view, this necessitates describing a separate syz_usb_connect pseudo-call for r8152 as well as changing syz_usb_connect_impl() call to take a different lookup_connect_response_in() (not _out) function as an argument.
Q: Is this solution sound?
4. Missing firmware. Examples: s2255 - again, usb ep checks fail, but firmware also can't be found. go7007 - same case.
I don't remember why firmware may not be found but I imagine the fault is either with missing config options or with missing packages from the image. This can be easily remedied.
Provided my suggestions are deemed sensible, I'll start implementing them. Thank you for your time.
Cc @xairy
Hi @fellair Thanks for looking into this!
But in the meantime, should we describe syz_usb_connect$ variants (plus others), similar to ath9k but without modified pseudo-calls, to gain coverage?
Just adding specialized versions in descriptions looks good to me. Adding lots of special C code would be more questionable. But only descriptions seems fine.
Q: Describing syscall variants$ for drivers with such specific issues should help somewhat. Is it a good idea?
Again, some amount of descriptions looks reasonable. Perhaps you could start with 1 driver only, so that we can assess better how exactly it will look like.
Also, have you considered adding new seeds in sys/linux/test/ (sys/linux/test/vusb*)?
As for lan78xx, I wonder if a single pseudo-call that loops for ~30 secs and sends/receives requests until probe is finished is in order?
I am out of context now to answer this. Not sure if @xairy still have any context on this. Do you see other options? Is it better to create a background thread in e.g. syz_usb_connect? initialize_vhci creates such a thread to answer kernel requests.
Q: Is this solution sound?
Also don't have context now.
- Missing firmware.
This sounds good. Enabling relevant configs is always good. For packages we can check if they are provided by buildroot, or if there are other ways to install them in the image.
Btw, IIRC we enable USB configs based on distro configs in dashboard/config/linux/distros/* (IIRC we enable only the ones used by real distros). Perhaps adding more fresh distro configs will be useful to enable more drivers as well.
Awesome analysis @fellair, thank you!
With regards to the drivers that fail various USB descriptor checks (speed, endpoints, etc.) — adding new syz_usb_connect$ variants is the way to go. These should fit just fine within the existing syzlang framework for describing USB devices.
One note here is that for some drivers it makes sense to add two syz_usb_connect$ variants: one that would sometimes generate proper USB descriptors to pass the driver's probing but would also sometimes fail to explore various error-checking paths; and another one that would pass the probing most of them time to allow syzkaller fuzzing the post-probing communication. But adding the first variant likely only makes sense if the probing code allows various permutations of interfaces/endpoints, which is usually only the case for class-specific drivers (e.g. UVC) and not device-specific drivers.
As for the atk9k-like cases like lan78xx and r8152 that have an unusual probing process with extra control requests: I think the best way would be to extend the existing framework to allow describing responses to these requests.
We already have a way to describe the pre-SET_CONFIGURATION requests as fields within vusb_connect_descriptors (none of the class/driver-specific descriptions use this feature at the moment though). This does not allow specifying the order of the responses if there are multiple requests of the same type, but I don't know whether we need this at all.
For the requests that must be always handled immediately after SET_CONFIGURATION, we can rework and extend syz_usb_connect_ath9k. We can rebrand it as something like syz_usb_connect_scripted that accepts another argument that specifies in syzlang the responses to post-SET_CONFIGURATION control requests and which one them is the last one (i.e., the condition for exiting the pseudo-syscall). And then we can port the hardcoded responses from the C implementation of syz_usb_connect_ath9k into syzlang and extend the functionality based on the specific new cases we encounter.
Overall, as @dvyukov suggested, if you want to work on improving USB fuzzing, you can start with adding USB descriptions for one of the simpler drivers that you analysed.
Adding new sys/linux/test/ (sys/linux/test/vusb*) seeds is an alternative approach to all this, but I think extending the syzlang descriptions and the pseudo-syscall implementations is a better approach. However, I do encourage you to add seeds based on the new USB descriptions that you add. This would allow testing their correctness and catching potential descriptions breaking changes in the future.
Thank you all for your insights!
With regards to the drivers that fail various USB descriptor checks (speed, endpoints, etc.) — adding new
syz_usb_connect$variants is the way to go. These should fit just fine within the existing syzlang framework for describing USB devices.One note here is that for some drivers it makes sense to add two
syz_usb_connect$variants: one that would sometimes generate proper USB descriptors to pass the driver's probing but would also sometimes fail to explore various error-checking paths; and another one that would pass the probing most of them time to allow syzkaller fuzzing the post-probing communication. But adding the first variant likely only makes sense if the probing code allows various permutations of interfaces/endpoints, which is usually only the case for class-specific drivers (e.g. UVC) and not device-specific drivers.
Adding new
sys/linux/test/(sys/linux/test/vusb*) seeds is an alternative approach to all this, but I think extending the syzlang descriptions and the pseudo-syscall implementations is a better approach. However, I do encourage you to add seeds based on the new USB descriptions that you add. This would allow testing their correctness and catching potential descriptions breaking changes in the future.
I think I'll start small and tackle these first.
As for the
atk9k-like cases likelan78xxandr8152that have an unusual probing process with extra control requests: I think the best way would be to extend the existing framework to allow describing responses to these requests.We already have a way to describe the pre-
SET_CONFIGURATIONrequests as fields withinvusb_connect_descriptors(none of the class/driver-specific descriptions use this feature at the moment though). This does not allow specifying the order of the responses if there are multiple requests of the same type, but I don't know whether we need this at all.For the requests that must be always handled immediately after
SET_CONFIGURATION, we can rework and extendsyz_usb_connect_ath9k. We can rebrand it as something likesyz_usb_connect_scriptedthat accepts another argument that specifies in syzlang the responses to post-SET_CONFIGURATIONcontrol requests and which one them is the last one (i.e., the condition for exiting the pseudo-syscall). And then we can port the hardcoded responses from the C implementation ofsyz_usb_connect_ath9kinto syzlang and extend the functionality based on the specific new cases we encounter.
This is very interesting, I think it's worth exploring... I will get on that!
First 4 issues described at the top can be dealt with:
- simply using generic
syz_usb_connect$variants (problem 1.) - installing missing firmware (linux-firmware, ideally, both free and nonfree versions, but even just first is enough - problem 4.)
- using newly suggested
syz_usb_connect_scripted, see PR here: https://github.com/google/syzkaller/pull/6372 (problems 2-3.) Provided, of course, it is accepted.
So, in most cases mentioned here it is only a matter of adding new descriptions for usb drivers that do not have adequate coverage (when they really should).
However, some usb drivers actually use BULK and INT requests during driver probe instead of CTRL. A few examples of such behaviour:
- https://elixir.bootlin.com/linux/v6.16-rc2/source/drivers/media/usb/ttusb-dec/ttusb_dec.c#L347
- https://elixir.bootlin.com/linux/v6.16-rc2/source/drivers/net/can/usb/usb_8dev.c#L178
- https://elixir.bootlin.com/linux/v6.16-rc2/source/drivers/usb/storage/ene_ub6250.c#L502
That will be the next upgrade for syz_usb_connect_scripted.
However, some usb drivers actually use BULK and INT requests during driver probe instead of CTRL.
This happens after the SET_CONFIGURATION request, right? I think then handling these requests doesn't belong to syz_usb_connect/_scripted. Keeping this pseudo-syscall to only handle control requests seems cleaner, as it's already overloaded as it is (also see what I wrote below). (Actually, the main reason syz_usb_connect was implemented to handle multiple control requests in the first place was because the USB code re-requests the same descriptors multiple times (e.g. the configuration descriptor) and expects the responses to align.)
For handling non-control post-SET_CONFIGURATION requests for now, just define driver-specific syz_usb_ep_write/read and add corresponding seeds (or see what I wrote below).
This, however, brought me to thinking about a different pseudo-syscall design idea.
What if keep syz_usb_connect as is: i.e., exit on the SET_CONFIGURATION request. And then add another pseudo-syscall like syz_usb_finish_probe (or syz_usb_driver_probe?) that would handle all post-SET_CONFIGURATION requests (both control and non-control ones) to finish the probing procedure for a specific driver.
This seems to make sense: whatever happens before the SET_CONFIGURATION request is handled by the USB core code. And syz_usb_connect will specifically be targeted at that part. And everything sent during the driver probing is not a part of the USB core code, and thus a separate pseudo-syscall for this seems natural.
And this would still allow to both fuzz the post-SET_CONFIGURATION requests sent during the probing (syzkaller would generate syz_usb_connect + random syz_usb_control_io or syz_usb_ep_read/write) and ones sent after proper probing (syz_usb_connect + syz_usb_finish_probe + ...). But the syzlang descriptions and pseudo-syscall C implementations would look cleaner.
What do you think about this approach?
Thank you for your response!
And this would still allow to both fuzz the post-
SET_CONFIGURATIONrequests sent during the probing (syzkaller would generatesyz_usb_connect+ randomsyz_usb_control_ioorsyz_usb_ep_read/write) and ones sent after proper probing (syz_usb_connect+syz_usb_finish_probe+ ...). But the syzlang descriptions and pseudo-syscall C implementations would look cleaner.What do you think about this approach?
Got a little confused by the order here. Did you mean:
- during probe:
syz_usb_connect+syz_usb_finish_probe(SET_CONFIGURATIONand then deal with other CTRL-requests in a fixed sequence, until all the requests are dealt with and probe finally succeeds). - after probe: a bunch of
syz_usb_control_ioif necessary?
Just to reiterate, I see a need for using a pseudo-syscall like syz_usb_finish_probe or even syz_usb_connect_scripted OVER generic syz_usb_connect + syz_usb_control_io due to the fact that the latter option is not suitable for drivers that require more than ~10-20 requests. At some point syzkaller simply stops adding new syscalls on top of a program or seed, thus making probe fail. A great example of that behaviour is lan78xx driver. The reasons behind this are a bit obscure: maybe an early timeout or quirks of syzkaller's fuzzing strategy...
Otherwise, I agree that a switch to a combo of syz_usb_connect + syz_usb_finish_probe variants is slightly better than current version of syz_usb_connect_scripted. I'll start working on that.
Got a little confused by the order here. Did you mean:
- during probe:
syz_usb_connect+syz_usb_finish_probe(SET_CONFIGURATIONand then deal with other CTRL-requests in a fixed sequence, until all the requests are dealt with and probe finally succeeds).- after probe: a bunch of
syz_usb_control_ioif necessary?
I meant that also syz_usb_connect + random syz_usb_control_io might explore various paths during probing (similar to what syz_usb_connect + syz_usb_finish_probe might explore but possibly different).
Just to reiterate, I see a need for using a pseudo-syscall like
syz_usb_finish_probeor evensyz_usb_connect_scriptedOVER genericsyz_usb_connect+syz_usb_control_iodue to the fact that the latter option is not suitable for drivers that require more than ~10-20 requests.
Yeah, that's the main point of syz_usb_finish_probe.
Also, even if have a seed with syz_usb_connect + all required syz_usb_control_io to finish the probing, it's just less likely that syzkaller will add more syz_usb_control_io to the end of the program (assuming we still don't hit the syscall number limit) and not insert them into the middle and break the probing. With a seed with only syz_usb_connect + syz_usb_finish_probe, it's more likely that syzkaller will add more syz_usb_control_io to the end. But this is only relevant for drivers that expect more control requests after the probing is done.
Otherwise, I agree that a switch to a combo of
syz_usb_connect+syz_usb_finish_probevariants is slightly better than current version ofsyz_usb_connect_scripted. I'll start working on that.
Awesome, thanks!
While PR https://github.com/google/syzkaller/pull/6372 is under review, and a similar approach to tackle non-CTRL requests is being prepared, I've decided to try to tackle a couple of other areas of usb drivers fuzzing, so far not covered by default. I am significantly less familiar with the following examples but I feel like they can be worked with nonetheless.
1. USB device nodes.
After driver probe finishes, kernel can create devices (at very least, we are talking about net- and other devices). Net is a bit of a blackbox for me right now, but some misc drivers may create actual files to open, close, read and write to. By default such files are not accounted for (except for a few examples lile /dev/hiddev - which are described but could be improved).
I suggest I upgrade somewhat a check for syz_open_dev (linuxSyzOpenDevSupported) and include a way to describe devices that may be created after boot completes, just slighlty more organized than it is now. And either describe generic or specific variants of syz_open_dev for select drivers of interest. Same goes for read/write calls as well. Provided such calls make sense at all considering reliance on raw-gadget.
2. USB driver power management. Most modern usb drivers are capable of being suspended and woken up. By default, syzbot enables standard behaviour for autosuspend. Judging from both current overall coverage, as well as couple of experiments of my own, virtual usb devices do not sleep, even if there is no apparent direct interaction with it. Once again, it may be a quirk of hcds setup not to sleep, or the way raw-gadget works by default. If we could somehow play around with autosuspend delays and teach devices to sleep between direct calls to it via pseudo-calls - even if devices end up disconnecting completely - it may constitute a decent improvement of code coverage.
@xairy, do you by chance have any thoughts?
-
Yes, this makes sense. You can also look through the existing
dev_*.txtdescriptions - I suspect some of them can be easily adapted to handle USB-emulated devices as well (dev_block.txt?). -
Do you mean the extra coverage would come from the paths of handling suspend/resume? I guess fuzzing those parts might produce some results. But perhaps adding more/better descriptions for standard USB classes is a better time investment for now (at least in terms of code covered/bugs found). But if you specifically want to play around with the power management code - sure, go for it :)
Note though that AFAIR, suspend and resume are initiated by the host. The device can only request a remote wakeup (resume). Raw Gadget does have some handling of suspend/resume, but I never tested it properly. I also don't know if the Dummy HCD/UDC supports this.
I'll try to take a look at the PR soon, thanks!
- Yes, this makes sense. You can also look through the existing
dev_*.txtdescriptions - I suspect some of them can be easily adapted to handle USB-emulated devices as well (dev_block.txt?).
So, judging from my small experiments there are a few problems:
- Such devices exist only during short period of time while usb virtual device is present. Playing around with
timeoutorprog_timeoutforsyz_usb_connect/syz_usb_finish_probeto ensure that device is present long enough does not help much. - The way usb-related devices are set up, there is no syncronisation between syzkaller procs/threads (the way
syz_open_devexpects thread №3 to open/dev/usb/legousbtower3file, and only it, is wrong here). One can specify only/dev/usb/legousbtower0as it will be present more frequently than others is a slight improvement. - Obviously, driver-specific
open/read/writerely on (CTRL and not) URBs as well, which adds a degree of difficulty for the fuzzer to properly execute both enumeration and meaningful interaction with the device via file operations.
I don't wanna give up on this yet, there may be a way to make it work.
- Do you mean the extra coverage would come from the paths of handling suspend/resume? I guess fuzzing those parts might produce some results. But perhaps adding more/better descriptions for standard USB classes is a better time investment for now (at least in terms of code covered/bugs found). But if you specifically want to play around with the power management code - sure, go for it :)
Exactly right. No doubt USB classes should take priority but PM stuff seems interesting too. I'll try to address them both.
P.S. On the off chance you find free time, I would greatly appreciate if you would be willing to put down a list of usb-related things that could be improved in syzkaller, something like this: https://github.com/google/syzkaller/issues/533.
Note though that AFAIR, suspend and resume are initiated by the host. The device can only request a remote wakeup (resume). Raw Gadget does have some handling of suspend/resume, but I never tested it properly. I also don't know if the Dummy HCD/UDC supports this.
Yes, I was worried that maybe changes are needed to underlying mechanism. I'll keep looking into it.
I'll try to take a look at the PR soon, thanks!
Thanks a lot!
Small update on USB device nodes like /dev/legousbtower:
- USB device nodes. After driver probe finishes, kernel can create devices (at very least, we are talking about net- and other devices). Net is a bit of a blackbox for me right now, but some misc drivers may create actual files to open, close, read and write to. By default such files are not accounted for (except for a few examples lile /dev/hiddev - which are described but could be improved).
Technically, this is already achieved via sys/linux/dev_char_usb.txt. Just because there are a few usb drivers with zero coverage of its file_operations stems from the fact that they are not passing enumeration/probe checks.
So, we can safely drop and address other issues.
Edit:
- USB driver power management. Most modern usb drivers are capable of being suspended and woken up. By default, syzbot enables standard behaviour for autosuspend. Judging from both current overall coverage, as well as couple of experiments of my own, virtual usb devices do not sleep, even if there is no apparent direct interaction with it. Once again, it may be a quirk of hcds setup not to sleep, or the way raw-gadget works by default. If we could somehow play around with autosuspend delays and teach devices to sleep between direct calls to it via pseudo-calls - even if devices end up disconnecting completely - it may constitute a decent improvement of code coverage.
As for USB PM stuff, the reason that current coverage overwhelmingly does not show suspend/resume routines in usb drivers (except for drivers/usb/class/usblp printer driver for some reason) is explained by flaky autosuspend support (see here). By default, all usb devices will not suspend. But we can enable autosuspend in the course of fuzzing with syzkaller by explicit udev rules. Doing that (as well as lowering standard autosuspend delay to 1000ms instead of 2s or 5s) should help find some new code pathways during testing.
However, the reason that usb devices are not usually set up to suspend is that they have a problem waking up. Printer driver is a good example of that as it will NOT wake up properly, even if forced to handle I/O after suspend. I bring it up to present a danger of universally allowing devices to suspend and some of them not being able to wake up.