syzkaller
all: extract fs images from reproducers
This is the first part of the implementation: the PR includes the extraction code and the code that saves the extracted images to files when running a standalone syz-manager instance.
This PR somewhat restricts how freely syz_mount_image arguments can be mutated (that freedom was excessive anyway) in order to simplify the extraction.
Also, I intentionally used the io.Reader interface so that it can later be plugged into the asset storage functionality. Another reason is that such images can get pretty large, and we don't really want to waste RAM on them.
Codecov Report
Merging #3384 (f85f195) into master (87840e0) will decrease coverage by 0.0%. The diff coverage is 53.3%.

| Impacted Files | Coverage Δ | |
|---|---|---|
| pkg/osutil/osutil.go | 16.6% <0.0%> (-0.8%) | :arrow_down: |
| prog/analysis.go | 70.3% <0.0%> (-6.5%) | :arrow_down: |
| prog/target.go | 57.1% <ø> (ø) | |
| syz-manager/manager.go | 0.0% <0.0%> (ø) | |
| sys/linux/init_images.go | 65.7% <65.7%> (ø) | |
| prog/test_util.go | 58.6% <71.4%> (+3.9%) | :arrow_up: |
| prog/prog.go | 82.4% <100.0%> (+0.1%) | :arrow_up: |
| sys/linux/init.go | 61.2% <100.0%> (+0.4%) | :arrow_up: |
First overall impression is that this is over-tailored for images. Dealing with mounted images in the prog package feels awkward, since prog is supposed to be more abstract; a lot of the code is in sys/linux, while the ability to save images is not really Linux-specific; and dealing with the syz_mount_image format at the prog level looks overly complex.
This again makes me think of something like gzipped data. The other project may also benefit from it.
If we add a type along the lines of gzipped["image"], then we won't need to analyze/verify/fix up these segment arrays, and the code for saving the images will be OS-independent. The code also won't be image-specific: we just extract and save all of the gzipped blobs in the program.
I also suspect that minimizing these segment arrays currently takes an insane amount of time, while we probably don't need to minimize them at all (the current minimization strategy won't work for them anyway). Switching to gzipped would solve that as well.
However, the main issue I see with adding gzipped is that we will need C decompression code in the executor. We will either need to find an algorithm that can be implemented in a reasonable number of lines of C; or find a C library with a permissive license and squash its decompression code into the executor; or use (and require for repros) some preinstalled package. Brotli has a C implementation (https://github.com/google/brotli/tree/master/c/dec), but even if we squash it, it will be several KLOC. We could, of course, pass decompressed data to the executor, but I don't think that will work well (especially if we need to pass this data over the network).
If we are to store mounted images as artifacts, I think they deserve to be an explicitly extractable entity. And since the syz_mount_image call is (currently) Linux-specific, I didn't find a better place for this code than prog + sys/linux.
It was intended more as a temporary improvement on the existing mechanism to enable sharing such images (the code that updates the asset storage to handle crash assets is also ready and is just waiting for the image-producing code). We can put this aside, though, and consider the more generic approach.
Would it be an option to just require zlib? IIRC we already require -pthread for our threaded reproducers, so why not require one more widely used library?
I'd also worry about at least two more problems:
- How it will affect mutation: it seems that right now such segments also let the fuzzer focus specifically on the metadata rather than on useless zero bytes. If we fuzz the huge binary image, 99% of mutations will hit the useless parts. The size of the compressed image will also grow over time.
- We'd need to be careful about RAM consumption. If we just plug in the compression/decompression code and keep the existing mutation logic, it will require lots of extra memory; it may even happen that we have to mutate data in all procs simultaneously. And all of that will happen on VMs with (potentially) very limited memory.
I think it would be good if none of the code in prog and manager (and dashboard) were specific to "mounted images", but instead expressed in terms of generic "binary artifacts" that have a type. Currently the only type will be "mounted image". But if we ever add another type, we will only need to add extraction for artifacts of the new type.
Done
PTAL
I ran a separate syz-manager instance built from this PR, and the extracted images seem to work fine (they cause the same errors as their repros). I also successfully mounted the images extracted from our sys/linux/test/syz_mount_image* files.
@dvyukov Need your approval once again.