libmspack
libmspack copied to clipboard
Added fuzzer and oss-fuzz build-script
I have worked on setting up continuous fuzzing for libmspack, and this PR adds a fuzzer as well as the build script for oss-fuzz.
For those unfamiliar with fuzzing: Fuzzing is a way of testing software applications, whereby pseudo-random data is passed to a target application with the purpose of finding bugs and vulnerabilities. The fuzzer in this PR is implemented by way of libfuzzer. OSS-fuzz is a project by Google that offers open source projects to integrate and have their fuzzers run continuously free of charge through regular, scheduled fuzz runs. While it is free, it is expected that founds bugs are fixed, so that the resources spent on fuzzing libmspack are put to good use.
The fuzzer creates a .cab
file with the pseudo-random data, opens it and extracts it.
I will shortly be setting up the integration files on the OSS-fuzz side, and to complete the integration, the files in this PR need to be merged, and a maintainers email address needs to be added over at OSS-fuzz for bug reports.
the integration files on the OSS-fuzz side have been added here: https://github.com/google/oss-fuzz/pull/5085
Would be nice to get this merged
I'd be generally OK with merging this, but I have some concerns.
Could you add some kind of documentation, e.g. a README in this directory mentioning the OSS-fuzz project and its homepage? Mentioning it only in this PR will not be obvious to someone reading the code later.
Does this need to be written in C++? I don't mind if the program is built with a C++ compiler, but I'd rather not introduce any actual C++ in the codebase. I don't see any use of the <iostream>
that was included, and I'm not sure what std::remove
even does do to a stack-allocated char array in a C function.
I'd suggest taking a look at examples/cabd_memory.c
, because it seems to fit your use case: you have data in memory (fuzzer input) and you want to process it. It would not be hard to tweak this example to ignore all writes from libmspack, rather than record them to memory. This would let you get the most fuzzing done, as I found for myself that file I/O is very expensive, even when writing to /tmp
. Reading/writing from memory is orders of magnitude faster.
Also, I don't think this works as-is; You have the line int err = cabd->extract(cabd, cab->files, "/tmp");
which is to extract one file (the first one in the list that is cab->files
), and write it to the filename... /tmp
? But that's a directory, and libmspack will likely fail because it has been told to write a file there, and will get EISDIR error from fopen() in its default mspack_system implemenation. I'm not sure this code even gets to the point where it tries to extract any of the fuzzy data, so far all that is being tested is the cabd->open()
header parsing.
I've made all the changes I recommended above, you should find the fuzzer runs a lot faster now, and I'd be OK with merging this.
Would you be able to continue with the other side and get libmspack integrated into oss-fuzz?
I've made all the changes I recommended above, you should find the fuzzer runs a lot faster now, and I'd be OK with merging this.
Would you be able to continue with the other side and get libmspack integrated into oss-fuzz?
Thanks! Yes, sure. Which email can we use for bug reports?
Another thing is that this code is only checking CHM extraction. How would it be possible to register fuzzers also for:
- CAB search and extraction
- OAB extraction
- SZDD extraction
- KWAJ extraction
These shoulld all have their own seed/example files, as they're all different file formats.
In terms of code coverage, there should also be a fuzzer based on test/chmd_order.c
, which exercises the chmd->fast_find() code to would find bugs in parsing PMGI/PMGL headers, and extracts files out of order to find bugs in random-access decompression.