OnDiskCorpus files should be configurable to contain a human-readable representation of the input
Most fuzzers will likely use some form of OnDiskCorpus (incl. InMemoryOnDiskCorpus, CachedOnDiskCorpus, etc.) for their solutions. To figure out what the problem actually was, one needs to know the content of the testcase/input that triggered the feedbacks. Currently, corpora that store testcases on disk write a bunch of generic information (such as runtime) to the file associated with the testcase/input, but no representation of the input itself.
The only way to add this, without resorting to writing dummy feedbacks that do nothing but add new metadata with the input content, is to implement the filename-generating function on the input so that it extracts the testcase from the corpus and somehow stringifies it:
fn generate_name(&self, id: Option<CorpusId>) -> String;
However, file names have a length restriction, so this isn't usable for inputs that can get somewhat long. Plus, for structured inputs, it would be much easier to have the entire structure nicely formatted in the file.
I don't fully understand: The OnDiskCorpus will contain the "content of the testcase/input that triggered the inputs"- that's what it's for, right?
That being said, currently the correct(tm) way to add metadata to a Testcase is via custom Feedbacks that do nothing like here: https://github.com/AFLplusplus/LibAFL/blob/e370e2f852b28aa0c4baedff426005429dbb6c08/libafl/src/feedbacks/stdio.rs#L107
Yes, the corpus will contain everything, of course. But it isn't written to disk, so when I kill the fuzzer, I lose everything but the metadata (found in the .metadata file). And that doesn't per default contain the input that triggered a crash (or whatever you're looking for). So I can't reproduce the crash.
Why is the OnDiskCorpus not written to disk? What crash are you talking about? A crash in the fuzzer or a crash in the target? Crashes in the target are of course included in the corpus (if you have a CrashFeedback)? Sorry, I'm confused...
Ah, I see, seems like I missed something. If I understand correctly, the input content is serialised and written to disk in this method on Input, to the file associated with the crash without an extension or a leading dot:
/// Write this input to the file
fn to_file<P>(&self, path: P) -> Result<(), Error>
where
    P: AsRef<Path>,
{
    write_file_atomic(path, &postcard::to_allocvec(self)?)
}
When initialising the corpus, a format can be passed, and while this leaves the metadata nicely formatted, the input itself is still serialised and thus not human readable.
OnDiskCorpus::with_meta_format(
    PathBuf::from("./crashes"),
    OnDiskMetadataFormat::JsonPretty,
)
.unwrap(),
So I guess I'm asking for an option for human-readable serialisation of the input when written to disk.
I guess I could also just implement this for my input, so a global option may not be strictly necessary, but it would still be nice, just for consistency.
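To make the "implement this for my input" option concrete, here is a minimal sketch using only the standard library. It is not LibAFL API: the `ReadableToFile` trait, its `to_readable_file` method and the `HttpRequestInput` type are hypothetical names made up for illustration. It writes the pretty-printed `Debug` form of the input, which is readable but not meant to be parsed back.

```rust
use std::fmt::Debug;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical extension trait (not part of LibAFL): writes a
/// human-readable dump of a value to a file.
pub trait ReadableToFile: Debug {
    /// Write the pretty-printed `Debug` form of `self` to `path`.
    fn to_readable_file<P: AsRef<Path>>(&self, path: P) -> io::Result<()> {
        fs::write(path, format!("{:#?}\n", self))
    }
}

// Every type that implements `Debug` gets the method for free.
impl<T: Debug> ReadableToFile for T {}

/// Hypothetical structured input, used only for illustration.
#[derive(Debug)]
pub struct HttpRequestInput {
    pub method: String,
    pub path: String,
    pub body: Vec<u8>,
}
```

Calling `input.to_readable_file("./crashes/some_name.txt")` next to the existing serialised file would then leave a formatted copy of the structure on disk. A format that can be read back (for example JSON via serde) would probably be the better choice for a built-in option; `Debug` is used here only to keep the sketch dependency-free.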
Related question: All input types in the repo (at least as far as I can see) generate their testcase names (fn generate_name(&self, id: Option<CorpusId>) -> String; on Input) the exact same way: hash their content (for collection types, namely Vecs, this is done manually for some reason) and take the first 16 bytes.
Should there not just be a blanket implementation that does this for any input that implements Hash (or where this is derived)?
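A rough sketch of what such a blanket implementation could look like, under some loud assumptions: `GenerateName` and `CorpusId` below are stand-ins defined locally, not the real LibAFL items, and `DefaultHasher` from the standard library is used only to avoid dependencies (the hasher LibAFL actually uses may differ). The name here is the 64-bit hash printed as 16 hex characters.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for LibAFL's `CorpusId` (hypothetical, for illustration only).
pub struct CorpusId(pub usize);

/// Stand-in for the naming part of the `Input` trait (hypothetical).
pub trait GenerateName {
    fn generate_name(&self, id: Option<CorpusId>) -> String;
}

/// Blanket implementation: anything that is `Hash` gets a name derived
/// from the hash of its content. The id is ignored in this sketch.
impl<T: Hash> GenerateName for T {
    fn generate_name(&self, _id: Option<CorpusId>) -> String {
        let mut hasher = DefaultHasher::new();
        self.hash(&mut hasher);
        // 64-bit hash, zero-padded to 16 lowercase hex characters.
        format!("{:016x}", hasher.finish())
    }
}
```

With this, `vec![1u8, 2, 3].generate_name(None)` works without any per-type code. One trade-off to keep in mind: a blanket impl over `T: Hash` prevents any `Hash` type from providing its own, different `generate_name`, so it might fit better as a default method or a helper function.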
For a human-readable serialization there is the DumpToDiskStage that goes through new inputs and serializes them with a provided closure.
Is this what you are looking for?
Yes, this kind of does what I would want it to do, but
- It also serialises corpus, not just solutions (and returns an error if passed something like /dev/null)
- I need to manually do the serialisation, as opposed to just telling it (like passing OnDiskMetadataFormat::JsonPretty)
Depending on how large your corpus gets and the change rate within it, the first point may range from annoying to a considerable downside. The second is not critical, just a bit of extra code; it would just be easier without it :)
Plus I would expect this kind of functionality in the corpus, especially OnDiskCorpus, not in a stage. That's probably also why I haven't found this.
Feel free to fix the first point :) For the second point, we could have a number of serialiser functions in LibAFL, right?
Open for other suggestions of course.
You can use append_metadata on the objective feedback to store any metadata you want for a solution (see #2556).
Surprised to see that this has not been improved since my first time with libafl (0.6).
The culprit is that the metadata stored alongside OnDiskCorpus is useless, i.e. it is never updated after it is first written to disk. Any updates to the metadata won't be written to disk once the testcase has been added. See:
https://github.com/AFLplusplus/LibAFL/blob/main/libafl/src/corpus/inmemory_ondisk.rs#L213
This generally means metadata is read-only once written to disk, while in many cases I would like to attach different states (not affecting execution etc.) to an input. This might be reasonable, as the metadata was designed to save information like executions etc., but it makes the metadata super misleading and hard to extend.
Generally I can understand the motivation of @riesentoaster, and I personally used a workaround similar to @Slava0135's: I created a dummy feedback to update a field of my custom Input type, like repr/outcome. This requires a custom input type, which is semantically correct (different states should be treated as different inputs) but not too intuitive. I think we should have another API to update metadata individually, which probably requires modifying the Corpus trait. Another workaround I used previously is simply deleting the input, updating the metadata and adding it again.
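The delete, update and re-add workaround can be illustrated with a toy model. Everything below is hypothetical and deliberately not LibAFL code: `ToyCorpus` only mimics the behaviour described above (metadata is persisted once, at insertion), with a map standing in for the files on disk.

```rust
use std::collections::HashMap;

/// Toy testcase: an input plus string metadata (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct ToyTestcase {
    pub input: Vec<u8>,
    pub metadata: HashMap<String, String>,
}

/// Toy corpus that persists metadata only when a testcase is added.
/// `persisted` stands in for the metadata files on disk.
#[derive(Default)]
pub struct ToyCorpus {
    next_id: usize,
    live: HashMap<usize, ToyTestcase>,
    persisted: HashMap<usize, HashMap<String, String>>,
}

impl ToyCorpus {
    /// Add a testcase and snapshot its metadata "to disk".
    pub fn add(&mut self, tc: ToyTestcase) -> usize {
        let id = self.next_id;
        self.next_id += 1;
        self.persisted.insert(id, tc.metadata.clone());
        self.live.insert(id, tc);
        id
    }

    /// Remove a testcase together with its persisted metadata.
    pub fn remove(&mut self, id: usize) -> Option<ToyTestcase> {
        self.persisted.remove(&id);
        self.live.remove(&id)
    }

    /// In-memory access; changes made here are NOT persisted.
    pub fn get_mut(&mut self, id: usize) -> Option<&mut ToyTestcase> {
        self.live.get_mut(&id)
    }

    /// What would currently be found on disk for this id.
    pub fn persisted_metadata(&self, id: usize) -> Option<&HashMap<String, String>> {
        self.persisted.get(&id)
    }
}

/// The workaround: remove, update the metadata, add again.
/// Returns the new id, since the testcase is re-inserted.
pub fn update_metadata(corpus: &mut ToyCorpus, id: usize, key: &str, value: &str) -> Option<usize> {
    let mut tc = corpus.remove(id)?;
    tc.metadata.insert(key.to_string(), value.to_string());
    Some(corpus.add(tc))
}
```

In this model the cost of the workaround is visible: the testcase gets a new id, so anything that remembered the old id would need to be updated. A dedicated update API on the corpus could avoid that.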
PRs welcome <3