Reversed arbitrary

Open • epilys opened this issue 5 years ago • 10 comments

Hello,

I've been toying with custom mutators in cargo-fuzz with libFuzzer, and it seems to me that a reversed arbitrary operation would be really helpful for mutating typed data instead of raw bytes. Here's the interface I have implemented so far:

```rust
/// Define a custom fuzz mutator.
///
/// If `$bytes` exceeds `$max_size`, it will be silently truncated.
///
/// ## Example
/// ```no_run
/// #![no_main]
/// use libfuzzer_sys::{fuzz_target, fuzz_mutator, llvm_fuzzer_mutate};
///
/// fuzz_target!(|data: &[u8]| {
///     let _ = std::str::from_utf8(data);
/// });
///
/// fuzz_mutator!(|data: &mut [u8], max_size: usize| {
///     println!("custom mutator called with data len = {} and max_size = {}", data.len(), max_size);
///     /* call wrapper function of libfuzzer's default mutator */
///     llvm_fuzzer_mutate(data, max_size)
/// });
/// ```
```

The equivalent mutator over typed data would be:

```rust
fuzz_mutator!(|data: &mut T, max_size: usize| {
    loop {
        /* perform changes on `data` */

        // `arbitrary_size` would be some method on the trait that
        // calculates the size of `data` in bytes
        if data.arbitrary_size() <= max_size {
            break;
        }
    }
});
```
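The proposed loop can be made concrete with a minimal, self-contained sketch of a structure-aware mutator that round-trips through a typed value. It assumes a hypothetical `Input` type whose hand-written `from_bytes` and `to_bytes` functions stand in for `Arbitrary` and the proposed reverse operation. None of these names come from `arbitrary` or `libfuzzer-sys`. The xorshift generator only replaces libFuzzer's randomness so the example runs on its own.

```rust
/// Hypothetical structured input; `from_bytes`/`to_bytes` stand in for
/// `Arbitrary` and the proposed reverse operation (not real crate APIs).
#[derive(Debug, Clone, PartialEq)]
struct Input {
    flag: bool,
    values: Vec<u8>,
}

impl Input {
    /// Decode: low bit of the first byte -> `flag`, remaining bytes -> `values`.
    fn from_bytes(data: &[u8]) -> Input {
        match data.split_first() {
            Some((&first, rest)) => Input { flag: first & 1 == 1, values: rest.to_vec() },
            None => Input { flag: false, values: Vec::new() },
        }
    }

    /// Encode so that `from_bytes(&x.to_bytes()) == x`; always `1 + values.len()` bytes.
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = vec![self.flag as u8];
        out.extend_from_slice(&self.values);
        out
    }
}

/// Tiny xorshift PRNG so the sketch needs no external crates.
struct XorShift(u32);

impl XorShift {
    fn next_u32(&mut self) -> u32 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.0 = x;
        x
    }
}

/// Structure-aware mutation: decode, make one typed change, shrink until the
/// re-encoded size fits `max_size`, then write the bytes back.
/// Returns the new length. Assumes `max_size >= 1` (an empty `Input` is 1 byte).
fn mutate_typed(buf: &mut Vec<u8>, max_size: usize, rng: &mut XorShift) -> usize {
    let mut data = Input::from_bytes(buf.as_slice());
    match rng.next_u32() % 3 {
        0 => data.flag = !data.flag,
        1 => data.values.push(rng.next_u32() as u8),
        _ => {
            data.values.pop();
        }
    }
    // Size check on the encoded form (the role `arbitrary_size` plays above).
    while data.to_bytes().len() > max_size && !data.values.is_empty() {
        data.values.pop();
    }
    *buf = data.to_bytes();
    buf.len()
}

fn main() {
    let mut rng = XorShift(0x9E37_79B9);
    let mut buf = Input { flag: true, values: vec![1, 2, 3] }.to_bytes();
    for _ in 0..5 {
        let len = mutate_typed(&mut buf, 4, &mut rng);
        println!("len {len}: {:?}", Input::from_bytes(&buf));
    }
}
```

Rather than retrying random changes until the size fits, this sketch makes one change and then shrinks deterministically. That guarantees the loop terminates.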

Does this sound like a reasonable approach to you?

epilys, May 16 '20

So the idea is that this would also involve adding something like an `as_arbitrary_bytes` method to the `Arbitrary` trait? And `fuzz_mutator!` would construct the `T: Arbitrary` for you, let you mutate it, and then call `as_arbitrary_bytes` to give the bytes back to libFuzzer?

And we would want `x == T::arbitrary_take_rest(x.as_arbitrary_bytes())` where `x: T`? (Semi-aside: it might be hard to maintain this property for our arbitrary-length-getting functions.)
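The asymmetry behind that semi-aside can be illustrated without the real crate. The sketch below uses hypothetical `decode_vec`/`encode_vec` functions, not `arbitrary`'s actual algorithm. The decoder reads a one-byte length prefix clamped to the bytes available. In this setup value → bytes → value is the identity, but bytes → value → bytes is not. This is one way length-reading functions make an exact inverse awkward.

```rust
/// Hypothetical decoder, not `arbitrary`'s real algorithm: a one-byte length
/// prefix, clamped to the bytes actually available, then that many bytes.
fn decode_vec(data: &[u8]) -> Vec<u8> {
    match data.split_first() {
        Some((&len, rest)) => rest[..(len as usize).min(rest.len())].to_vec(),
        None => Vec::new(), // empty input decodes to a default (empty) value
    }
}

/// Hypothetical reverse ("as_arbitrary_bytes"-style): the canonical encoding.
/// Assumes `v.len() <= 255` so the length fits in the one-byte prefix.
fn encode_vec(v: &[u8]) -> Vec<u8> {
    assert!(v.len() <= u8::MAX as usize, "length must fit in one byte");
    let mut out = Vec::with_capacity(v.len() + 1);
    out.push(v.len() as u8);
    out.extend_from_slice(v);
    out
}

fn main() {
    let x = vec![7u8, 8, 9];
    // value -> bytes -> value: the identity for every Vec of at most 255 bytes.
    assert_eq!(decode_vec(&encode_vec(&x)), x);

    // bytes -> value -> bytes: not the identity, since clamping the length
    // maps many byte strings to the same value.
    let raw = [200u8, 7, 8, 9];
    assert_eq!(decode_vec(&raw), x);
    println!("{:?} -> {:?}", raw, encode_vec(&decode_vec(&raw))); // [200, 7, 8, 9] -> [3, 7, 8, 9]
}
```

In this toy scheme, an `arbitrary_take_rest`-style decoder consumes all remaining bytes and needs no prefix. Its inverse would therefore differ from `encode_vec`, so a reverse operation has to match the decoding entry point it targets.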

This seems like a nice thing to have, but I haven't fully thought through how it would interact with the `Arbitrary` trait, how nice we can keep the UX, and how well things interact and compose in practice.

However, a first step that I feel is safe to take without resolving all those unknowns is to add a `libfuzzer_sys::fuzz_mutator!(|data: &mut [u8], max_len: usize| { ... })` macro that supports only `[u8]`, not `T: Arbitrary`.

fitzgen, May 18 '20

@Manishearth, do you have thoughts on this, and on how we might integrate it smoothly into the `Arbitrary` trait?

fitzgen, May 18 '20

I'm not really sure! I think it's possible, but it might be annoying.

Manishearth, May 18 '20

This might also be useful for providing a corpus when using structure-aware fuzzing.

zommiommy, Jul 04 '20

I was thinking this would be helpful for me too. I'm contemplating using Arbitrary for something other than fuzzing, and the property that a small change in the input yields a small change in the output sounds beneficial for my use case. I would like a way to provide some starting inputs for the search, and generating the input bytes by hand isn't exactly going to be ergonomic 😇.

I was wondering whether this would be better as a separate crate (e.g. `Unstructure` or `FromArbitrary`), one that could be derived separately (or left unimplemented if not needed). The derive would, of course, provide a "matching" implementation.
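The "separate trait" idea can be sketched with two hypothetical traits: `FromBytes`, standing in for `Arbitrary`, and `Unstructure`, as suggested above. Neither is a real API. The sketch shows what "matching" implementations would have to guarantee: both visit the fields in the same order and delegate to each field type's impl. The reverse trait therefore composes the same way decoding does, and a type can implement it independently.

```rust
/// Hypothetical decoding trait standing in for `Arbitrary` (not its real API).
trait FromBytes: Sized {
    fn from_bytes(input: &mut &[u8]) -> Self;
}

/// Hypothetical separate "reverse" trait, e.g. the suggested `Unstructure`.
/// Types can implement it (or a derive could generate it) independently.
trait Unstructure {
    fn unstructure(&self, out: &mut Vec<u8>);
}

impl FromBytes for u8 {
    fn from_bytes(input: &mut &[u8]) -> Self {
        let data: &[u8] = *input;
        match data.split_first() {
            Some((&b, rest)) => {
                *input = rest;
                b
            }
            None => 0, // exhausted input decodes to a default
        }
    }
}

impl Unstructure for u8 {
    fn unstructure(&self, out: &mut Vec<u8>) {
        out.push(*self);
    }
}

impl FromBytes for bool {
    fn from_bytes(input: &mut &[u8]) -> Self {
        u8::from_bytes(input) & 1 == 1
    }
}

impl Unstructure for bool {
    fn unstructure(&self, out: &mut Vec<u8>) {
        (*self as u8).unstructure(out);
    }
}

#[derive(Debug, PartialEq)]
struct Point {
    x: u8,
    y: u8,
    visible: bool,
}

// Roughly what two "matching" derives would have to agree on: visit the
// fields in the same (declaration) order, delegating to each field's impl.
impl FromBytes for Point {
    fn from_bytes(input: &mut &[u8]) -> Self {
        let x = u8::from_bytes(input);
        let y = u8::from_bytes(input);
        let visible = bool::from_bytes(input);
        Point { x, y, visible }
    }
}

impl Unstructure for Point {
    fn unstructure(&self, out: &mut Vec<u8>) {
        self.x.unstructure(out);
        self.y.unstructure(out);
        self.visible.unstructure(out);
    }
}

fn main() {
    let p = Point { x: 3, y: 4, visible: true };
    let mut bytes = Vec::new();
    p.unstructure(&mut bytes);
    println!("{bytes:?}"); // [3, 4, 1]
    let mut input = bytes.as_slice();
    assert_eq!(Point::from_bytes(&mut input), p);
}
```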

vorner, Nov 11 '20

> This might also be useful for providing a corpus when using structure-aware fuzzing.

This is my use case as well. I'd like to benefit from the wonderfully clean fuzz harnesses that Arbitrary enables without sacrificing the ability to use a seed corpus.
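For seeding, a reverse mapping would let a harness author write seeds as structured values and serialize them into a corpus directory, one input per file, which is how libFuzzer reads a corpus. The sketch below is a minimal illustration under assumptions. It uses a hypothetical `Command` type with a hand-written `decode`/`encode` pair in place of `Arbitrary` and its reverse. The `seed-N` file names and the default output directory are illustrative.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical fuzz-target input type.
#[derive(Debug, Clone, PartialEq)]
enum Command {
    Push(u8),
    Pop,
    Clear,
}

/// Hypothetical decoder (stand-in for `Arbitrary`): tag byte % 3 picks the
/// variant; `Push` reads one payload byte, defaulting to 0 if none is left.
fn decode(mut data: &[u8]) -> Vec<Command> {
    let mut cmds = Vec::new();
    while let Some((&tag, rest)) = data.split_first() {
        data = rest;
        let cmd = match tag % 3 {
            0 => match data.split_first() {
                Some((&b, rest)) => {
                    data = rest;
                    Command::Push(b)
                }
                None => Command::Push(0),
            },
            1 => Command::Pop,
            _ => Command::Clear,
        };
        cmds.push(cmd);
    }
    cmds
}

/// Hypothetical reverse operation: `decode(&encode(x)) == x` for every `x`.
fn encode(cmds: &[Command]) -> Vec<u8> {
    let mut out = Vec::new();
    for cmd in cmds {
        match cmd {
            Command::Push(b) => out.extend_from_slice(&[0, *b]),
            Command::Pop => out.push(1),
            Command::Clear => out.push(2),
        }
    }
    out
}

/// Writes each seed to its own file: one input per file, as libFuzzer
/// expects for a corpus directory.
fn write_seed_corpus(dir: &Path, seeds: &[Vec<Command>]) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    for (i, seed) in seeds.iter().enumerate() {
        fs::write(dir.join(format!("seed-{i}")), encode(seed))?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Pass a corpus directory such as fuzz/corpus/<target> (assumed cargo-fuzz
    // layout); without an argument the demo writes to a temp directory.
    let dir = std::env::args()
        .nth(1)
        .map(PathBuf::from)
        .unwrap_or_else(|| std::env::temp_dir().join("seed-corpus-demo"));
    let seeds = vec![
        vec![Command::Push(1), Command::Push(2), Command::Pop],
        vec![Command::Clear, Command::Push(255)],
    ];
    write_seed_corpus(&dir, &seeds)?;
    println!("wrote {} seeds to {}", seeds.len(), dir.display());
    Ok(())
}
```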

mykter, Feb 27 '21

I would use this for seeding purposes as well...

bitwave, Nov 22 '21

I started implementing a `dearbitrary` function in https://github.com/bitwave/arbitrary/tree/revert-mode. I'll try to open a PR in the next few days.

bitwave, Dec 25 '21

See #94.

bitwave, Dec 25 '21

For a use case other than corpus seeding (which I really want too!!!): more intelligent `tmin` permutations! I imagine that permuting over structured simplifications would yield much faster and likely better-quality shrinking.

Hmm, maybe we can cheese this without `dearbitrary` by recompiling the target with a feature flag that makes the `fuzz_target!` macro run tmin logic internally. Something like:

unstructured bytes in (the ones you want to minimize) --> structured data (using `Arbitrary`) --> apply shrinking strategies over the code in the body of `fuzz_target!`
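A minimal sketch of the "shrink over structure" step follows. It works without any dearbitrary support: a greedy loop keeps applying simplifications (drop one element, halve one element) to an already-decoded value while a stand-in `fails` predicate still holds. The `shrink`/`simplifications` names and the strategies are illustrative assumptions, not what `cargo fuzz tmin` does. Without a reverse operation, the result can only be reported as a structured value (e.g. via `Debug`), not written back as bytes.

```rust
/// Candidate simplifications of a decoded input, tried in order:
/// first drop one element, then halve one nonzero element.
/// (Illustrative strategies, not what `cargo fuzz tmin` does.)
fn simplifications(v: &[u32]) -> Vec<Vec<u32>> {
    let mut out = Vec::new();
    for i in 0..v.len() {
        let mut w = v.to_vec();
        w.remove(i);
        out.push(w);
    }
    for (i, &x) in v.iter().enumerate() {
        if x > 0 {
            let mut w = v.to_vec();
            w[i] /= 2;
            out.push(w);
        }
    }
    out
}

/// Greedy structured shrinking: repeatedly accept the first simplification
/// for which `fails` still returns true; stop when none does.
fn shrink(mut input: Vec<u32>, fails: impl Fn(&[u32]) -> bool) -> Vec<u32> {
    assert!(fails(input.as_slice()), "shrinking needs a failing input");
    'outer: loop {
        for candidate in simplifications(&input) {
            if fails(candidate.as_slice()) {
                input = candidate;
                continue 'outer;
            }
        }
        return input;
    }
}

fn main() {
    // Stand-in for "the fuzz target still crashes on this decoded input".
    let crashes = |v: &[u32]| v.iter().any(|&x| x >= 100);
    let minimized = shrink(vec![3, 1000, 7, 250], crashes);
    println!("{minimized:?}"); // [125]
}
```

Each accepted step either removes an element or halves a positive one, so the loop always terminates. Because it is greedy, it can stop at a local minimum: `[125]` rather than `[100]` in the demo.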

evanrichter, Aug 13 '22