bolero Detangling Driver and DriverMode

So, continuing the discussion started on https://github.com/camshaft/bolero/pull/108#discussion_r1083565909 ; let's chat about driver and driver mode here.

I guess my first question would be, is there any user-facing use case for something other than (DriverMode::Forced, Driver::Rng) (for proptesting) and (DriverMode::Direct, Driver::ByteSlice) (for fuzzers)? And is there a non-user-facing use case for anything other than (DriverMode::Forced, Driver::ByteSlice), that's I guess used for input shrinking when proptesting?

In particular, do you know why DirectRng is being exposed?

Also, writing down my current plan for changes of *Generator while I have it in mind: instead of fn(Driver) -> Generated, have it be fn(Driver, DriverMode, depth: usize) -> Generated. Use the depth to generate smaller collections when going deeper in Forced mode, and to refuse generating too deep data structures like arbitrary in Direct mode.

Jan 28 '23 21:01 Ekleog

Actually, the thoughts I’ve put in https://github.com/camshaft/bolero/issues/90#issuecomment-1408742489 made me notice a possible alternative solution to here, though the question of why DirectRng is exposed stays.

Maybe instead of having each generator have to check whether the DriverMode is Direct or Forced, we could instead have a function like the following on Driver?

trait Driver {
    // ...
    fn validate_generated<T>(&mut self, value: T, validator: impl FnMut(&T) -> bool, corrector: impl FnMut(&mut Self, T) -> Option<T>) -> Option<T>;
}

Implementation would look like this for current DriverMode::Direct drivers:

fn validate_generated<T>(&mut self, value: T, mut validator: impl FnMut(&T) -> bool, mut corrector: impl FnMut(&mut Self, T) -> Option<T>) -> Option<T> {
    if validator(&value) {
        Some(value)
    } else {
        None
    }
}

And the following for current DriverMode::Forced generators:

fn validate_generated<T>(&mut self, mut value: T, mut validator: impl FnMut(&T) -> bool, mut corrector: impl FnMut(&mut Self, T) -> Option<T>) -> Option<T> {
    loop {
        if validator(&value) {
            Some(value)
        } else {
            value = corrector(value)?;
        }
    }
}

And then we could get rid of DriverMode from the exposed API altogether, and instead keep it only for individual Driver implementations depending on whether we actually need it. I think this would also make sense for handling #92 ; as .preprocess_input would then basically be chaining a Driver::validate_generated with an all-None corrector and "just" reporting stats.

WDYT?

Jan 30 '23 14:01 Ekleog-NEAR

Thinking about it some more: while this would already be good, it may still make sense to keep something like DriverMode exposed, just not for Generators?

The reason is, I have code (in my personal web server fuzzing) that takes as a parameter a list of operation to execute, and executes them one after the other. The thing is, I don't want to be requiring the proptest generator to generate exact IDs, because they're UUIDs and so the chance of getting them right would mean proptest basically would be useless.

So I'm currently generating indices, indexing into a table of existing UUIDs, and resizing the index by the number of available UUIDs so that it always generates a valid UUID.

However, I'm thinking this is bad for fuzzing. So I'd like to differentiate between proptest and fuzzing from the contents of the test, because having all that code be in the generator itself would be quite a mess.

IOW, while I still think that DriverMode should be replaced by better abstractions (like the one I suggested just above) for everything regarding generators (because they shouldn't need to worry about that), it might still make sense to access it from the test runner itself, so exposed from bolero rather than bolero-generator (which would also probably help with stabilizing bolero-generator).

Either that, or expose an object akin to Driver, but that only exposes methods like validate_generated for use within the test itself… and this might actually be an even better idea? Something like:

enum RunnerMode { Fuzzer, Proptest, Prover }
 
pub struct Runner(RunnerMode);

impl Runner {
    pub fn make_valid<T>(&self, val: T, improver: impl FnOnce(T) -> T) -> T {
        match self.0 {
            RunnerMode::Fuzzer | RunnerMode::Prover => val,
            RunnerMode::Proptest => improver(val),
        }
    }
}

bolero::check!().with_type().for_each_with_runner(|r: Runner, (a, b): (usize, usize)| {
    let a = r.make_valid(a, |a| a % b);
    if a >= b { return; } // precondition that needs to be checked
    assert_eq!(b - a, b.saturating_sub(a));
});

(Names open to bikeshedding)

Runner would then also become a nice place to add any helper for test writers, should the need happen later. (eg. there could be a Runner::gen::<T>() function that'd allow generating different types depending on the values generated for the first things if the driver was not emptied by generating the initial value; though I'm not sure how much real-world usage that would see… I guess it might make sense for eg. mocking randomness so as to fuzz what would happen with the worst of chance?)

Would that make sense, or am I underestimating the power of generators?

Jan 30 '23 22:01 Ekleog

bolero bolero copied to clipboard

Detangling Driver and DriverMode

bolero
bolero copied to clipboard