Add length range to FuzzInputOptions

Open AdamGoertz opened this issue 1 year ago • 7 comments

Closes #20816

Example test case that fuzzes f64 printing:

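const std = @import("std");
const testing = std.testing;

test "fuzz f64 printing" {
    // Pin the input length to exactly 8 bytes so it can be reinterpreted as one f64.
    const input_bytes = testing.fuzzInput(.{
        // Seed corpus: the raw bytes of 32.0.
        .corpus = &.{std.mem.asBytes(&@as(f64, 32.0))},
        .len_range = .{ .min = 8, .max = 8 },
    });
    var buf: [1024]u8 = undefined;
    // Bit-cast the 8 input bytes to an f64 and format it into the buffer.
    _ = try std.fmt.bufPrint(&buf, "{}", .{@as(f64, @bitCast(input_bytes[0..8].*))});
}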

If no input corpus is provided, the first run of the fuzzer is seeded with random bytes. The length of that seed is also random, but always falls within the min and max specified by the len_range option.
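As a rough illustration of the seeding behaviour described above (not the code in this PR), a random seed with a bounded random length could be produced like this. The function name `randomSeedInput` and its parameters are hypothetical; only `std.Random` and the allocator interface come from the standard library.

```zig
const std = @import("std");

/// Hypothetical helper: returns random bytes whose length lies in [min, max].
/// The caller owns the returned slice and must free it with `allocator`.
fn randomSeedInput(
    allocator: std.mem.Allocator,
    random: std.Random,
    min: usize,
    max: usize,
) ![]u8 {
    std.debug.assert(min <= max);
    // Pick a length in the closed interval [min, max].
    const len = random.intRangeAtMost(usize, min, max);
    const buf = try allocator.alloc(u8, len);
    // Fill the whole buffer with random bytes.
    random.bytes(buf);
    return buf;
}
```

With min and max both set to 8, as in the f64 example, the seed is always exactly 8 bytes long.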

Several invariants are enforced with assertions:

  • len_range.min <= len_range.max
  • The length of the output of Fuzzer.next() is always between the min and max length (inclusive).
  • The length of every corpus entry is between the min and max length.
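For readers who want to see the shape of these checks, here is a minimal sketch. The `LenRange` struct, its default values, and `checkLenRange` are assumptions made for illustration; only the field names `min` and `max` are taken from the example above, and the real assertions live in the fuzzer code.

```zig
const std = @import("std");

/// Hypothetical stand-in for the `len_range` option.
/// The defaults (0 and maxInt) are an assumption, not taken from the PR.
const LenRange = struct {
    min: usize = 0,
    max: usize = std.math.maxInt(usize),

    /// True when `len` lies in the closed interval [min, max].
    fn contains(self: LenRange, len: usize) bool {
        return self.min <= len and len <= self.max;
    }
};

/// Asserts that the range is well formed and that every corpus entry fits in it.
fn checkLenRange(range: LenRange, corpus: []const []const u8) void {
    std.debug.assert(range.min <= range.max);
    for (corpus) |entry| std.debug.assert(range.contains(entry.len));
}
```

The same `contains` check could be applied to each input the fuzzer hands back, which is what the second invariant amounts to.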

AdamGoertz avatar Aug 03 '24 03:08 AdamGoertz

I'll look at this in a couple days - I have a rather large body of work sitting in the fuzz branch right now.

andrewrk avatar Aug 03 '24 19:08 andrewrk

No problem, I figured. As of yesterday, it looks like the conflicts would be pretty minimal, so hopefully it'll stay easy.

AdamGoertz avatar Aug 04 '24 00:08 AdamGoertz

Looks like there were indeed a couple of conflicts.

andrewrk avatar Aug 07 '24 22:08 andrewrk

Conflicts resolved :+1:

AdamGoertz avatar Aug 08 '24 02:08 AdamGoertz

> If the minimum length is greater than 0, the corpus must contain at least 1 entry, so that we can use it for the initial run/smoke test of the fuzzer.

No need, it can generate random bytes in this case.

andrewrk avatar Aug 08 '24 03:08 andrewrk

> > If the minimum length is greater than 0, the corpus must contain at least 1 entry, so that we can use it for the initial run/smoke test of the fuzzer.
>
> No need, it can generate random bytes in this case.

I considered this. The concern I had was with ownership of the memory that holds the input. Since the length range is runtime-known, we need to allocate the input dynamically. Under normal circumstances the fuzzer owns all of the returned memory, but on the initial run it's not clear how to deallocate the slice returned from fuzzInput. A few options I considered:

  • Use a comptime-known length range: This would force the FuzzInputOptions parameter to be comptime-known. Being able to load corpus entries from a file (without @embedFile) would be nice, and this option would preclude that. The length range could also be moved to a second parameter.

  • Reorganize the fuzzer code so it is still able to manage the returned input even when not linked to libfuzzer: This might be the best option, but I haven't explored how difficult it would be. My objective was to minimize changes to the current fuzzing architecture.

  • Add a 'deinit' function for the fuzzer input: Totally possible, just annoying because it would do nothing while actually fuzzing; its only responsibility would be cleaning up this one allocation for the smoke test.

Does one of these (or something else I haven't considered) seem like a better approach?

AdamGoertz avatar Aug 08 '24 11:08 AdamGoertz

OK, there's a pretty easy solution: add another global variable to track the bytes allocated for the smoke test. I think that should work.
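A minimal sketch of that idea, assuming a single module-level optional slice that owns the smoke-test bytes. All names here (`smoke_test_input`, `smokeTestInput`, `releaseSmokeTestInput`) are hypothetical and the actual change may be structured differently.

```zig
const std = @import("std");

/// Hypothetical global: owns the bytes generated for the one-off smoke test.
/// Stays null while the real fuzzer is supplying inputs.
var smoke_test_input: ?[]u8 = null;

/// Frees the tracked smoke-test bytes, if any. Safe to call more than once.
fn releaseSmokeTestInput(allocator: std.mem.Allocator) void {
    if (smoke_test_input) |buf| {
        allocator.free(buf);
        smoke_test_input = null;
    }
}

/// Generates random bytes with a length in [min, max] and records them in the
/// global, so the caller never has to free the returned slice itself.
fn smokeTestInput(
    allocator: std.mem.Allocator,
    random: std.Random,
    min: usize,
    max: usize,
) ![]const u8 {
    std.debug.assert(min <= max);
    // Drop any earlier allocation so only one buffer is ever tracked.
    releaseSmokeTestInput(allocator);
    const buf = try allocator.alloc(u8, random.intRangeAtMost(usize, min, max));
    random.bytes(buf);
    smoke_test_input = buf;
    return buf;
}
```

The trade-off is one more piece of global state in exchange for keeping the fuzzInput signature unchanged and avoiding a user-facing deinit.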

AdamGoertz avatar Aug 09 '24 03:08 AdamGoertz

Sounds good! Once that lands I'll take another look 👍

AdamGoertz avatar Sep 11 '24 20:09 AdamGoertz

Looks like the updated version is now green 👍🏻

AdamGoertz avatar Sep 15 '24 03:09 AdamGoertz

Going to close this given that #23416 is merged. I think things have changed enough that it will need to be mostly re-implemented.

AdamGoertz avatar Sep 19 '25 19:09 AdamGoertz