regex Add 3 structure-aware fuzzers

These are the fuzzers used to find CVE-2022-24713.

Effectively, it just marks all AST members as Arbitrary when the fuzzer feature is enabled, then generates ASTs which are then converted back to statements in regex. Then, the fuzzer parses the generated string and executes against an input.

Mar 08 '22 22:03 addisoncrump

I was using

fuzz_target!(|data: (&str, &str, [bool; 7])| {
    let (
        pattern,
        input,
        [case_insensitive, multi_line, dot_matches_new_line, swap_greed, ignore_whitespace, unicode, octal],
    ) = data;

    let r = regex::RegexBuilder::new(pattern)
        .case_insensitive(case_insensitive)
        .multi_line(multi_line)
        .dot_matches_new_line(dot_matches_new_line)
        .swap_greed(swap_greed)
        .ignore_whitespace(ignore_whitespace)
        .unicode(unicode)
        .octal(octal)
        .dfa_size_limit(1024)
        .size_limit(1024)
        .build();

    if let Ok(re) = r {
        re.is_match(input);
    }
});

Looks like the only option that shows up as an option in the RegexBuilder that doesn't show up in the flags is octal. Might be worth fuzzing that option as well?

Mar 14 '22 19:03 5225225

Thanks so much!

I am not sure if I am a fan of adding dependencies to regex-syntax, even optional ones. Because they now become part of the API of the crate and require careful management when it comes to release maintenance, particularly given that it create a public dependency on another crate.

I'd really strongly prefer that we don't introduce new dependencies to regex-syntax. Adding a dev-dependency would be quite all right though. Is there a way to structure the fuzzer to make that possible I wonder?

Mar 14 '22 19:03 BurntSushi

Looks like the only option that shows up as an option in the RegexBuilder that doesn't show up in the flags is octal. Might be worth fuzzing that option as well?

@5225225, I saw your PR earlier (it was actually what inspired me to write this one!) and would be happy to add some of the testing capabilities yours demonstrates.

I'd really strongly prefer that we don't introduce new dependencies to regex-syntax. Adding a dev-dependency would be quite all right though. Is there a way to structure the fuzzer to make that possible I wonder?

@BurntSushi, let me see what I can do. :slightly_smiling_face:

Mar 14 '22 19:03 addisoncrump

@BurntSushi seems that this option isn't really available: https://github.com/rust-fuzz/cargo-fuzz/issues/256

Unfortunately, it would be necessary to make it a feature. However, it is possible to test if it's being used in a fuzzer environment, so we can set up something like the following:

#[cfg(any(all(feature = "fuzzer", not(fuzzing)), all(feature = "arbitrary", not(fuzzing))))]
compile_error!("You cannot enable the fuzzer feature without fuzzing!");

This prevents people from pulling in the fuzzer features or arbitrary without fuzzing. Does this suffice?

Mar 14 '22 19:03 addisoncrump

@VTCAKAVSMoACE with this fuzzers how long it takes to find cve-2022-24713? I'm tested locally from your repo I can't trigger the crash. I have 50ish corpus and using regex dictionary

Jun 05 '22 00:06 CityOfLight77

@VTCAKAVSMoACE with this fuzzers how long it takes to find cve-2022-24713? I'm tested locally from your repo I can't trigger the crash. I have 50ish corpus and using regex dictionary

Sorry for the late response. You need to set a timeout of a second or two, and you do not want to use a corpus.

Jun 14 '22 15:06 addisoncrump

I've given this some thought, and I'm just not okay with adding a public dependency on arbitrary for regex-syntax. I could stomach it as a dev-dependency, but even one that's optional and difficult to enable is too risky for me. And doing it just to enable some convenient testing does not feel like the right trade off to me. The problem it being in the dependency chain is that it's still considered part of the main set of dependencies and I believe there are various things that can impact building regex-syntax based on things that arbitrary does. (Perhaps the setting of rust-version, IIRC.)

May 01 '23 21:05 BurntSushi

I'm going to close this out for largely similar reasons as stated here: https://github.com/rust-lang/regex/pull/959#issuecomment-1530361182

May 01 '23 21:05 BurntSushi

Hm, this seems a bit of a shame to me. Grammar fuzzing discovers hard-to-find bugs very fast in regex (e.g. reproducing the ReDoS in <1 minute); it seems a bit more than convenient testing. I'll think about alternative implementation strategies here since I think it's important, but I don't have much time to do this at the moment. I will try to nerd snipe some my friends into this :wink:

May 04 '23 08:05 addisoncrump

Yeah I agree it's a shame. And honestly, my suggested work-around sucks. It blows chunks. I don't like it and I don't want it, haha.

This really does feel like a flaw in the cargo fuzz tooling. Basically, whenever it needs some additional infrastructure for testing, it mandates that it be added as a non-dev dependency. That's just not good at all. It might be acceptable for some crates to do that, but this is really what dev-dependencies are supposed to be for. So I really think cargo fuzz needs to figure out a way to support them.

I'll see if I can open up an issue with them and see if I can get anywhere that way.

May 04 '23 12:05 BurntSushi

I'm also reconsidering the other work-around suggested, which was mirroring the full Ast type in the fuzzer targets, and making them impl Arbitrary and then defining a From impl. All of the Ast types have a public representation, so this is possible to do. It is painful. But the Ast types very rarely change. So it might be feasible. At least, I think I like this better than my idea of copying and patching the entire regex-syntax crate.

May 04 '23 12:05 BurntSushi

I've added a request here: https://github.com/rust-fuzz/cargo-fuzz/issues/338

May 04 '23 12:05 BurntSushi

I closed the cargo-fuzz issue as unimplementable, but repeating some of my suggestions from there:

For this specific case, I'd really just recommend adding an arbitrary feature the way a lot of crates have serde features: other people may also want to include your types in their fuzzers. I recognize that this is similar to what has been proposed here and has already been rejected.

The hack for getting this to work is to instead manually define fuzz targets under examples/ using the libfuzzer crate, and figure out the sanitizer flags cargo fuzz passes in and pass them yourself using RUSTFLAGS=foo cargo rustc --examples bar -- <more flags>. That's basically all cargo-fuzz does: it creates a crate that depends on libfuzzer, sets up boilerplate, and then builds with the appropriate flags.

May 04 '23 14:05 Manishearth

@Manishearth gotya, thanks. I think the RUSTFLAGS hack might not work, but I'm not sure. In particular, this project is hooked up to OSS-fuzz, and I'm not sure whether it's possible in that context to say, "do RUSTFLAGS=foo cargo rustc ... instead of cargo fuzz."

May 04 '23 14:05 BurntSushi

Regarding OSS-Fuzz: this is something that you would put into the config file. Quite possible :slightly_smiling_face:

May 04 '23 14:05 addisoncrump

Ah! There's a lot that I don't know apparently. Haha.

May 04 '23 14:05 BurntSushi

OK, so I'm going to move forward with just adding arbitrary as an optional dependency to regex-syntax. And it will be an advertised feature. Things that changed my mind:

There's just not really any good alternative given the cargo fuzz setup today.
This drastically speeds up fuzzing. And given that regex 1.9 is going to be a complete rewrite, getting it under a good fuzzer before a release sounds like a very responsible thing to do.
I reviewed arbitrary itself and it looks like a solid crate with good dependency hygiene itself.
I added it as an optional dependency to regex-syntax locally and played around with it. Everything still builds on Rust 1.60.
If this ends up not working out, I can always back out the change because I can do new breaking change releases of regex-syntax at a higher frequency than regex itself.

May 04 '23 15:05 BurntSushi

So, what do you need from me? :slightly_smiling_face: Rebase and test?

May 04 '23 15:05 addisoncrump

Ah right, nope, you'll all set. I'm rolling this into #978. (I have other fuzz changes in that PR too, in order to improve coverage across various things.)

May 04 '23 15:05 BurntSushi

Okay. I think potentially something worthwhile is to also perform differential testing between the rewrite and the old version. This should be fairly quick to set up and I'll create a separate repository for this since it doesn't seem appropriate here.

May 04 '23 15:05 addisoncrump

@addisoncrump Sounds great. What does it entail? (Or feel free to avoid indulging my curiosity and just show me whenever the new repo is ready. :-))

May 04 '23 15:05 BurntSushi

I should be done in like 5 minutes? I'll just push it here since this PR is encapsulated by other changes.

May 04 '23 15:05 addisoncrump

Aha, yup, this immediately finds some differences. Test with cargo fuzz run -s none diff_new_fuzzer and enjoy your failing testcases :wink:

May 04 '23 15:05 addisoncrump

Note that currently, it uses generated ASTs from the old implementation. If there are new features present in the rewrite, this won't test them, or it will test them wrong.

May 04 '23 15:05 addisoncrump

Yeah unfortunately not only are there new features, but there are bug fixes in match semantics too.

I'll take a look at some of the failing cases to confirm they fall in those buckets, but this might make differential testing in this way tricky. (I've done my own form of differential non-fuzz testing by ensuring that the old implementation passes the new test suite and that the new implementation passes the old test suite.) The new test suite is much bigger than the old one, and tests each individual internal engine much more thoroughly. It's not fuzz testing, but it's not nothing either. :)

May 04 '23 15:05 BurntSushi

And the differential fuzzer also just repro'd the old CVE on the new implementation with: (?:){250000}?... Seems to be in parse-time, again, too.

May 04 '23 16:05 addisoncrump

@addisoncrump Yeah I caught that one a few days ago. I might not have pushed my branch since then. But it's in the new NFA compiler, not parser. Happened for a different reason.

May 04 '23 16:05 BurntSushi

For testing for the time-based issues, I used the following command line: cargo fuzz run -s none diff_new_fuzzer -- -timeout=5 -ignore_timeouts=0

May 04 '23 16:05 addisoncrump

@addisoncrump How do you get the fuzzer to ignore certain failures? Whenenver I run the differential fuzzer, it just fails on [&&], which was disallowed before and is now allowed. I suppose I can modify the fuzzer exit silently if neither regex compiles successfully instead of failing.

May 04 '23 16:05 BurntSushi

OK, here's my new differential program:

#![no_main]

use {
    libfuzzer_sys::fuzz_target, regex::Regex, regex_old::Regex as RegexOld,
    regex_syntax::ast::Ast,
};

#[derive(arbitrary::Arbitrary, Eq, PartialEq)]
struct FuzzData {
    ast: Ast,
    haystack: String,
}

impl std::fmt::Debug for FuzzData {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        let mut builder = f.debug_struct("FuzzData");
        builder.field("ast", &format!("{}", self.ast));
        builder.field("haystack", &self.haystack);
        builder.finish()
    }
}

fuzz_target!(|data: FuzzData| {
    let pattern = format!("{}", data.ast);
    let Ok(renew) = Regex::new(&pattern) else { return };
    let Ok(reold) = RegexOld::new(&pattern) else { return };
    assert_eq!(renew.is_match(&data.haystack), reold.is_match(&data.haystack));

    match (reold.captures(&data.haystack), renew.captures(&data.haystack)) {
        (None, None) => {}
        (Some(_), None) => panic!("old matched and new didn't"),
        (None, Some(_)) => panic!("new matched and old didn't"),
        (Some(oldcaps), Some(newcaps)) => {
            for (i, (oldcap, newcap)) in
                oldcaps.iter().zip(newcaps.iter()).enumerate()
            {
                match (oldcap, newcap) {
                    (None, None) => {}
                    (Some(_), None) => {
                        panic!("old caps has group {i} match, new doesn't");
                    }
                    (None, Some(_)) => {
                        panic!("new caps has group {i} match, old doesn't");
                    }
                    (Some(oldm), Some(newm)) => {
                        assert_eq!(oldm.as_str(), newm.as_str());
                    }
                }
            }
        }
    }
});

Which gets me past [&&]. But it quickly fails on ()*. The current behavior is buggy in that it says the capture group at index 1 does not match, but the new rewrite fixes that. So I switched to something a little more coarse:

#![no_main]

use {
    libfuzzer_sys::fuzz_target, regex::Regex, regex_old::Regex as RegexOld,
    regex_syntax::ast::Ast,
};

#[derive(arbitrary::Arbitrary, Eq, PartialEq)]
struct FuzzData {
    ast: Ast,
    haystack: String,
}

impl std::fmt::Debug for FuzzData {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        let mut builder = f.debug_struct("FuzzData");
        builder.field("ast", &format!("{}", self.ast));
        builder.field("haystack", &self.haystack);
        builder.finish()
    }
}

fuzz_target!(|data: FuzzData| {
    let pattern = format!("{}", data.ast);
    let Ok(renew) = Regex::new(&pattern) else { return };
    let Ok(reold) = RegexOld::new(&pattern) else { return };
    assert_eq!(renew.is_match(&data.haystack), reold.is_match(&data.haystack));

    match (reold.find(&data.haystack), renew.find(&data.haystack)) {
        (None, None) => {}
        (Some(_), None) => panic!("old matched and new didn't"),
        (None, Some(_)) => panic!("new matched and old didn't"),
        (Some(oldm), Some(newm)) => {
            assert_eq!(oldm.as_str(), newm.as_str());
        }
    }
});

And that at least seems to be running for a bit longer. There are still bug fixes even at this level though, so this could also find false positive failures.

I've been using the command you gave above, cargo fuzz run -s none diff_rewrite -- -timeout=5 -ignore_timeouts=0, to run the fuzzer. So far no timeouts.

May 04 '23 16:05 BurntSushi

regex regex copied to clipboard

Add 3 structure-aware fuzzers

regex
regex copied to clipboard