regex
regex copied to clipboard
Add 3 structure-aware fuzzers
These are the fuzzers used to find CVE-2022-24713.
Effectively, it just marks all AST members as Arbitrary when the fuzzer feature is enabled, then generates ASTs which are then converted back to statements in regex. Then, the fuzzer parses the generated string and executes against an input.
I was using
fuzz_target!(|data: (&str, &str, [bool; 7])| {
let (
pattern,
input,
[case_insensitive, multi_line, dot_matches_new_line, swap_greed, ignore_whitespace, unicode, octal],
) = data;
let r = regex::RegexBuilder::new(pattern)
.case_insensitive(case_insensitive)
.multi_line(multi_line)
.dot_matches_new_line(dot_matches_new_line)
.swap_greed(swap_greed)
.ignore_whitespace(ignore_whitespace)
.unicode(unicode)
.octal(octal)
.dfa_size_limit(1024)
.size_limit(1024)
.build();
if let Ok(re) = r {
re.is_match(input);
}
});
Looks like the only option that shows up as an option in the RegexBuilder that doesn't show up in the flags is octal. Might be worth fuzzing that option as well?
Thanks so much!
I am not sure if I am a fan of adding dependencies to regex-syntax
, even optional ones. Because they now become part of the API of the crate and require careful management when it comes to release maintenance, particularly given that it create a public dependency on another crate.
I'd really strongly prefer that we don't introduce new dependencies to regex-syntax
. Adding a dev-dependency would be quite all right though. Is there a way to structure the fuzzer to make that possible I wonder?
Looks like the only option that shows up as an option in the RegexBuilder that doesn't show up in the flags is octal. Might be worth fuzzing that option as well?
@5225225, I saw your PR earlier (it was actually what inspired me to write this one!) and would be happy to add some of the testing capabilities yours demonstrates.
I'd really strongly prefer that we don't introduce new dependencies to
regex-syntax
. Adding a dev-dependency would be quite all right though. Is there a way to structure the fuzzer to make that possible I wonder?
@BurntSushi, let me see what I can do. :slightly_smiling_face:
@BurntSushi seems that this option isn't really available: https://github.com/rust-fuzz/cargo-fuzz/issues/256
Unfortunately, it would be necessary to make it a feature. However, it is possible to test if it's being used in a fuzzer environment, so we can set up something like the following:
#[cfg(any(all(feature = "fuzzer", not(fuzzing)), all(feature = "arbitrary", not(fuzzing))))]
compile_error!("You cannot enable the fuzzer feature without fuzzing!");
This prevents people from pulling in the fuzzer features or arbitrary without fuzzing. Does this suffice?
@VTCAKAVSMoACE with this fuzzers how long it takes to find cve-2022-24713? I'm tested locally from your repo I can't trigger the crash. I have 50ish corpus and using regex dictionary
@VTCAKAVSMoACE with this fuzzers how long it takes to find cve-2022-24713? I'm tested locally from your repo I can't trigger the crash. I have 50ish corpus and using regex dictionary
Sorry for the late response. You need to set a timeout of a second or two, and you do not want to use a corpus.
I've given this some thought, and I'm just not okay with adding a public dependency on arbitrary
for regex-syntax
. I could stomach it as a dev-dependency
, but even one that's optional and difficult to enable is too risky for me. And doing it just to enable some convenient testing does not feel like the right trade off to me. The problem it being in the dependency chain is that it's still considered part of the main set of dependencies and I believe there are various things that can impact building regex-syntax
based on things that arbitrary
does. (Perhaps the setting of rust-version
, IIRC.)
I'm going to close this out for largely similar reasons as stated here: https://github.com/rust-lang/regex/pull/959#issuecomment-1530361182
Hm, this seems a bit of a shame to me. Grammar fuzzing discovers hard-to-find bugs very fast in regex (e.g. reproducing the ReDoS in <1 minute); it seems a bit more than convenient testing. I'll think about alternative implementation strategies here since I think it's important, but I don't have much time to do this at the moment. I will try to nerd snipe some my friends into this :wink:
Yeah I agree it's a shame. And honestly, my suggested work-around sucks. It blows chunks. I don't like it and I don't want it, haha.
This really does feel like a flaw in the cargo fuzz
tooling. Basically, whenever it needs some additional infrastructure for testing, it mandates that it be added as a non-dev dependency. That's just not good at all. It might be acceptable for some crates to do that, but this is really what dev-dependencies
are supposed to be for. So I really think cargo fuzz
needs to figure out a way to support them.
I'll see if I can open up an issue with them and see if I can get anywhere that way.
I'm also reconsidering the other work-around suggested, which was mirroring the full Ast
type in the fuzzer targets, and making them impl Arbitrary
and then defining a From
impl. All of the Ast
types have a public representation, so this is possible to do. It is painful. But the Ast
types very rarely change. So it might be feasible. At least, I think I like this better than my idea of copying and patching the entire regex-syntax
crate.
I've added a request here: https://github.com/rust-fuzz/cargo-fuzz/issues/338
I closed the cargo-fuzz issue as unimplementable, but repeating some of my suggestions from there:
For this specific case, I'd really just recommend adding an arbitrary
feature the way a lot of crates have serde
features: other people may also want to include your types in their fuzzers. I recognize that this is similar to what has been proposed here and has already been rejected.
The hack for getting this to work is to instead manually define fuzz targets under examples/
using the libfuzzer
crate, and figure out the sanitizer flags cargo fuzz
passes in and pass them yourself using RUSTFLAGS=foo cargo rustc --examples bar -- <more flags>
. That's basically all cargo-fuzz
does: it creates a crate that depends on libfuzzer
, sets up boilerplate, and then builds with the appropriate flags.
@Manishearth gotya, thanks. I think the RUSTFLAGS
hack might not work, but I'm not sure. In particular, this project is hooked up to OSS-fuzz, and I'm not sure whether it's possible in that context to say, "do RUSTFLAGS=foo cargo rustc ...
instead of cargo fuzz
."
Regarding OSS-Fuzz: this is something that you would put into the config file. Quite possible :slightly_smiling_face:
Ah! There's a lot that I don't know apparently. Haha.
OK, so I'm going to move forward with just adding arbitrary
as an optional dependency to regex-syntax
. And it will be an advertised feature. Things that changed my mind:
- There's just not really any good alternative given the
cargo fuzz
setup today. - This drastically speeds up fuzzing. And given that
regex 1.9
is going to be a complete rewrite, getting it under a good fuzzer before a release sounds like a very responsible thing to do. - I reviewed
arbitrary
itself and it looks like a solid crate with good dependency hygiene itself. - I added it as an optional dependency to
regex-syntax
locally and played around with it. Everything still builds on Rust 1.60. - If this ends up not working out, I can always back out the change because I can do new breaking change releases of
regex-syntax
at a higher frequency thanregex
itself.
So, what do you need from me? :slightly_smiling_face: Rebase and test?
Ah right, nope, you'll all set. I'm rolling this into #978. (I have other fuzz changes in that PR too, in order to improve coverage across various things.)
Okay. I think potentially something worthwhile is to also perform differential testing between the rewrite and the old version. This should be fairly quick to set up and I'll create a separate repository for this since it doesn't seem appropriate here.
@addisoncrump Sounds great. What does it entail? (Or feel free to avoid indulging my curiosity and just show me whenever the new repo is ready. :-))
I should be done in like 5 minutes? I'll just push it here since this PR is encapsulated by other changes.
Aha, yup, this immediately finds some differences. Test with cargo fuzz run -s none diff_new_fuzzer
and enjoy your failing testcases :wink:
Note that currently, it uses generated ASTs from the old implementation. If there are new features present in the rewrite, this won't test them, or it will test them wrong.
Yeah unfortunately not only are there new features, but there are bug fixes in match semantics too.
I'll take a look at some of the failing cases to confirm they fall in those buckets, but this might make differential testing in this way tricky. (I've done my own form of differential non-fuzz testing by ensuring that the old implementation passes the new test suite and that the new implementation passes the old test suite.) The new test suite is much bigger than the old one, and tests each individual internal engine much more thoroughly. It's not fuzz testing, but it's not nothing either. :)
And the differential fuzzer also just repro'd the old CVE on the new implementation with: (?:){250000}?
... Seems to be in parse-time, again, too.
@addisoncrump Yeah I caught that one a few days ago. I might not have pushed my branch since then. But it's in the new NFA compiler, not parser. Happened for a different reason.
For testing for the time-based issues, I used the following command line:
cargo fuzz run -s none diff_new_fuzzer -- -timeout=5 -ignore_timeouts=0
@addisoncrump How do you get the fuzzer to ignore certain failures? Whenenver I run the differential fuzzer, it just fails on [&&]
, which was disallowed before and is now allowed. I suppose I can modify the fuzzer exit silently if neither regex compiles successfully instead of failing.
OK, here's my new differential program:
#![no_main]
use {
libfuzzer_sys::fuzz_target, regex::Regex, regex_old::Regex as RegexOld,
regex_syntax::ast::Ast,
};
#[derive(arbitrary::Arbitrary, Eq, PartialEq)]
struct FuzzData {
ast: Ast,
haystack: String,
}
impl std::fmt::Debug for FuzzData {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
let mut builder = f.debug_struct("FuzzData");
builder.field("ast", &format!("{}", self.ast));
builder.field("haystack", &self.haystack);
builder.finish()
}
}
fuzz_target!(|data: FuzzData| {
let pattern = format!("{}", data.ast);
let Ok(renew) = Regex::new(&pattern) else { return };
let Ok(reold) = RegexOld::new(&pattern) else { return };
assert_eq!(renew.is_match(&data.haystack), reold.is_match(&data.haystack));
match (reold.captures(&data.haystack), renew.captures(&data.haystack)) {
(None, None) => {}
(Some(_), None) => panic!("old matched and new didn't"),
(None, Some(_)) => panic!("new matched and old didn't"),
(Some(oldcaps), Some(newcaps)) => {
for (i, (oldcap, newcap)) in
oldcaps.iter().zip(newcaps.iter()).enumerate()
{
match (oldcap, newcap) {
(None, None) => {}
(Some(_), None) => {
panic!("old caps has group {i} match, new doesn't");
}
(None, Some(_)) => {
panic!("new caps has group {i} match, old doesn't");
}
(Some(oldm), Some(newm)) => {
assert_eq!(oldm.as_str(), newm.as_str());
}
}
}
}
}
});
Which gets me past [&&]
. But it quickly fails on ()*
. The current behavior is buggy in that it says the capture group at index 1 does not match, but the new rewrite fixes that. So I switched to something a little more coarse:
#![no_main]
use {
libfuzzer_sys::fuzz_target, regex::Regex, regex_old::Regex as RegexOld,
regex_syntax::ast::Ast,
};
#[derive(arbitrary::Arbitrary, Eq, PartialEq)]
struct FuzzData {
ast: Ast,
haystack: String,
}
impl std::fmt::Debug for FuzzData {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
let mut builder = f.debug_struct("FuzzData");
builder.field("ast", &format!("{}", self.ast));
builder.field("haystack", &self.haystack);
builder.finish()
}
}
fuzz_target!(|data: FuzzData| {
let pattern = format!("{}", data.ast);
let Ok(renew) = Regex::new(&pattern) else { return };
let Ok(reold) = RegexOld::new(&pattern) else { return };
assert_eq!(renew.is_match(&data.haystack), reold.is_match(&data.haystack));
match (reold.find(&data.haystack), renew.find(&data.haystack)) {
(None, None) => {}
(Some(_), None) => panic!("old matched and new didn't"),
(None, Some(_)) => panic!("new matched and old didn't"),
(Some(oldm), Some(newm)) => {
assert_eq!(oldm.as_str(), newm.as_str());
}
}
});
And that at least seems to be running for a bit longer. There are still bug fixes even at this level though, so this could also find false positive failures.
I've been using the command you gave above, cargo fuzz run -s none diff_rewrite -- -timeout=5 -ignore_timeouts=0
, to run the fuzzer. So far no timeouts.