aHash `compile-time-rng` on a platform with `getrandom`

As a feature request, it would be quite nice to be able to use compile-time-rng (or, better, some dummy-unsafe-no-rng cfg) when building binaries with hashbrown maps while fuzzing.

The https://github.com/rust-fuzz crates seem to mostly always set --cfg=fuzzing, so its possible to also check #[cfg(fuzzing)] without ever actually exposing a potentially-unsafe feature. Happy to work to implement if this would be acceptable.

Jun 11 '21 02:06 TheBlueMatt

the build.rs can easily detect a fuzzing flag. (As can the code.) What do you want the behavior to be when fuzzing? (I don't understand the motivation enough to infer this) Should the code be fully deterministic so that every hashmap uses the same keys and hashes the same way? (IE: Just turning off the random seeding, or should the pointer math, and counters be disabled also?)

Additionally, should it force the fallback algorithm?

Jun 13 '21 03:06 tkaitchuck

A straight constant hash value would likely be ideal.

The goal here ultimately is full determinism in the sense that if you run the same code in the same order the keys should be identical (even if you run the same code twice in a row in the same process/static context). Pointer math likely wouldn't work as, while fuzzers should be turning off ASLR/etc, you may run multiple test cases in the same memory space and get different pointers. Just using compile time RNG values would work fairly well, but would result in fuzzers finding duplicate test cases across runs where you may re-build and get different hash values. Ultimately a straight constant is probably the easiest to implement and the best for fuzzing.

Jun 13 '21 03:06 TheBlueMatt

Ok. So even counters should be disabled. Because while it would be identical if the process is re-run, if the same code is called twice in a row in a loop it will produce different results.

I can add a flag to do this, but it still won't make the process fully deterministic if there is a non-ahash hashmap used anywhere. Additionally, there is the possibility it introduces quadratic runtime into functions that otherwise take linear time.

What sort of variable/flag is best to avoid this being 'on' when not intended?

Jun 13 '21 21:06 tkaitchuck

I can add a flag to do this, but it still won't make the process fully deterministic if there is a non-ahash hashmap used anywhere.

Sure, for my use-case I can just swap everything out for hashbrown so in my case it's easy :)

Additionally, there is the possibility it introduces quadratic runtime into functions that otherwise take linear time.

Right, maybe a reason to not always enable it when fuzzing? I assume in most fuzzing cases there aren't enough hash map entries for it to be a concern, however.

What sort of variable/flag is best to avoid this being 'on' when not intended?

The canonical one in Rust is #[cfg(fuzzing)], which is enabled by default by the rust-fuzz projects when building. Given it's not a feature and only an explicit --cfg it's somewhat hard to accidentally turn on (ie has to be a compile flag, cannot be set via Cargo.toml). I'd generally leave it at that, but if there's concern about the behavior being surprising we could swap for some other cfg like ahash-not-random.

Jun 17 '21 00:06 TheBlueMatt

Yes, #[cfg(fuzzing)] is definitely better than a feature flag, because it can't creep in accidently. IE: Some random dependency specifies it, therefor it gets set on the whole binary accidently.

The only thing I don't like about fuzzing is the connection is not clear. Nothing is being specified on this package so it might be surprising that its behavior changed because fuzing is enabled. The second thing that is a minor negative, is the fuzzer won't actually be able to affect the iteration order of maps. (And some bugs could depend on this).

If you are in-control of the main() I could add a setter method for a supplier for the random sequence used by the RandomStates. Effectively refactor the current implementation to be defined via a trait, and allow alternative impls to be supplied via a static setter. (impls would need to be send + sync) Then you could build a tool where these random values are supplied by the fuzzer (possibly indirectly).

Jun 20 '21 02:06 tkaitchuck

Oops, I'm sorry this somehow fell off my stack.

The only thing I don't like about fuzzing is the connection is not clear. Nothing is being specified on this package so it might be surprising that its behavior changed because fuzing is enabled.

Yea, its true that its a bit strange, but aside from your next point it seems like what a fuzz user would almost certainly want, so its a little less annoying. Specifically, the fact that fuzz output will randomly change and fail any time you use a hashmap in rust is actually quite annoying, and requires a rather significant amount of manual effort to resolve.

The second thing that is a minor negative, is the fuzzer won't actually be able to affect the iteration order of maps. (And some bugs could depend on this).

Indeed, I'm not entirely sure what to do about that. I suppose hashbrown could also detect fuzzing and perturb the iteration order itself, but that seems excessive.

If you are in-control of the main() I could add a setter method for a supplier for the random sequence used by the RandomStates. Effectively refactor the current implementation to be defined via a trait, and allow alternative impls to be supplied via a static setter. (impls would need to be send + sync) Then you could build a tool where these random values are supplied by the fuzzer (possibly indirectly).

At least for my use-case that's more than fine, because I'm aware of the issue and work with fuzzers in Rust regularly. For those who are less accustomed to fuzzing, non-deterministic behavior can be somewhat surprising, and some fuzz tools light up large warnings about non-determinism which isn't clear the source of. I suppose could always do both - default to deterministic behavior with cfg(fuzzing) and also provide a static setter which can use fuzz input as random source, or at least allow allow for compile-time randomness.

Jul 16 '21 03:07 TheBlueMatt

I have sortof a prototype of this. (Though annoyingly it does not work on ARM)

The basic idea is that we could have a feature/flag to check which disables runtime randomess, and a main can call this function to supply randomness (presumably from the fuzzer).

There is still a bunch of work to do. In terms of interface I don't really like the fact that the fixed and dynamic randomness are on the same trait, but I am not really sure what would be best for an application.

Jul 16 '21 05:07 tkaitchuck

I have sortof a prototype of this. (Though annoyingly it does not work on ARM)

Cool! Thanks.

The basic idea is that we could have a feature/flag to check which disables runtime randomess, and a main can call this function to supply randomness (presumably from the fuzzer).

Right, makes sense. I'm not sure what the right answer is, but, on balance, an inconsistent fuzzer is probably better than a fuzzer that will miss a bug that depends on hashmap iteration order, so probably best to use something like cfg(ahash_no_random) or whatever. Of course useful to document in README as some things (I believe driller is one) will give you completely inscrutable errors as a result of randomness being loaded at runtime.

There is still a bunch of work to do. In terms of interface I don't really like the fact that the fixed and dynamic randomness are on the same trait, but I am not really sure what would be best for an application.

Hmm, right, I assume they're hashed together in one way or another? I guess there's not a ton of reason to a fuzzer should care about both, as long as it can get hashes to output anything from just the dynamic part maybe fixed can stay fixed and dynamic entropy can come from the interface that fuzz main's can override? This would also make it a tiny bit harder to foot-gun with the API.

Jul 22 '21 20:07 TheBlueMatt

This is now available in v0.8.0

Aug 10 '22 06:08 tkaitchuck

aHash aHash copied to clipboard

`compile-time-rng` on a platform with `getrandom`

aHash
aHash copied to clipboard