foundry feat(forge): coverage guided fuzzing & time based campaigns for invariant mode

Motivation

Using the inspector added in https://github.com/paradigmxyz/revm-inspectors/pull/255, this PR adds a corpus, which is enabled by setting a folder path in the corpus_dir invariant config. The corpus is trimmed toward corpus_min_size entries once every entry has been mutated corpus_min_mutations times, with one caveat: an entry that is highly likely to produce new finds (a rate above one-third, which is a guesstimate) remains a candidate for mutation. Currently, there are 5 mutations: splice combines two sequences, interleave weaves two sequences together, prefix overwrites the beginning of a sequence, suffix overwrites the end of a sequence, and mutate args selects a subset of arguments from a call in the sequence to modify.
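The five mutations can be sketched roughly as below. This is a minimal illustration, not the PR's code: a call is modeled as a (name, args) tuple rather than an encoded transaction, the donor sequence used by prefix and suffix is an assumption (the description does not say where the overwriting calls come from), and all function names are hypothetical.

```python
import random

def splice(a, b, rng):
    """Combine two sequences: a prefix of `a` followed by a suffix of `b`."""
    cut_a = rng.randint(0, len(a))
    cut_b = rng.randint(0, len(b))
    return a[:cut_a] + b[cut_b:]

def interleave(a, b, rng):
    """Weave two sequences together, preserving the order within each one."""
    out, i, j = [], 0, 0
    while i < len(a) or j < len(b):
        take_a = j >= len(b) or (i < len(a) and rng.random() < 0.5)
        if take_a:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out

def overwrite_prefix(seq, donor, rng):
    """Overwrite the beginning of `seq` with the beginning of `donor` (assumed source)."""
    n = rng.randint(0, min(len(seq), len(donor)))
    return donor[:n] + seq[n:]

def overwrite_suffix(seq, donor, rng):
    """Overwrite the end of `seq` with the end of `donor` (assumed source)."""
    n = rng.randint(0, min(len(seq), len(donor)))
    return seq[:len(seq) - n] + donor[len(donor) - n:]

def mutate_args(seq, rng, new_value=lambda rng: rng.randrange(2**256)):
    """Pick one call and replace a random subset of its arguments."""
    if not seq:
        return list(seq)
    idx = rng.randrange(len(seq))
    name, args = seq[idx]
    mutated = tuple(new_value(rng) if rng.random() < 0.5 else arg for arg in args)
    return seq[:idx] + [(name, mutated)] + seq[idx + 1:]
```

Prefix and suffix keep the sequence length unchanged, interleave always yields the combined length, and splice can produce anything from an empty sequence up to the combined length.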

Closes https://github.com/foundry-rs/foundry/issues/8665
Closes https://github.com/foundry-rs/foundry/issues/990 - if a timeout (in seconds) is configured, the campaign loops until it expires instead of stopping at the max number of runs
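The timeout behavior amounts to swapping the loop condition. A rough sketch, assuming a hypothetical run_once callback and an injectable clock (neither is taken from the PR):

```python
import time

def run_campaign(run_once, max_runs, timeout_secs=None, clock=time.monotonic):
    """Run `run_once` until the budget is spent; return the number of completed runs.

    With a timeout, `max_runs` is ignored and the loop stops once the deadline
    passes. Without one, exactly `max_runs` runs are performed.
    """
    runs = 0
    if timeout_secs is not None:
        deadline = clock() + timeout_secs
        while clock() < deadline:
            run_once()
            runs += 1
    else:
        for _ in range(max_runs):
            run_once()
            runs += 1
    return runs
```

A run that is in progress when the deadline passes still completes in this sketch; whether the real runner interrupts mid-run is not stated here.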

TODO

[x] Replay the corpus at startup so the hitmap is initialized and entries are not duplicated
[x] LRU cache for corpus that flushes to disk when a seed hasn't been selected for mutation in a while. This will be random for now as we don't have any notion of seed scheduling/priority, so it could just be a fixed-size data structure that writes to disk when an entry is evicted, without the LRU component.
[x] Add mutation to tx sequence that modifies the ABI values and re-encodes it
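As a rough illustration of the fixed-size corpus idea, the sketch below keeps entries in memory and writes an entry to disk when it is evicted. The eviction rule follows the description (past its mutation budget and not above the one-third find rate); the class names, JSON encoding, and file naming are assumptions made for the example only.

```python
import json
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Entry:
    uuid: str
    calls: list
    total_mutations: int = 0
    new_finds: int = 0  # mutations of this entry that produced new coverage

    def is_favored(self):
        # One-third is the guesstimate threshold from the PR description.
        return self.total_mutations > 0 and self.new_finds / self.total_mutations > 1 / 3

class Corpus:
    def __init__(self, corpus_dir, min_size, min_mutations):
        self.dir = Path(corpus_dir)
        self.dir.mkdir(parents=True, exist_ok=True)
        self.min_size = min_size
        self.min_mutations = min_mutations
        self.entries = []

    def add(self, entry):
        self.entries.append(entry)
        self.evict()

    def evict(self):
        """Flush entries that used up their mutation budget and are not favored."""
        kept = []
        excess = len(self.entries) - self.min_size
        for e in self.entries:
            evictable = e.total_mutations >= self.min_mutations and not e.is_favored()
            if excess > 0 and evictable:
                # Hypothetical on-disk format: one JSON file per evicted entry.
                (self.dir / f"{e.uuid}.json").write_text(json.dumps(e.calls))
                excess -= 1
            else:
                kept.append(e)
        self.entries = kept
```

Because favored and under-mutated entries are never evicted, the in-memory corpus can stay above the target size for a while, which matches the "targeted" wording in the description.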

These aren't strictly blocking but definitely near-term priorities to facilitate long-running fuzzing campaigns:

[ ] Headless fuzzing mode with structured logging that includes cumulative run info: num of edges seen, number of failures encountered, number of sequences tested, size of in-memory corpus
- [ ] https://github.com/foundry-rs/foundry/issues/9727
- [x] https://github.com/foundry-rs/foundry/issues/990
[ ] Support stateless fuzzing with mutations limited to ABI args
[ ] Figure out how to share sequences between the invariants' corpora, as rn they are independent. I think calling all invariants in every thread is probably the right choice here, but they need to have the same config and target/excludes
- [ ] https://github.com/foundry-rs/foundry/issues/8898

Future work

[ ] Coverage evaluation suite. Run and collect edge coverage over time (probably at least 4 hours and repeated to account for randomness). Produce an LCOV report and consider lines covered by Foundry w/o coverage, Echidna, or Medusa but not by Foundry w/ coverage.
[ ] Add decoded ABI signature to corpus
[ ] Seed corpus from unit test call sequences

Performance

Requires more investigation so jotting ideas I thought of

[ ] Make corpus lookup more efficient https://github.com/foundry-rs/foundry/pull/10190#discussion_r2114069107
[ ] Experiment with different edge sizes and maybe do something with shared memory
[ ] SIMD coverage map like with LibAFL

Solution

PR Checklist

[ ] Added Tests
[ ] Added Documentation
[ ] Breaking changes

Mar 28 '25 00:03 0xalpharush

https://github.com/foundry-rs/foundry/pull/10190/commits/b7f09d836c55e12cf6c721bf2bdb789ab86c5843 operated under the assumption that the libafl function updated the history map, but it doesn't. So we either need to revert back to the other way or iterate over the history map and set every index to the "classified" hitcount when there's new coverage
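For context, the second option (iterating over the history map and storing the classified hit count) could look roughly like the sketch below. The bucket boundaries are the AFL-style ones; the function names and the "greater than" comparison are assumptions for illustration, not the code in this PR or in libafl.

```python
def classify(count):
    """Map a raw edge hit count to an AFL-style power-of-two bucket."""
    if count == 0:
        return 0
    if count == 1:
        return 1
    if count == 2:
        return 2
    if count == 3:
        return 4
    if count <= 7:
        return 8
    if count <= 15:
        return 16
    if count <= 31:
        return 32
    if count <= 127:
        return 64
    return 128

def merge_into_history(hitcounts, history):
    """Update `history` in place; return True if the run reached a new edge or a higher bucket."""
    new_coverage = False
    for i, raw in enumerate(hitcounts):
        bucket = classify(raw)
        if bucket > history[i]:
            history[i] = bucket
            new_coverage = True
    return new_coverage
```

The key point from the comment is that the history map has to be written by the caller; detecting new coverage without storing the classified value would report the same edge as new on every run.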

Apr 29 '25 13:04 0xalpharush

> b7f09d8 operated under the assumption that the libafl function updated the history map, but it doesn't. So we either need to revert back to the other way or iterate over the history map and set every index to the "classified" hitcount when there's new coverage

I think reverting and using the other way should work here, do you see any big disadvantages using one or another?

Apr 30 '25 08:04 grandizzy

Mainly perf considerations... But it can wait and we avoid taking a dep on libafl

Apr 30 '25 16:04 0xalpharush

added time based invariant campaigns in https://github.com/foundry-rs/foundry/pull/10190/commits/08e501aae466372947858d02dc2c9850d12d32e4

Jun 02 '25 07:06 grandizzy

@DaniPopes could you pls check the addressed changes and whether this is good to merge? Thank you

Jun 12 '25 18:06 grandizzy

Have we benchmarked this?

Jun 16 '25 10:06 DaniPopes

> Have we benchmarked this?

It's still TBD, wanted to have it out for security researchers to start using it and provide feedback. If you strongly feel we shouldn't until we get all benchmarks then can work on it before merging, lmk. Thank you!

Jun 16 '25 10:06 grandizzy

Well we're replacing the invariant runner by default with no fallback, I would at least check if we haven't had any regressions in performance and behavior

Jun 16 '25 11:06 DaniPopes

> Well we're replacing the invariant runner by default with no fallback, I would at least check if we haven't had any regressions in performance and behavior

Fair enough. For behavior, I feel like we have many tests in CI to make sure it is not changed. For performance, I'll compare at minimum with the ones used for the v1.0.0 benchmarks: https://github.com/devdacian/solidity-fuzzing-comparison and https://github.com/grandizzy/fuzz-benchmarks I will look for others and update the ticket with results.

Other tests suggested by @0xalpharush:

we can run some of these for a couple hours and hopefully that gives some signal https://github.com/morpho-org/morpho-blue/tree/main/test/forge/invariant https://github.com/bgd-labs/aave-v3-origin/tree/v3.3.0/tests/invariants
collect lcov and compare using https://github.com/capgelka/lcov-diff to check effectiveness
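To make the lcov comparison concrete, here is a small sketch that diffs the covered lines of two LCOV tracefiles. It is a stand-in written for illustration, not how lcov-diff itself works; it only reads the SF and DA records and ignores function and branch data.

```python
def covered_lines(lcov_text):
    """Return the set of (source_file, line_number) pairs with a non-zero hit count."""
    covered, current = set(), None
    for line in lcov_text.splitlines():
        line = line.strip()
        if line.startswith("SF:"):
            current = line[3:]
        elif line.startswith("DA:") and current is not None:
            # DA:<line>,<hits>[,<checksum>]
            fields = line[3:].split(",")
            if int(fields[1]) > 0:
                covered.add((current, int(fields[0])))
        elif line == "end_of_record":
            current = None
    return covered

def coverage_diff(baseline, candidate):
    """Lines covered by only one of the two reports."""
    base, cand = covered_lines(baseline), covered_lines(candidate)
    return {
        "only_in_candidate": sorted(cand - base),
        "only_in_baseline": sorted(base - cand),
    }
```

Two reports with empty lists on both sides cover the same lines, even if the hit counts differ.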

@0xkarmacoma I know you've been running some tests https://x.com/0xkarmacoma/status/1889771304787796198 if you still have the setup would you be willing to compare with this PR too? thanks, lmk!

Jun 16 '25 11:06 grandizzy

I thought this would not be the default behavior as a user must add some configs to their foundry.toml to enable it

Jun 16 '25 12:06 0xalpharush

> I thought this would not be the default behavior as a user must add some configs to their foundry.toml to enable it

@0xalpharush ah, do you mean we should keep the old behavior if let's say no corpus dir is set? rn is not doing this but can be easily done

Jun 16 '25 12:06 grandizzy

@grandizzy I don't have time to run the comparison rn, but I was using https://github.com/ConsenSysDiligence/daedaluzz/blob/master/run-foundry.sh

Jun 16 '25 21:06 0xkarmacoma

Regression tests

https://github.com/grandizzy/fuzz-benchmarks

no regression in breaking invariants
SimpleDSChiefTest can be caught faster with coverage guided fuzzing (as it requires the same sequence to be mutated to break it, e.g. it consistently breaks it in 21500 calls vs 450500 calls)
ConstantsBytes32Test now finds the counterexample, whereas previously it was marked as failing
CreateTest does not require special config anymore

https://github.com/devdacian/solidity-fuzzing-comparison

Test Case	cov guided	no cov guided	v1.2.3	v1.0.0
NaiveReceiverAdvancedFoundry	9.08ms	11.62ms	13.08ms	12.66ms
NaiveReceiverBasicFoundry	2.37s	686.77ms	675.42ms	674.38ms
UnstoppableBasicFoundry	3.85s	3.91s	1.55s	1.71s
ProposalCryticTesterToFoundry	3.79ms	16.47ms	14.87ms	11.00ms
VotingNftCryticToFoundry	13.43s	4.32s	4.32s	4.51s
TokenSaleBasicFoundry	19.29s	7.66s	7.61s	7.95s

https://github.com/ConsenSysDiligence/daedaluzz

30mins run each time for maze 0, 1, 2 / 10 mins run each time for maze 3 and 4 on 16 cores
values show how many and which invariants were broken with this PR (coverage guided, not coverage guided), v1.2.3 and v1.0.0
running without coverage guided fuzzing finds the same scenarios as v1.2.3 and v1.0.0
running with coverage guided fuzzing finds fewer scenarios for maze 3 (expected, as coverage guided fuzzing should perform better on longer runs and maze 3 was run for only 10 mins)

Maze	PR coverage guided	PR not guided	v1.2.3	v1.0.0
0	13 (invariants 16, 17, 18, 19, 27, 29, 30, 34, 35, 38, 42, 5, 9)	13 (invariants 16, 17, 18, 19, 27, 28, 29, 30, 34, 35, 42, 5, 9)	13 (invariants 16, 17, 18, 19, 27, 29, 30, 34, 35, 38, 42, 5, 9)	14 (invariants 16, 17, 18, 19, 27, 28, 29, 30, 34, 35, 38, 42, 5, 9)
1	14 (invariants 12, 16, 17, 25, 26, 31, 32, 33, 38, 39, 47, 48, 5, 7)	13 (invariants 12, 16, 17, 25, 26, 31, 32, 38, 39, 47, 48, 5, 7)	14 (invariants 12, 16, 17, 25, 26, 31, 33, 38, 39, 46, 47, 48, 5, 7)	13 (invariants 12, 16, 17, 25, 26, 31, 33, 38, 39, 47, 48, 5, 7)
2	15 (invariants 14, 2, 26, 27, 29, 31, 33, 38, 39, 4, 40, 42, 43, 44, 46)	15 (invariants 14, 2, 26, 27, 29, 31, 33, 38, 39, 4, 40, 42, 43, 44, 46)	16 (invariants 13, 14, 2, 26, 27, 29, 31, 33, 38, 39, 4, 40, 42, 43, 44, 46)	15 (invariants 14, 2, 26, 27, 29, 31, 33, 38, 39, 4, 40, 42, 43, 44, 46)
3	10 (invariants 12, 16, 25, 3, 33, 36, 40, 41, 8, 9)	13 (invariants 10, 11, 12, 16, 25, 3, 33, 36, 40, 41, 6, 8, 9)	12 (invariants 10, 12, 16, 25, 3, 33, 36, 40, 41, 6, 8, 9)	13 (invariants 1, 10, 12, 16, 25, 3, 33, 36, 40, 41, 6, 8, 9)
4	15 (invariants 10, 11, 17, 22, 27, 28, 32, 33, 36, 37, 41, 43, 5, 6)	15 (invariants 10, 11, 17, 22, 27, 28, 32, 33, 36, 37, 41, 43, 5, 6)	15 (invariants 10, 11, 17, 22, 27, 28, 32, 33, 36, 37, 41, 43, 5, 6)	14 (invariants 10, 11, 17, 22, 27, 28, 32, 33, 36, 37, 41, 43, 5)
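The per-maze lists are easier to compare as sets. A small sketch using the maze 3 row (the invariant numbers are copied from the table; the helper name is made up for this example):

```python
maze3 = {
    "PR coverage guided": {12, 16, 25, 3, 33, 36, 40, 41, 8, 9},
    "PR not guided": {10, 11, 12, 16, 25, 3, 33, 36, 40, 41, 6, 8, 9},
    "v1.2.3": {10, 12, 16, 25, 3, 33, 36, 40, 41, 6, 8, 9},
    "v1.0.0": {1, 10, 12, 16, 25, 3, 33, 36, 40, 41, 6, 8, 9},
}

def missed_by(results, name):
    """Invariants broken by at least one other configuration but not by `name`."""
    others = set().union(*(s for k, s in results.items() if k != name))
    return sorted(others - results[name])

def unique_to(results, name):
    """Invariants broken only by `name`."""
    others = set().union(*(s for k, s in results.items() if k != name))
    return sorted(results[name] - others)
```

For maze 3, missed_by(maze3, "PR coverage guided") gives invariants 1, 6, 10 and 11, and unique_to for the same configuration is empty, so in the 10-minute run the guided mode broke a strict subset of what the other configurations broke.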

Coverage

https://github.com/morpho-org/morpho-blue

collected coverage from a 30-minute time based campaign, with and without coverage guidance: forge coverage --mt invariant --report lcov --show-progress
same coverage results (using lcov-diff), though new coverage corpus entries were still being generated: this could indicate that a longer time based campaign is needed in order to see differences
the campaign resulted in more runs when not guided (less time spent mutating the corpus and writing to disk)
coverage guided config

[profile.default.invariant]
depth = 100
timeout = 1800
corpus_dir = "corpus/"
corpus_min_size = 100
fail_on_revert = true

not guided config

[profile.default.invariant]
depth = 100
timeout = 1800
fail_on_revert = true

Invariant	PR coverage guided	PR not guided
invariantBadDebt	runs: 16860, calls: 1686000	runs: 18050, calls: 1805000
invariantBorrowShares	runs: 16435, calls: 1643500	runs: 17152, calls: 1715200
invariantMorphoBalance	runs: 16730, calls: 1673000	runs: 17605, calls: 1760500
invariantSupplyShares	runs: 12827, calls: 1282700	runs: 13286, calls: 1328600
invariantTotalSupplyGeTotalBorrow	runs: 18312, calls: 1831200	runs: 19523, calls: 1952300
invariantHealthy	runs: 11349, calls: 1134900	runs: 12027, calls: 1202700
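The run counts can be turned into a rough overhead figure. The sketch below uses the numbers from the table and the depth of 100 from the configs; the helper names are hypothetical.

```python
DEPTH = 100  # calls per run, from the invariant configs above

# invariant name: (runs with coverage guidance, runs without)
runs = {
    "invariantBadDebt": (16860, 18050),
    "invariantBorrowShares": (16435, 17152),
    "invariantMorphoBalance": (16730, 17605),
    "invariantSupplyShares": (12827, 13286),
    "invariantTotalSupplyGeTotalBorrow": (18312, 19523),
    "invariantHealthy": (11349, 12027),
}

def calls_for(run_count, depth=DEPTH):
    """Total calls executed for a given number of runs."""
    return run_count * depth

def fewer_runs_percent(guided, not_guided):
    """How many percent fewer runs the guided campaign completed in the same time."""
    return round(100 * (not_guided - guided) / not_guided, 1)

overhead = {name: fewer_runs_percent(g, n) for name, (g, n) in runs.items()}
```

Across the six invariants the guided campaigns completed roughly 3.5% to 6.6% fewer runs in the same 30 minutes, which is consistent with the extra time spent on corpus mutation and disk writes.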

Jun 17 '25 11:06 grandizzy

@0xalpharush @DaniPopes please see above https://github.com/foundry-rs/foundry/pull/10190#issuecomment-2980012506 Couple of things noticed while performing tests:

No regression introduced: using defaults (coverage guided disabled), the invariant tests behave as in previous versions
No immediate gain seen on morpho blue, but it was a time based campaign of only 30 mins and new coverage was still being produced / written to disk. The feature needs extensive testing with different code bases
Coverage guided fuzzing is suitable for longer campaigns. Also, on scenarios that need a sequence of calls in a specific order (like the DSChief bug), coverage guided fuzzing is more efficient (all runs produce counterexamples). Running with the default / older version can sometimes miss producing a counterexample and needs longer runs to break the invariant.

Jun 18 '25 18:06 grandizzy

I think there's a lot to be improved, but this unblocks running the fuzzer with coverage for hours on end and restarting from scratch as the corpus is cumulative.

I would merge and then work on these so we could collect analytics:

Headless fuzzing mode with structured logging that includes cumulative run info: num of edges seen, number of failures encountered, number of sequences tested, size of in-memory corpus
https://github.com/foundry-rs/foundry/issues/9727

Jun 20 '25 15:06 0xalpharush

yep, agree, @DaniPopes good to send?

Jun 20 '25 15:06 grandizzy