feat(forge): coverage guided fuzzing & time based campaigns for invariant mode

Open 0xalpharush opened this issue 9 months ago • 5 comments

Motivation

Using the inspector added in https://github.com/paradigmxyz/revm-inspectors/pull/255, this PR adds a corpus, which is enabled by setting a folder path for the corpus_dir invariant config. A corpus size of min_corpus_size is targeted once all entries have been mutated min_corpus_mutations times, with one caveat: if an entry is highly likely to produce new finds (a likelihood greater than one-third, which is a guesstimate), it remains a candidate for mutation. Currently, there are 5 mutations: splice combines two sequences, interleave weaves two sequences together, prefix overwrites the beginning of a sequence, suffix overwrites the end of a sequence, and mutate args selects a subset of arguments from a call in the sequence to modify.
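To make the four sequence-level mutations concrete, here is a minimal sketch of one way to read them. It is not the PR's implementation: the function names, the explicit cut points (a real fuzzer would draw them at random), and the generic element type standing in for a call are all assumptions made for illustration. The mutate args mutation is left out because it needs ABI decoding.

```rust
// Hypothetical sketch of the sequence mutations; `T` stands in for one call.

/// Splice: keep `a` up to `cut_a`, then continue with `b` from `cut_b`.
fn splice<T: Clone>(a: &[T], b: &[T], cut_a: usize, cut_b: usize) -> Vec<T> {
    let cut_a = cut_a.min(a.len());
    let cut_b = cut_b.min(b.len());
    a[..cut_a].iter().chain(b[cut_b..].iter()).cloned().collect()
}

/// Interleave: alternate calls from `a` and `b`; the longer tail is appended.
fn interleave<T: Clone>(a: &[T], b: &[T]) -> Vec<T> {
    let mut out = Vec::with_capacity(a.len() + b.len());
    for i in 0..a.len().max(b.len()) {
        if let Some(x) = a.get(i) {
            out.push(x.clone());
        }
        if let Some(y) = b.get(i) {
            out.push(y.clone());
        }
    }
    out
}

/// Prefix: overwrite the first `n` calls of `a` with the first `n` calls of `b`.
fn overwrite_prefix<T: Clone>(a: &[T], b: &[T], n: usize) -> Vec<T> {
    let n = n.min(a.len()).min(b.len());
    b[..n].iter().chain(a[n..].iter()).cloned().collect()
}

/// Suffix: overwrite the last `n` calls of `a` with the last `n` calls of `b`.
fn overwrite_suffix<T: Clone>(a: &[T], b: &[T], n: usize) -> Vec<T> {
    let n = n.min(a.len()).min(b.len());
    a[..a.len() - n]
        .iter()
        .chain(b[b.len() - n..].iter())
        .cloned()
        .collect()
}
```

All four helpers return a new sequence and leave their inputs untouched, so a corpus entry can be mutated repeatedly without being consumed.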

Closes https://github.com/foundry-rs/foundry/issues/8665
Closes https://github.com/foundry-rs/foundry/issues/990 - if a timeout (in seconds) is configured, the campaign loops until it expires instead of stopping at the max number of runs
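The difference between a run-bounded and a time-bounded campaign can be sketched as a single loop with two stop conditions. This is only an illustration: `Budget` and `run_campaign` are hypothetical names, not types from Foundry, and the closure stands in for executing one invariant run.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stop condition: a wall-clock budget or a fixed number of runs.
enum Budget {
    Timeout(Duration),
    MaxRuns(u32),
}

/// Calls `run_once` until the budget is exhausted and returns the number of runs.
fn run_campaign(budget: Budget, mut run_once: impl FnMut(u32)) -> u32 {
    let start = Instant::now();
    let mut runs = 0u32;
    loop {
        // The stop condition is checked before every run, so a zero budget runs nothing.
        let done = match &budget {
            Budget::Timeout(limit) => start.elapsed() >= *limit,
            Budget::MaxRuns(max) => runs >= *max,
        };
        if done {
            break;
        }
        run_once(runs);
        runs += 1;
    }
    runs
}
```

With `Budget::Timeout`, the number of completed runs depends on how fast each run is, which is why time-based campaigns report different run counts for different configurations.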

TODO

  • [x] Replay corpus at start up so the hitmap is initialized and not duplicating entries
  • [x] LRU cache for corpus that flushes to disk when a seed hasn't been selected for mutation in a while. This will be random for now as we don't have any notion of seed scheduling/priority, so it could just be a fixed-size data structure that writes to disk when an entry is evicted, without the LRU component.
  • [x] Add mutation to tx sequence that modifies the ABI values and re-encodes it
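The "fixed-size data structure that writes to disk when an entry is evicted" idea could look roughly like the sketch below. Everything here is an assumption for illustration: `BoundedCorpus`, the `<id>.bin` file naming, and the oldest-first eviction (the TODO above describes random eviction) are not taken from the PR.

```rust
use std::collections::VecDeque;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical in-memory corpus with a fixed capacity.
/// Entries are opaque serialized sequences keyed by an increasing id.
struct BoundedCorpus {
    dir: PathBuf,
    capacity: usize,
    entries: VecDeque<(u64, Vec<u8>)>,
    next_id: u64,
}

impl BoundedCorpus {
    /// Creates the corpus directory if needed; capacity is clamped to at least 1.
    fn new(dir: impl Into<PathBuf>, capacity: usize) -> io::Result<Self> {
        let dir = dir.into();
        fs::create_dir_all(&dir)?;
        Ok(Self { dir, capacity: capacity.max(1), entries: VecDeque::new(), next_id: 0 })
    }

    /// Adds an entry; when full, the oldest entry is flushed to disk first.
    fn insert(&mut self, bytes: Vec<u8>) -> io::Result<()> {
        while self.entries.len() >= self.capacity {
            if let Some((id, evicted)) = self.entries.pop_front() {
                fs::write(self.dir.join(format!("{id}.bin")), evicted)?;
            }
        }
        self.entries.push_back((self.next_id, bytes));
        self.next_id += 1;
        Ok(())
    }

    /// Number of entries currently held in memory.
    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Writing only on eviction keeps disk I/O off the hot path, at the cost of losing in-memory entries if the process is killed before they are flushed.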

These aren't strictly blocking but definitely near-term priorities to facilitate long-running fuzzing campaigns:

  • [ ] Headless fuzzing mode with structured logging that includes cumulative run info: num of edges seen, number of failures encountered, number of sequences tested, size of in-memory corpus
    • [ ] https://github.com/foundry-rs/foundry/issues/9727
    • [x] https://github.com/foundry-rs/foundry/issues/990
  • [ ] Support stateless fuzzing with mutations limited to ABI args
  • [ ] Figure out sharing sequences across each invariant's corpus, as right now they are independent. I think calling all invariants in every thread is probably the right choice here, but they need to have the same config and target/excludes
    • [ ] https://github.com/foundry-rs/foundry/issues/8898

Future work

  • [ ] Coverage evaluation suite. Run and collect edge coverage over time (probably at least 4 hours, and repeated to account for randomness). Produce an LCOV report and consider lines covered by Foundry w/o coverage, Echidna, Medusa but not w/ coverage.
  • [ ] Add decoded ABI signature to corpus
  • [ ] Seed corpus from unit test call sequences

Performance

Requires more investigation so jotting ideas I thought of

  • [ ] Make corpus lookup more efficient https://github.com/foundry-rs/foundry/pull/10190#discussion_r2114069107
  • [ ] Experiment with different edge sizes and maybe do something with shared memory
  • [ ] SIMD coverage map like with LibAFL

Solution

PR Checklist

  • [ ] Added Tests
  • [ ] Added Documentation
  • [ ] Breaking changes

0xalpharush avatar Mar 28 '25 00:03 0xalpharush

https://github.com/foundry-rs/foundry/pull/10190/commits/b7f09d836c55e12cf6c721bf2bdb789ab86c5843 operated under the assumption that the libafl function updated the history map, but it doesn't. So we either need to revert back to the other way or iterate over the history map and set every index to the "classified" hitcount when there's new coverage
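For readers unfamiliar with the second option, the sketch below shows an AFL-style way of classifying raw hit counts into buckets and folding them into a history map. It illustrates the general technique only; `classify` and `merge_into_history` are hypothetical names, and neither the PR's code nor the libafl function mentioned above is reproduced here.

```rust
/// Maps a raw edge hit count to a power-of-two bucket, as AFL-style fuzzers do,
/// so that small changes in loop iteration counts do not register as new coverage.
fn classify(hits: u8) -> u8 {
    match hits {
        0 => 0,
        1 => 1,
        2 => 2,
        3 => 4,
        4..=7 => 8,
        8..=15 => 16,
        16..=31 => 32,
        32..=127 => 64,
        128..=255 => 128,
    }
}

/// Folds one run's hit counts into `history` and reports whether any edge
/// reached a bucket that had not been seen before.
fn merge_into_history(history: &mut [u8], current: &[u8]) -> bool {
    let mut new_coverage = false;
    for (seen, &hits) in history.iter_mut().zip(current) {
        let bucket = classify(hits);
        if bucket & !*seen != 0 {
            new_coverage = true;
            // Record the classified bucket so the same behavior is not reported again.
            *seen |= bucket;
        }
    }
    new_coverage
}
```

The key point is that the history map must be updated in the same pass that detects new coverage; otherwise every later run hitting the same edges would be counted as new again.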

0xalpharush avatar Apr 29 '25 13:04 0xalpharush

b7f09d8 operated under the assumption that the libafl function updated the history map, but it doesn't. So we either need to revert back to the other way or iterate over the history map and set every index to the "classified" hitcount when there's new coverage

I think reverting and using the other way should work here, do you see any big disadvantages using one or another?

grandizzy avatar Apr 30 '25 08:04 grandizzy

Mainly perf considerations... But it can wait and we avoid taking a dep on libafl

0xalpharush avatar Apr 30 '25 16:04 0xalpharush

added time based invariant campaigns in https://github.com/foundry-rs/foundry/pull/10190/commits/08e501aae466372947858d02dc2c9850d12d32e4

grandizzy avatar Jun 02 '25 07:06 grandizzy

@DaniPopes could you pls check the addressed changes and whether this is good to merge? Thank you

grandizzy avatar Jun 12 '25 18:06 grandizzy

Have we benchmarked this?

DaniPopes avatar Jun 16 '25 10:06 DaniPopes

Have we benchmarked this?

It's still TBD, wanted to have it out for security researchers to start using it and provide feedback. If you strongly feel we shouldn't until we get all benchmarks then can work on it before merging, lmk. Thank you!

grandizzy avatar Jun 16 '25 10:06 grandizzy

Well we're replacing the invariant runner by default with no fallback, I would at least check if we haven't had any regressions in performance and behavior

DaniPopes avatar Jun 16 '25 11:06 DaniPopes

Well we're replacing the invariant runner by default with no fallback, I would at least check if we haven't had any regressions in performance and behavior

Fair enough. For behavior, I feel we have many tests in CI to make sure it hasn't changed. For performance, I'll compare at minimum with the ones used for the v1.0.0 benchmarks: https://github.com/devdacian/solidity-fuzzing-comparison and https://github.com/grandizzy/fuzz-benchmarks. I'll look for others and update the ticket with results.

Other tests suggested by @0xalpharush:

  • we can run some of these for a couple hours and hopefully that gives some signal https://github.com/morpho-org/morpho-blue/tree/main/test/forge/invariant https://github.com/bgd-labs/aave-v3-origin/tree/v3.3.0/tests/invariants
  • collect lcov and compare using https://github.com/capgelka/lcov-diff to check effectiveness

@0xkarmacoma I know you've been running some tests https://x.com/0xkarmacoma/status/1889771304787796198 if you still have the setup would you be willing to compare with this PR too? thanks, lmk!

grandizzy avatar Jun 16 '25 11:06 grandizzy

I thought this would not be the default behavior as a user must add some configs to their foundry.toml to enable it

0xalpharush avatar Jun 16 '25 12:06 0xalpharush

I thought this would not be the default behavior as a user must add some configs to their foundry.toml to enable it

@0xalpharush ah, do you mean we should keep the old behavior if let's say no corpus dir is set? rn is not doing this but can be easily done

grandizzy avatar Jun 16 '25 12:06 grandizzy

@grandizzy I don't have time to run the comparison rn, but I was using https://github.com/ConsenSysDiligence/daedaluzz/blob/master/run-foundry.sh

0xkarmacoma avatar Jun 16 '25 21:06 0xkarmacoma

Regression tests

https://github.com/grandizzy/fuzz-benchmarks

  • no regression in breaking invariants
  • SimpleDSChiefTest can be caught faster with coverage guided fuzzing (as it requires the same sequence to be mutated to break it, e.g. it consistently breaks it in 21500 calls vs 450500 calls)
  • ConstantsBytes32Test now finds the counterexample, whereas previously it was marked as failing
  • CreateTest does not require special config anymore

https://github.com/devdacian/solidity-fuzzing-comparison

| Test Case | cov guided | no cov guided | v1.2.3 | v1.0.0 |
|---|---|---|---|---|
| NaiveReceiverAdvancedFoundry | 9.08ms | 11.62ms | 13.08ms | 12.66ms |
| NaiveReceiverBasicFoundry | 2.37s | 686.77ms | 675.42ms | 674.38ms |
| UnstoppableBasicFoundry | 3.85s | 3.91s | 1.55s | 1.71s |
| ProposalCryticTesterToFoundry | 3.79ms | 16.47ms | 14.87ms | 11.00ms |
| VotingNftCryticToFoundry | 13.43s | 4.32s | 4.32s | 4.51s |
| TokenSaleBasicFoundry | 19.29s | 7.66s | 7.61s | 7.95s |

https://github.com/ConsenSysDiligence/daedaluzz

  • 30mins run each time for maze 0, 1, 2 / 10 mins run each time for maze 3 and 4 on 16 cores
  • values show how many and which invariants were broken with this PR (coverage guided, not coverage guided), v1.2.3 and v1.0.0
  • running without coverage guided fuzzing finds the same scenarios as v1.2.3 and v1.0.0
  • running with coverage guided fuzzing finds fewer scenarios for maze 3 (expected, as coverage guided fuzzing should perform better on longer runs and maze 3 was run for only 10 mins)

| Maze | PR coverage guided | PR not guided | v1.2.3 | v1.0.0 |
|---|---|---|---|---|
| 0 | 13 (invariants 16, 17, 18, 19, 27, 29, 30, 34, 35, 38, 42, 5, 9) | 13 (invariants 16, 17, 18, 19, 27, 28, 29, 30, 34, 35, 42, 5, 9) | 13 (invariants 16, 17, 18, 19, 27, 29, 30, 34, 35, 38, 42, 5, 9) | 14 (invariants 16, 17, 18, 19, 27, 28, 29, 30, 34, 35, 38, 42, 5, 9) |
| 1 | 14 (invariants 12, 16, 17, 25, 26, 31, 32, 33, 38, 39, 47, 48, 5, 7) | 13 (invariants 12, 16, 17, 25, 26, 31, 32, 38, 39, 47, 48, 5, 7) | 14 (invariants 12, 16, 17, 25, 26, 31, 33, 38, 39, 46, 47, 48, 5, 7) | 13 (invariants 12, 16, 17, 25, 26, 31, 33, 38, 39, 47, 48, 5, 7) |
| 2 | 15 (invariants 14, 2, 26, 27, 29, 31, 33, 38, 39, 4, 40, 42, 43, 44, 46) | 15 (invariants 14, 2, 26, 27, 29, 31, 33, 38, 39, 4, 40, 42, 43, 44, 46) | 16 (invariants 13, 14, 2, 26, 27, 29, 31, 33, 38, 39, 4, 40, 42, 43, 44, 46) | 15 (invariants 14, 2, 26, 27, 29, 31, 33, 38, 39, 4, 40, 42, 43, 44, 46) |
| 3 | 10 (invariants 12, 16, 25, 3, 33, 36, 40, 41, 8, 9) | 13 (invariants 10, 11, 12, 16, 25, 3, 33, 36, 40, 41, 6, 8, 9) | 12 (invariants 10, 12, 16, 25, 3, 33, 36, 40, 41, 6, 8, 9) | 13 (invariants 1, 10, 12, 16, 25, 3, 33, 36, 40, 41, 6, 8, 9) |
| 4 | 15 (invariants 10, 11, 17, 22, 27, 28, 32, 33, 36, 37, 41, 43, 5, 6) | 15 (invariants 10, 11, 17, 22, 27, 28, 32, 33, 36, 37, 41, 43, 5, 6) | 15 (invariants 10, 11, 17, 22, 27, 28, 32, 33, 36, 37, 41, 43, 5, 6) | 14 (invariants 10, 11, 17, 22, 27, 28, 32, 33, 36, 37, 41, 43, 5) |

Coverage

https://github.com/morpho-org/morpho-blue

  • collected coverage from 30 minutes time based campaign, with and without coverage guided: forge coverage --mt invariant --report lcov --show-progress
  • same coverage results (using lcov-diff), though new coverage corpus entries were still being generated: this could indicate that a longer time based campaign is needed in order to see differences
  • the campaign resulted in more runs for not guided (less time spent mutating the corpus and writing to disk)
  • coverage guided config
[profile.default.invariant]
depth = 100
timeout = 1800
corpus_dir = "corpus/"
corpus_min_size = 100
fail_on_revert = true
  • not guided config
[profile.default.invariant]
depth = 100
timeout = 1800
fail_on_revert = true

| Invariant | PR coverage guided | PR not guided |
|---|---|---|
| invariantBadDebt | runs: 16860, calls: 1686000 | runs: 18050, calls: 1805000 |
| invariantBorrowShares | runs: 16435, calls: 1643500 | runs: 17152, calls: 1715200 |
| invariantMorphoBalance | runs: 16730, calls: 1673000 | runs: 17605, calls: 1760500 |
| invariantSupplyShares | runs: 12827, calls: 1282700 | runs: 13286, calls: 1328600 |
| invariantTotalSupplyGeTotalBorrow | runs: 18312, calls: 1831200 | runs: 19523, calls: 1952300 |
| invariantHealthy | runs: 11349, calls: 1134900 | runs: 12027, calls: 1202700 |
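
The run and call counts above are related through the configured depth: each run executes depth calls, so calls = runs × 100 here. A small sketch of that arithmetic, with hypothetical helper names, can be used to sanity-check rows or to express how many runs the guided campaign completed relative to the unguided one:

```rust
/// Total calls executed by a campaign where every run performs `depth` calls.
fn total_calls(runs: u64, depth: u64) -> u64 {
    runs * depth
}

/// Ratio of guided runs to not-guided runs over the same time budget
/// (values below 1.0 mean the guided campaign completed fewer runs).
fn relative_runs(guided: u64, not_guided: u64) -> f64 {
    guided as f64 / not_guided as f64
}
```

For example, `total_calls(16860, 100)` reproduces the 1686000 calls reported for invariantBadDebt, and `relative_runs(16860, 18050)` gives the fraction of runs the guided campaign completed for that invariant.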

grandizzy avatar Jun 17 '25 11:06 grandizzy

@0xalpharush @DaniPopes please see above https://github.com/foundry-rs/foundry/pull/10190#issuecomment-2980012506 Couple of things noticed while performing tests:

  • No regression introduced: using defaults (coverage guided disabled), the invariant tests behave as in previous versions
  • No immediate gain seen on morpho blue, but this was a time based campaign of only 30 mins and new coverage was still being produced / written to disk. The feature needs extensive testing with different code bases
  • Coverage guided fuzzing is suitable for longer campaigns. Also, in scenarios that need a sequence of calls in a specific order (like the DSChief bug), coverage guided fuzzing is more efficient (all runs produce counterexamples). Running with defaults / an older version can sometimes miss producing a counterexample and needs longer runs to break the invariant.

grandizzy avatar Jun 18 '25 18:06 grandizzy

I think there's a lot to be improved, but this unblocks running the fuzzer with coverage for hours on end and restarting without starting from scratch, as the corpus is cumulative.

I would merge and then work on these so we could collect analytics:

  • Headless fuzzing mode with structured logging that includes cumulative run info: num of edges seen, number of failures encountered, number of sequences tested, size of in-memory corpus
  • https://github.com/foundry-rs/foundry/issues/9727

0xalpharush avatar Jun 20 '25 15:06 0xalpharush

yep, agree, @DaniPopes good to send?

grandizzy avatar Jun 20 '25 15:06 grandizzy