feat: report benchmarks of each tool
For each tool, benchmark scripts should report:
- full execution time, i.e. including tool startup time
- execution time of just the transaction simulation (not sure whether this is possible for every tool)
- gas used
- some simple state verification, e.g. check logs or traces, or perhaps we just check this one manually for now?
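A minimal sketch of what such a harness could look like, assuming the tool under test is invoked as a CLI command and prints a `gasUsed: <n>` line (both the invocation and the output format are assumptions; each tool would need its own command and parser):

```python
import re
import subprocess
import time

# Matches output like "gasUsed: 21000" (assumed format; adjust per tool)
GAS_RE = re.compile(r"gasUsed[:=]?\s*(\d+)", re.IGNORECASE)

def benchmark(cmd):
    """Run one tool invocation, reporting full wall-clock time
    (startup included) and the gasUsed parsed from its output."""
    start = time.perf_counter()
    result = subprocess.run(cmd, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    match = GAS_RE.search(result.stdout)
    return {
        "wall_time_s": elapsed,
        "gas_used": int(match.group(1)) if match else None,
    }
```

The simulation-only timing would have to come from the tool itself (e.g. its own reported timings), since an external timer can't separate startup from simulation.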
I think we need to determine the correct gasUsed, as there really is only one correct answer, and benchmarks that report a different value should fail. A wrong gasUsed means the tool is simulating incorrectly, so the rest of its results can't be trusted anyway.
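That check could be as simple as comparing every tool's reported figure against the one canonical value (a sketch; `ValueError` stands in for however the benchmark signals failure):

```python
def check_gas(results, expected_gas):
    """results maps tool name -> reported gasUsed; any tool whose
    figure differs from the canonical value fails the benchmark."""
    wrong = {tool: gas for tool, gas in results.items() if gas != expected_gas}
    if wrong:
        raise ValueError(f"wrong gasUsed (expected {expected_gas}): {wrong}")
```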
I'd love to get all forking maintainers on a call together to share our knowledge with each other (cc @micaiahreid)
eh - i think there is a specific configuration that should be standardized, but what is actually useful as test output to the user may be a matter of opinion (and thus may differ between tools). Some tools, like dapptools, strip out things like creation of the test contract, the base fee, and calldata cost to give the user a better picture of execution costs.
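For what it's worth, the stripped-out portion is the transaction's intrinsic cost, which is well defined (21000 base plus calldata cost of 4 gas per zero byte and 16 per non-zero byte, post-Istanbul), so a standardized benchmark could report both the raw and the execution-only figure. A sketch:

```python
def execution_gas(total_gas_used, calldata: bytes):
    """Subtract the intrinsic transaction cost (21000 base, plus
    4 gas per zero calldata byte and 16 per non-zero byte, per
    post-Istanbul rules) to approximate execution-only gas."""
    zeros = calldata.count(0)
    nonzeros = len(calldata) - zeros
    intrinsic = 21000 + 4 * zeros + 16 * nonzeros
    return total_gas_used - intrinsic
```

This doesn't cover dapptools also stripping test-contract creation, which is tool-specific.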
@brockelmore yeah, it may make sense to ensure state changes are as expected; not sure what that would look like for each tool, though.
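One tool-agnostic option: have each benchmark dump whatever post-state it can observe (storage slots, log topics) into a flat mapping and diff it against expected values. A hypothetical sketch of the comparison step:

```python
def verify_state(expected: dict, actual: dict):
    """Compare expected post-transaction state (e.g. storage slot ->
    value, or log topic -> data) against what the tool reports;
    return the mismatches so the benchmark can flag them."""
    return {
        key: (expected[key], actual.get(key))
        for key in expected
        if actual.get(key) != expected[key]
    }
```

How each tool exposes that state (RPC calls, trace output, test assertions) would still differ.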