ref-fvm icon indicating copy to clipboard operation
ref-fvm copied to clipboard

EVM integration tests

Open aakoshh opened this issue 2 years ago • 4 comments

The PR is an effort to add integration tests for EVM opcodes through the ref-fvm, so the whole executor/callmanager/kernel stack is involved. The integration tests are expected to exercise more complex scenarios than the unit tests in the builtin-actors do, mixing different lifecycle events with various call types.

Initially I thought about using state machine tests, a.k.a fuzzing. The beginning of that is added in testing/integration/src/smt.rs, which expects to use arbitrary::Arbitrary and arbtest to generate commands and compare the state of the system under test to a reference model.

This effort was halted in favour of hand written test scenarios. To write scenarios, I thought we could use Gherkin syntax and the Cucumber test runner, so we the scenarios stay readable.

I started adding Solidity test contracts under testing/integration/tests/evm. The contracts are compiled to bytecode and a Rust facade is generated using the ethers tools. The output is under testing/integration/tests/evm/artifacts and testing/integration/tests/evm/src. This project also holds the contract behaviour specifications under testing/integration/tests/evm/features.

These contracts are used in testing/integration/tests/fevm_features, which is where the features files are parsed and executed. There is a common.rs here which can be used to capture shared idioms.

Example

❯ cargo test --release --test fevm_features
   Compiling fvm_integration_tests v0.1.1-alpha.1 (/home/aakoshh/projects/consensuslab/ref-fvm/testing/integration)
    Finished release [optimized] target(s) in 25.32s
     Running tests/fevm.rs (/home/aakoshh/projects/consensuslab/ref-fvm/target/release/deps/fevm-ebc615f4af0751af)
Feature: SimpleCoin
  Rule: Owner has initial balance
    Scenario: When we deploy the contract, the owner gets 10000 coins
     ✔  Given 1 random account
     ✔  When account 1 creates a SimpleCoin contract
     ✔  Then the balance of account 1 is 10000 coins
...
[Summary]
1 feature
4 rules
9 scenarios (9 passed)
40 steps (40 passed)

Tips:

  • To run a single scenario tag it like @wip and run it with cargo test --release --test fevm_features -- -t @wip.
  • Do not forget to add --release or loading a contract will take ~30 seconds!

Questions

  • [x] The tester allows creating two accounts with the same private key, but different actor IDs. Then, it mixes up their seqno. Can this happen in real life? We believe it's a bug in the tester.
  • [ ] When I create an actor, I save its ActorID in the CreateReturn and later use it to invoke the contract. However, when I make a recursive CALL, this is not the address I see in the sender, but probably the delegated address. It would be helpful to include more comments about which of the three addresses in CreateReturn should be used for what purpose.
  • [x] When I create a contract, I expect the balance of the account that does it to go down by the amount of gas used, yet it stays the same 10,000 the tester created the account with. The reason is probably because the gas_fee_cap is not set on the Message and the system doesn't check that it's at least as high as the base_fee. Is that a check only Lotus is expected to do? The gas_fee_cap is set to 0 on the Message by the tester, so there's no charge, only miner penalty.
  • [ ] Would it be okay to reference the EVM actor code in the tests, so we can use the data structures, such as EthAddress? The above PR duplicates them to avoid a circular dependency in src, but in the tests we are already referencing the actor bundle, and the actual types have extra methods we can use, for example to convert from Address to EthAddress.

TODO:

  • [x] Provide an example of parsing events. It looks like currently the fields are private, so there's no way to pass them to ethers. I made them public in this PR to get on with it.
  • [x] Test a contract that performs recursive calls with CALL, DELEGATECALL and REVERT
  • [ ] Test STATICCALL
  • [x] Test creating contracts within contracts
  • [x] Provide example of calling a contract with constructor arguments
  • [x] Test SELFDESTRUCT refund scenarios
  • [x] Test metamorphic contracts
  • [ ] Test SSTORE and SLOAD
  • [ ] Finish the first self destruct scenario Then the execution fails with message '???': is it really expected to fail? It doesn't.

Why Cucumber

It's easy to find articles on the internet that warn against using Cucumber for BDD, however, they are more often than not saying that in the context of testing UIs, where Selenium and RSpec can do a much better job without the overhead of the english syntax for scenarios. They warn that in rapidly evolving applications like a UI, Cucumber and Gherkin tests soon become unmaintainable.

In our case, however, we are dealing with smart contracts that once deployed, might never be change. For these, having a few scenarios describe their expected behaviour is not difficult to maintain.

Another motivation to reach for these was the personal circumstances when I was tasked with testing certain EVM opcodes with complex lifecycle scenarios: I didn't know (i) how to trigger these opcodes in Solidity and (ii) what their expected behaviour was. A situation like this is more what Cucumber was intended for: for a developer to sit down with an analyst to hash out scenarios and examples of what we are expecting from the system.

For simpler cases like returning the code size of a contract, this is definitely not the right tool.

Last, but not least, it has music going for it :cucumber:

aakoshh avatar Jan 18 '23 17:01 aakoshh

Codecov Report

Merging #1507 (bb9d8d8) into master (cd2c767) will increase coverage by 0.15%. The diff coverage is 0.00%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1507      +/-   ##
==========================================
+ Coverage   53.54%   53.69%   +0.15%     
==========================================
  Files         143      144       +1     
  Lines       13500    13464      -36     
==========================================
+ Hits         7229     7230       +1     
+ Misses       6271     6234      -37     
Impacted Files Coverage Δ
shared/src/event/mod.rs 0.00% <ø> (ø)
tools/fvm-bench/src/fevm.rs 0.00% <0.00%> (ø)
fvm/src/gas/price_list.rs 26.49% <0.00%> (-1.00%) :arrow_down:
fvm/src/kernel/default.rs 14.54% <0.00%> (-0.29%) :arrow_down:
fvm/src/machine/mod.rs 57.81% <0.00%> (ø)
fvm/src/machine/boxed.rs 0.00% <0.00%> (ø)
fvm/src/syscalls/event.rs 0.00% <0.00%> (ø)
fvm/src/executor/default.rs 5.45% <0.00%> (ø)
fvm/src/call_manager/default.rs 0.00% <0.00%> (ø)
fvm/src/blockstore/discard.rs 0.00% <0.00%> (ø)
... and 2 more

codecov-commenter avatar Jan 30 '23 10:01 codecov-commenter

@vyzo thanks for taking a look.

I also read that Cucumber DSL is hard to maintain in the context of a rapidly changing website and an RSpec style test with before and after supported in the code is much better. I don't disagree with that.

I this case, however, I think these smart contract basically never change. You have the EVM spec, you have a Solidity file that demonstrates some behaviour, and in real life once that's deployed, it is immutable, so similarly the surface area of the tests will not change.

What changes is the underlying mapping of how the FVM supports deploying, executing and querying the contracts, but that is hidden by the mostly reusable DSL implementations.

I think this is a good example of how a Business Analyst could engage with the specification of a smart contract without having to wade through code.

But no, I don't really expect that you would personally enjoy adding to this test

aakoshh avatar Mar 03 '23 13:03 aakoshh

Another comment on how hard something is to maintain: recently @raulk raised a point on the undocumented Gerbil scripts which were checked in along with the test files they generated, yet we added them to the repo after some comments were added. These tests are completely automated, so there's no risk of them rotting away, and I took care to leave comments and docs on every method I call.

You mention that tests have their own mental models and mini DSLs, and I agree, since they are individual Solidity contracts demonstrating some function or quirk of the EVM. However, I hope that this actually brings benefit to some developers, namely the ones who know Solidity, but not FEVM or the FVM. They can read the contract and the english description of what is expected to build up their mental model, and only then dig deeper into how things are achieved with FVM.

Honestly I spent half of my time doing just that, trying to figure out what combination of addresses and deserializers I have to use to achieve what the mental model of the contract wants me to demonstrate.

aakoshh avatar Mar 06 '23 09:03 aakoshh

This PR might be redundant/superseded by https://github.com/filecoin-project/fvm-workbench/issues/44 when we get to it.

rjan90 avatar Aug 20 '24 21:08 rjan90