Lightweight state-machine testing
Has any consideration been given to adding state-machine testing to (dynamically) test consensus logic (validators, epochs, state transitions, rewards/penalties, etc.)?
For example, a small YAML scenario might look like:
```yaml
scenario:
  initial_state:
    validators:
      - {id: 1, balance: 32, active: true}
      - {id: 2, balance: 32, active: true}
  actions:
    - action: propose_block
      params: { proposer: "random" }
    - action: slash_validator
      params: { validator: "random" }
  invariants:
    - no_negative_balance
    - total_balance_properly_accounted
```
This could even use facilities from property-based testing tools (like Hypothesis; see the sketch after this list) to:
- Help generate diverse sequences of state transitions
- Verify state invariants across all transitions
- Replay any found edge cases
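As a minimal sketch of what that could look like with Hypothesis's stateful testing (`RuleBasedStateMachine`): the validator model, reward, and penalty values below are toy placeholders, not the spec's actual logic; a real harness would call into the pyspec instead.

```python
# A toy model driven by Hypothesis's stateful testing. The reward/penalty
# arithmetic is a placeholder; a real harness would call into the pyspec.
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, invariant, rule


class ToyValidatorMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.validators = {
            vid: {"balance": 32_000_000_000, "active": True} for vid in (1, 2)
        }
        # Running total of all Gwei ever minted minus all Gwei ever burned.
        self.accounted_total = sum(v["balance"] for v in self.validators.values())

    @rule(proposer=st.sampled_from([1, 2]))
    def propose_block(self, proposer):
        if self.validators[proposer]["active"]:
            reward = 1_000_000  # placeholder proposer reward (Gwei)
            self.validators[proposer]["balance"] += reward
            self.accounted_total += reward

    @rule(validator=st.sampled_from([1, 2]))
    def slash_validator(self, validator):
        v = self.validators[validator]
        if v["active"]:
            penalty = min(v["balance"], 1_000_000_000)  # placeholder penalty (Gwei)
            v["balance"] -= penalty
            v["active"] = False
            self.accounted_total -= penalty

    @invariant()
    def no_negative_balance(self):
        assert all(v["balance"] >= 0 for v in self.validators.values())

    @invariant()
    def total_balance_properly_accounted(self):
        assert sum(v["balance"] for v in self.validators.values()) == self.accounted_total


# Hypothesis generates diverse rule sequences, checks the invariants after each
# step, and shrinks any failing sequence to a minimal reproduction.
TestToyValidatorMachine = ToyValidatorMachine.TestCase
```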
The real value here is finding minimal sequences that trigger consensus failures between the spec and implementations.
When client behavior diverges from the specification, state-machine testing could even (try to) reduce complex failure cases to the smallest possible reproducing example.
I think, in any case, this approach can help generate test vectors that are compatible with existing client test runners, maintaining the current workflow while expanding test coverage.
Open to feedback; I might have missed context or existing efforts.
Hey @moodmosaic, sorry it's taken me so long to respond, it's been a busy couple of weeks. This is a cool idea, something like this could be quite useful! I'm unaware of any efforts exactly like this, but ethereum/consensus-compliance-tests came to mind: a suite of model-based tests. See this discord thread about it: https://discord.com/channels/595666850260713488/1323326526070063204
@jtraglia, thanks for the links! I checked that repo and related materials: #3831, HackMD, ethereum/pm#1096, EthResearch.
That's an interesting piece of work around modeling and mutation-based testing! I didn't see foundational or academic papers cited, making it challenging (for me) to understand the theoretical background behind this. (Perhaps @ericsson49 can add those.)
I'm proposing testing at a different, complementary layer: state-level transitions (validator lifecycle, epochs, rewards, penalties, etc).
Inspired by Hughes's "Testing the Hard Stuff" and the Erlang QuickCheck docs, the goal is to:
- dynamically generate randomized scenarios (e.g., propose → attest → slash)
- verify invariants (e.g., no negative balances, finalized checkpoint epoch ≤ justified checkpoint epoch; sketched after this list)
- (attempt to) shrink failing scenarios into minimal, reproducible test cases (long shot IME, but always nice to consider)
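A minimal sketch of such invariants as plain executable predicates, assuming a pyspec-like `BeaconState` object (the field names `balances`, `finalized_checkpoint`, and `current_justified_checkpoint` mirror the spec; the checker itself is illustrative):

```python
# Invariants as standalone predicates over a BeaconState-like object; the
# checker can be run after every generated transition.
def no_negative_balance(state) -> bool:
    return all(balance >= 0 for balance in state.balances)


def finality_never_outruns_justification(state) -> bool:
    # Finalization never runs ahead of justification.
    return state.finalized_checkpoint.epoch <= state.current_justified_checkpoint.epoch


def check_invariants(state) -> None:
    for inv in (no_negative_balance, finality_never_outruns_justification):
        assert inv(state), f"invariant violated: {inv.__name__}"
```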
AFAICT, current fork-choice testing focuses specifically on block/attestation tree correctness. State-machine testing would broaden coverage to include complex validator-state interactions. (It should not replace existing fork-choice tests.)
I understand this isn't an easy task, but I felt it would still be valuable to share the idea.
Hey @moodmosaic,
> Perhaps @ericsson49 can add those.
Our fork choice test generation strategy is described in this post in more detail. There is also our talk at DevCon. Regarding academic papers, it was mainly inspired by these papers:
Regarding state-level transitions, @mkalinin has been working on validator lifecycle test generation, using MiniZinc/Python.
> AFAICT, current fork-choice testing focuses specifically on block/attestation tree correctness. State-machine testing would broaden coverage to include complex validator-state interactions. (It should not replace existing fork-choice tests.)
Our fork choice test generator follows the state-machine testing paradigm: starting from an initial protocol state, it generates diverse sequences of events, which are exactly `block | attestation | attester_slashing | tick`.
The focus has indeed been on fork choice spec coverage. The coverage can be extended, but there is the beacon fuzz project, which focuses on differential fuzzing of `state_transition`, so extending it is perhaps not the top priority, while fork choice coverage had been lagging.
As I wrote earlier, @mkalinin has been working on testing the validator lifecycle. There may be other aspects of the consensus specs that deserve dedicated testing efforts.
> there is the beacon fuzz project, which focuses on differential fuzzing of `state_transition`
As @mkalinin and I discussed today, it makes sense to test `state_transition` using the state-machine approach, since implementations use caching and can introduce bugs this way (e.g. querying the cache returns a stale value). It's not clear whether the beacon fuzz approach can catch such bugs in principle, but the state-machine approach should be able to.
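A contrived toy of that failure mode, to show why sequencing matters: a derived value is memoized per epoch, so a mid-epoch change to the validator set is missed, and only certain event orderings expose it. All names and logic here are illustrative, not client or spec code.

```python
# Toy illustration of a stale-cache bug that only certain sequences expose.
# All names and logic are illustrative; they do not correspond to spec code.
def run_scenario(events):
    state = {"epoch": 0, "active": {1, 2, 3}}
    cache = {}  # epoch -> cached active-validator count (the buggy shortcut)
    for event in events:
        if event == "tick":
            state["epoch"] += 1
        elif event.startswith("exit:"):
            state["active"].discard(int(event.split(":")[1]))
        expected = len(state["active"])                      # always recomputed
        actual = cache.setdefault(state["epoch"], expected)  # possibly stale
        if actual != expected:
            return f"stale cache at epoch {state['epoch']} after {event!r}"
    return "ok"


print(run_scenario(["exit:1", "tick"]))          # ok: cache is populated after the exit
print(run_scenario(["tick", "exit:1", "tick"]))  # stale cache at epoch 1 after 'exit:1'
```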
Yes, and to run what I described with beacon_fuzz, you'd have to replace the invariants with state deltas. I'd need to think about what that might look like.
Thoughts on the state-machine approach:
Start with a probabilistic baseline, which you get out of the box with Hypothesis, plus all its other facilities (nice reporting, replaying, even a chance to shrink). What invariants can I break by generating and executing diverse sequences of state transitions? This is the key question that must be answered.
(Only then move from the probabilistic approach to something more formal or guided.)
Expanding on the validator lifecycle testing, there is an effort that we started working on a while ago, but it is still WIP:
- There are validator states generated using a model written in MiniZinc, as @ericsson49 mentioned above
- There are tests that are initialized with a certain validator state; after applying an operation that updates the state, they check that the resulting validator state is within the set of valid states
There is a short writeup proposing validator state testing. Also, @ericsson49 and I recently discussed epoch processing testing leveraging a similar model-based approach. It would allow us to catch bugs in clients related to caching the validator set and missing updates to the cache.
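For concreteness, the second bullet above amounts to a check like the following sketch. The `Validator` fields and `FAR_FUTURE_EPOCH` mirror the spec, but `initiate_exit` and the validity predicate are simplified stand-ins for the real processing functions and the MiniZinc model.

```python
# Single-step validator-lifecycle check: start from a generated state, apply one
# operation, and assert the result is still a valid state. `initiate_exit` and
# the validity predicate are simplified stand-ins, not spec code.
from dataclasses import dataclass, replace

FAR_FUTURE_EPOCH = 2**64 - 1


@dataclass(frozen=True)
class Validator:
    activation_epoch: int
    exit_epoch: int
    withdrawable_epoch: int
    slashed: bool


def is_valid_lifecycle(v: Validator) -> bool:
    # Exit never precedes activation; withdrawability never precedes exit.
    return v.activation_epoch <= v.exit_epoch <= v.withdrawable_epoch


def initiate_exit(v: Validator, exit_epoch: int, delay: int = 256) -> Validator:
    return replace(v, exit_epoch=exit_epoch, withdrawable_epoch=exit_epoch + delay)


# One hand-picked case: an active validator exits at epoch 10.
before = Validator(activation_epoch=0, exit_epoch=FAR_FUTURE_EPOCH,
                   withdrawable_epoch=FAR_FUTURE_EPOCH, slashed=False)
after = initiate_exit(before, exit_epoch=10)
assert is_valid_lifecycle(after)
```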
> What invariants can I break by generating and executing diverse sequences of state transitions? This is the key question that must be answered.
Writing explicit invariants for pyspecs would definitely be beneficial for testing (and for formal verification). There was an attempt to describe function preconditions in docstrings (which could be extended to invariants too) in #1945. I'm not sure, though, whether it's a good idea to keep invariants in the spec code, since it can clutter the specs.
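One way to keep the spec code uncluttered would be to hold the invariants in a test-side module and wrap spec functions from the harness. This is only a sketch: `with_invariants` and the invariant list are hypothetical, not existing pyspec machinery.

```python
# Test-side wrapper: check invariants around an arbitrary spec function without
# adding anything to the spec code itself. The invariants are supplied by the
# test harness, not embedded in the pyspec.
import functools


def with_invariants(*invariants):
    def decorator(spec_fn):
        @functools.wraps(spec_fn)
        def wrapped(state, *args, **kwargs):
            result = spec_fn(state, *args, **kwargs)
            for inv in invariants:
                assert inv(state), f"{spec_fn.__name__}: invariant {inv.__name__} violated"
            return result
        return wrapped
    return decorator
```

A harness could then wrap a spec function, e.g. something like `with_invariants(no_negative_balance)(spec.process_slashings)` (assuming the pyspec is importable as `spec`), without touching the spec module itself.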
> (Only then move from the probabilistic approach to something more formal or guided.)
We are going to generate test vectors to test compliance of implementations w.r.t. other Ethereum specs, like the Execution and 3SF specs. It would be nice to use such test vectors to check the consistency of the specs themselves, which will be more or less straightforward if invariants (pre-/post-conditions) are in an executable form.
From a test generation perspective, we are considering writing models in Python (instead of writing them in MiniZinc); then we can use constraint solvers like MiniZinc/Gecode or SMT solvers to enumerate solutions and convert them to test vectors.
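As a rough illustration of the "model in Python, enumerate with a solver" direction, here is a sketch using z3 purely as an example SMT backend; the constraints are a toy fragment of the lifecycle ordering, not the actual model.

```python
# Enumerate all assignments satisfying a toy lifecycle-ordering model, each of
# which could be turned into a test vector. z3 stands in for "some solver".
from z3 import Ints, Or, Solver, sat

activation, exit_, withdrawable = Ints("activation exit withdrawable")
s = Solver()
s.add(0 <= activation, activation <= exit_, exit_ <= withdrawable, withdrawable <= 4)

solutions = []
while s.check() == sat:
    m = s.model()
    sol = (m[activation].as_long(), m[exit_].as_long(), m[withdrawable].as_long())
    solutions.append(sol)
    # Block this solution so the next check finds a different one.
    s.add(Or(activation != sol[0], exit_ != sol[1], withdrawable != sol[2]))

print(len(solutions), "candidate validator lifecycles to turn into test vectors")
```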
@ericsson49, @mkalinin, I'm going through the materials you provided.
My (current) thinking is that this effort works well for single-step correctness and edge-state modeling, but I'm unsure whether it's efficient enough to catch multi-step lifecycle bugs like activate → attest → exit → slash → withdraw, where sequencing matters.
> My (current) thinking is that this effort works well for single-step correctness and edge-state modeling, but I'm unsure whether it's efficient enough to catch multi-step lifecycle bugs like activate → attest → exit → slash → withdraw, where sequencing matters.
Yes. For instance, for epoch processing we would need a different model that takes sequences into account. If designed right, we believe such a model could be used to yield all possible combinations of scenarios where one thing affects another.
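To frame what "taking sequences into account" could mean, here is a toy sketch: enumerate orderings of lifecycle operations and keep only those admissible under a simple ordering model, each of which is a candidate multi-step scenario. The ordering rules below are illustrative, not the spec's.

```python
# Enumerate orderings of lifecycle operations and keep those admissible under a
# toy ordering model; each admissible ordering is a candidate multi-step scenario.
from itertools import permutations

OPS = ("activate", "attest", "exit", "slash", "withdraw")
MUST_PRECEDE = {("activate", "attest"), ("activate", "exit"),
                ("exit", "withdraw"), ("slash", "withdraw")}


def is_admissible(seq):
    pos = {op: i for i, op in enumerate(seq)}
    return all(pos[a] < pos[b] for a, b in MUST_PRECEDE)


admissible = [seq for seq in permutations(OPS) if is_admissible(seq)]
print(f"{len(admissible)} of {len(list(permutations(OPS)))} orderings are admissible")
```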