ref-fvm

Continuity and regression testing tools

Open raulk opened this issue 2 years ago • 2 comments

Context

As we upgrade ref-fvm, the EVM runtime actor, and other programmability related components, it's paramount to not break existing smart contracts.

This is of utmost importance for mainnet. It is slightly less critical for Hyperspace, given that the announced maturity of the software releases made there is beta, and such a proclamation implies breakage every now and then.

However, even in such circumstances it is vital to perform healthy change management with the community by publicly identifying exactly what breaks and why, down to enumerating the specific contracts, sitting at specific addresses, that are affected by the breakage.

We need to build new tools that will enable us to conduct this kind of continuity and regression analysis.

lotus-rebase

Lotus is the main client shipping with FEVM support. It is also where features are initially prototyped, and the client that currently powers Hyperspace.

My proposal is to introduce a new Lotus tool, lotus-rebase (likely under lotus-shed), which can share a reasonable amount of code with tvx. It does the following:

  1. Takes an epoch range and a target network version.
  2. For every tipset in the epoch range, creates a local overlay blockstore and runs all the migrations needed to upgrade the state from the network version active at that height to the target network version.
  3. Replays all messages in that epoch against the upgraded state tree, reporting divergences in the receipt, actor nodes, and state CIDs of the recipient actor (could expand this to report divergences in every actor in the call stack).
  4. A common case of divergence will be gas.
    • If the new nv requires less gas for the same transaction, we expect the transaction to result in the same exit code (with high likelihood, unless the contract is playing funny gas tricks), and report the gas savings.
    • If the new nv requires more gas for the same transaction, the tool should offer an option to automatically ratchet the gas up (or perform a gas estimation for more accuracy).
      • When this mode is enabled, we layer one more overlay blockstore on top of the main overlay blockstore, so that we can safely discard the state changes from this transaction and repeat it without losing other progress.
  5. Good and usable reporting is the key to making this tool insightful -- it needs to be intuitive and needs to draw attention to the important changes. It will require thoughtfulness.
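To make the layered-overlay idea in step 4 more concrete, here is a minimal sketch in Go. Everything in it is hypothetical: the `Blockstore` interface, `Overlay`, `ApplyFunc`, and `ApplyWithRatchet` are simplified stand-ins, not Lotus or ref-fvm APIs, and doubling the gas limit is just one possible ratchet policy (a proper gas estimation would be more accurate). The point is only that each attempt runs in a scratch overlay that is committed on success and dropped on an out-of-gas failure.

```go
package main

import (
	"errors"
	"fmt"
)

// Blockstore is a hypothetical, minimal stand-in for a real blockstore interface.
type Blockstore interface {
	Get(key string) ([]byte, bool)
	Put(key string, data []byte)
}

// MemStore is a trivial in-memory Blockstore used as the base layer.
type MemStore map[string][]byte

func (m MemStore) Get(key string) ([]byte, bool) { d, ok := m[key]; return d, ok }
func (m MemStore) Put(key string, data []byte)   { m[key] = data }

// Overlay buffers writes in memory and falls through to its base on reads.
type Overlay struct {
	base   Blockstore
	writes map[string][]byte
}

func NewOverlay(base Blockstore) *Overlay {
	return &Overlay{base: base, writes: map[string][]byte{}}
}

func (o *Overlay) Get(key string) ([]byte, bool) {
	if d, ok := o.writes[key]; ok {
		return d, true
	}
	return o.base.Get(key)
}

func (o *Overlay) Put(key string, data []byte) { o.writes[key] = data }

// Commit copies the buffered writes into the base layer.
func (o *Overlay) Commit() {
	for k, v := range o.writes {
		o.base.Put(k, v)
	}
	o.writes = map[string][]byte{}
}

// ErrOutOfGas marks a failure that is worth retrying with a higher gas limit.
var ErrOutOfGas = errors.New("out of gas")

// ApplyFunc is a hypothetical stand-in for executing one message against a store.
type ApplyFunc func(bs Blockstore, gasLimit int64) error

// ApplyWithRatchet retries apply with a doubled gas limit until it succeeds or
// the limit exceeds maxGas. Each attempt writes into a fresh scratch overlay,
// so a failed attempt leaves parent untouched.
func ApplyWithRatchet(parent Blockstore, apply ApplyFunc, gasLimit, maxGas int64) (int64, error) {
	if gasLimit <= 0 {
		return 0, fmt.Errorf("gas limit must be positive, got %d", gasLimit)
	}
	for limit := gasLimit; limit <= maxGas; limit *= 2 {
		scratch := NewOverlay(parent)
		err := apply(scratch, limit)
		if err == nil {
			scratch.Commit() // keep the state changes of the successful attempt
			return limit, nil
		}
		if !errors.Is(err, ErrOutOfGas) {
			return limit, err // a different failure: report it, do not retry
		}
		// Out of gas: scratch is simply dropped and the next attempt starts clean.
	}
	return 0, fmt.Errorf("no gas limit up to %d succeeded: %w", maxGas, ErrOutOfGas)
}

func main() {
	chain := MemStore{"state": []byte("nv17")}
	overlay := NewOverlay(chain) // main overlay: the chain store is never modified
	apply := func(bs Blockstore, gasLimit int64) error {
		if gasLimit < 350 {
			bs.Put("dirty", []byte("partial write"))
			return ErrOutOfGas
		}
		bs.Put("state", []byte("nv18"))
		return nil
	}
	used, err := ApplyWithRatchet(overlay, apply, 100, 1000)
	fmt.Println(used, err)
}
```

In this sketch the main overlay plays the role of the per-tipset overlay blockstore, and the scratch overlay is the extra layer that makes a repeated transaction safe to discard.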

Potential CLI UX

# single epoch
lotus-rebase --epoch @1300 --upgrade-to 18 --adapt-gas=up
# range of epochs
lotus-rebase --epoch @1..@1400 --upgrade-to 18 --adapt-gas=up
# from genesis to HEAD
lotus-rebase --epoch @1..HEAD --upgrade-to 18 --adapt-gas=up
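A rough sketch of how the --epoch argument above might be parsed, in Go. The `EpochRange` type and `ParseEpochRange` function are hypothetical names, and the grammar (an `@`-prefixed epoch, an optional `..` separator, and `HEAD` as an open upper bound) is only inferred from the three examples, so treat it as an assumption rather than a spec.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// EpochRange is an inclusive range of epochs. When ToHead is set, the upper
// bound is meant to be resolved against the chain head at run time.
type EpochRange struct {
	From, To int64
	ToHead   bool
}

// parseEpoch parses a single "@<height>" token.
func parseEpoch(s string) (int64, error) {
	if !strings.HasPrefix(s, "@") {
		return 0, fmt.Errorf("epoch %q must start with '@'", s)
	}
	n, err := strconv.ParseInt(s[1:], 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid epoch %q: %w", s, err)
	}
	if n < 0 {
		return 0, fmt.Errorf("epoch %q must not be negative", s)
	}
	return n, nil
}

// ParseEpochRange accepts "@N", "@N..@M", or "@N..HEAD".
func ParseEpochRange(spec string) (EpochRange, error) {
	from, to, isRange := strings.Cut(spec, "..")
	start, err := parseEpoch(from)
	if err != nil {
		return EpochRange{}, err
	}
	if !isRange {
		return EpochRange{From: start, To: start}, nil // single epoch
	}
	if to == "HEAD" {
		return EpochRange{From: start, ToHead: true}, nil
	}
	end, err := parseEpoch(to)
	if err != nil {
		return EpochRange{}, err
	}
	if end < start {
		return EpochRange{}, fmt.Errorf("range end %d is before start %d", end, start)
	}
	return EpochRange{From: start, To: end}, nil
}

func main() {
	for _, spec := range []string{"@1300", "@1..@1400", "@1..HEAD"} {
		r, err := ParseEpochRange(spec)
		fmt.Printf("%s -> %+v (err: %v)\n", spec, r, err)
	}
}
```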

Notes

Warning: I'm probably missing some details in this specification. I will take it on myself to prototype this and have it running on Hyperspace.

This tool is necessary to perform a safe and well-managed upgrade in Hyperspace to the upcoming Carbonado.3, and to the final release after that.

raulk, Feb 02 '23 14:02

Note that there's some complexity trapped in here when dealing with heavier migrations, which are unfeasible to execute at every tipset on top of the entire state tree. Luckily, we already have tooling to identify actors involved in a transaction (inside tvx), so we could replay every transaction in a tipset we want to rebase, identify the participating actors, and migrate only those.

However, the migration code in Lotus would need to support selective migration, which we could enable through an optional interface that we implement only when this is needed:

  1. Some migrations are very lightweight and we can reasonably afford to run them over the entire state tree, with some level of local caching
  2. Other upgrades won't require this level of analysis

raulk, Feb 03 '23 16:02

@raulk anything to do here?

maciejwitowski, Mar 31 '23 15:03