substrate
substrate copied to clipboard
[FRAME] Simple multi block migrations
Partial https://github.com/paritytech/polkadot-sdk/issues/198 (on_poll
in https://github.com/paritytech/substrate/pull/14279).
pallet-migrations
This pallet takes a configuration of migrations Vec<Box<dyn SteppedMigration<…>>>
and executes these in order on_initialize
. The execution starts on each runtime upgrade. The unique identifiers of each executed migration are stored to prevent accidental re-execution of old ones.
While migrations are executed, only Mandatory
extrinsics and inherents are allowed to execute. This is accomplished by implementing MultiStepMigrator
for the pallet and querying it in frame-executive
before each non-mandatory TX. Concretely this means that these transactions will panic and make a block un-importable. The BlockBuilder respects this limitation and retains all extrinsics in the pool while migrations are ongoing. Once the MBMs are done, these extrinsics will start to be included again. From a user's perspective this just results in an increased time until the TX is included.
The pallet supports an UpgradeStatusHandler
that can be used to notify external logic of upgrade start/finish (for example to pause XCM dispatch).
Error recovery is very limited in the case that a migration errors or times out (exceeds its max_steps
). Currently the runtime dev can decide in UpgradeStatusHandler::failed
how to handle this. One follow-up would be to pair this with the SafeMode
pallet and enact safe mode when an upgrade fails, to allow governance to rescue the chain. This is currently not possible, since governance is not Mandatory
.
frame-executive
+ BlockBuilder
Executive
now additionally takes an MultiStepMigrator
trait that defaults to ()
.
IT IS PARAMOUNT TO SET THIS TO THE MIGRATIONS PALLET WHEN YOU DEPLOY IT.
The kitchensink runtime contains an example.
The block builder is aware of the new after_inherents
function and uses it to figure out the MBM state.
As comparison the old and new logic (changes with +
):
Current order:
on_runtime_upgrade
System::initialize
on_initialize
apply inherents
apply extrinsics
on_idle
on_finalize
New order:
on_runtime_upgrade
System::initialize
on_initialize
apply inherents
+ after_inherents:
+ progress mbms
+ poll TBD
+ If not MBM:
+ apply extrinsics
+ on_idle
on_finalize
Runtime API
-
BlockBuilder
: Addedafter_inherents
to progress ongoing MBMs and thepoll
hook TBD in https://github.com/paritytech/substrate/pull/14279.
frame-support
-
SteppedMigration
: A multi-step migration. -
ExtrinsicSuspenderQuery
: Implemented by themigrations
pallet and used byframe-executive
to check TX suspension. -
UpgradeStatusHandler
: React to upgrade start, completion and failure. -
FailedUpgradeHandling
: Decide how to handle a failed upgrade. EitherForceUnstuck
orKeepStuck
.
TODO
- [ ] Consume weight correctly
- [ ] Cleanup
@kianenigma am I right to assume we'll want to make it easy to test these with try-runtime-cli? Ideally should be as easy as on-runtime-upgrade
, maybe make it auto-mine blocks until all multi-block migrations are complete & provide a hook that runs afterwards?
Does this pauses user tx and incoming XCM etc? because otherwise it will be bad.
Does this pauses user tx and incoming XCM etc? because otherwise it will be bad.
It pauses non-mandatory extrinsics in frame-executive by checking a flag. For XCM we probably need a suspension and resume hook additionally.
But now that Im thinking about this again; this would make it impossible to rescue if one of the migrations errors, since governance requires things like the referenda pallet.
So maybe in the case that something errors, it could be put into SafeMode
to re-allow few extrinsics.
@kianenigma am I right to assume we'll want to make it easy to test these with try-runtime-cli? Ideally should be as easy as
on-runtime-upgrade
, maybe make it auto-mine blocks until all multi-block migrations are complete & provide a hook that runs afterwards?
Good to think about it early, and yes I think it is possible with little effort. I believe follow-chain
will already do everything needed to test an MBM?
But now that Im thinking about this again; this would make it impossible to rescue if one of the migrations errors, since governance requires things like the referenda pallet. So maybe in the case that something errors, it could be put into SafeMode to re-allow few extrinsics.
I sometimes wish what if setCode
was Mandatory
? meaning it would always surpass weight limits and so on? For a relay chain it is fine, but it could allow a parachain to stall itself by doing a bad migration that is never included on the relay chain.
Somehow detecting faulty migrations, and going into a safe-mode is def. best. Let's review and merge those as well.
As of detection, I was thinking that each migration could have a MaxDuration: Option<BN>
. If set, and not finished until reached, it will raise a flag. If not set, it means it can go on for ever.
Yeah a timeout could be useful.
We should have a tool to estimate how many blocks are required to complete the migration and then we can double it and make it as the timeout and if the migration did not complete, the chain enter in safe mode only allowing governance related features
@kianenigma am I right to assume we'll want to make it easy to test these with try-runtime-cli? Ideally should be as easy as
on-runtime-upgrade
, maybe make it auto-mine blocks until all multi-block migrations are complete & provide a hook that runs afterwards?Good to think about it early, and yes I think it is possible with little effort. I believe
follow-chain
will already do everything needed to test an MBM?
@kianenigma moving this discussion to a new issue: https://github.com/paritytech/substrate/issues/14291
We need to suspend everything that we can; that's the point of splitting out on_initialize()
from poll()
.
We need to move pretty much all on_initialize()
stuff into poll()
, deprecate on_initialize()
and on_finalize()
and hard-deadline hooks into frame_system::Config
allowing low-level pallets (basically just a few consensus pallets) to be registered for special start and/or end of block processing. Ideally we would have some way of ensuring that neither such a low-level pallet nor one which it might call into did not accidentally use this MBM or mandatory-extrinsic functionality. Because of this potential for contagion, we need to discourage HDH and ME functionality functionality as much as possible.
XCM queues would be processed in poll()
so XCM would automatically be paused when MBM stopped calling poll()
. The same is true for Scheduler.
While migrations are executed, only
Mandatory
transactions are allowed to execute. This is accomplished by implementingExtrinsicSuspenderQuery
for the pallet and querying it inframe-support
before each non-mandatory TX. Concretely this means that these transactions will all error and still burn the users funds. Not ideal, but a very simple solution.
Isn't it simpler to just make the migration use all the block weight until finished by returning the all weight used in on_initialize
? if a block is full only mandatory extrinsics can go in.
Isn't it simpler to just make the migration use all the block weight until finished by returning the all weight used in
on_initialize
? if a block is full only mandatory extrinsics can go in.
Hm yea that would be pretty simple. Do you think this would be enough @gavofyork?
Other discussions yielded that we want a runtime API that polls the MBM state and only include non-mandatory transactions into a block when there is no MBM ongoing. But that is more complicated to implement obviously.
We can also do both... just as additional security measure, since the runtime would panic on importing a block with non-mandatory otherwise.
my only concern was this:
Concretely this means that these transactions will all error and still burn the users funds.
but if frame-executive mark extrinsics as invalid and they are not included in the block then it is better than absuing the weight system. Also if we implement a post inherent like this https://github.com/paritytech/substrate/pull/10128 then we can exhaust the weight correctly after the inherents are executed.
The CI pipeline was cancelled due to failure one of the required jobs. Job name: cargo-check-benches Logs: https://gitlab.parity.io/parity/mirrors/substrate/-/jobs/3092555
The CI pipeline was cancelled due to failure one of the required jobs. Job name: cargo-check-benches Logs: https://gitlab.parity.io/parity/mirrors/substrate/-/jobs/3092556
Is this more or less ready for final review?
Still first need to merge https://github.com/paritytech/substrate/pull/14414. I will ping you there for review when its ready.