substrate icon indicating copy to clipboard operation
substrate copied to clipboard

[FRAME] Simple multi block migrations

Open ggwpez opened this issue 1 year ago • 15 comments

Partial https://github.com/paritytech/polkadot-sdk/issues/198 (on_poll in https://github.com/paritytech/substrate/pull/14279).

pallet-migrations

This pallet takes a configuration of migrations Vec<Box<dyn SteppedMigration<…>>> and executes these in order on_initialize. The execution starts on each runtime upgrade. The unique identifiers of each executed migration are stored to prevent accidental re-execution of old ones.
While migrations are executed, only Mandatory extrinsics and inherents are allowed to execute. This is accomplished by implementing MultiStepMigrator for the pallet and querying it in frame-executive before each non-mandatory TX. Concretely this means that these transactions will panic and make a block un-importable. The BlockBuilder respects this limitation and retains all extrinsics in the pool while migrations are ongoing. Once the MBMs are done, these extrinsics will start to be included again. From a user's perspective this just results in an increased time until the TX is included.

The pallet supports an UpgradeStatusHandler that can be used to notify external logic of upgrade start/finish (for example to pause XCM dispatch).

Error recovery is very limited in the case that a migration errors or times out (exceeds its max_steps). Currently the runtime dev can decide in UpgradeStatusHandler::failed how to handle this. One follow-up would be to pair this with the SafeMode pallet and enact safe mode when an upgrade fails, to allow governance to rescue the chain. This is currently not possible, since governance is not Mandatory.

frame-executive + BlockBuilder

Executive now additionally takes an MultiStepMigrator trait that defaults to ().
IT IS PARAMOUNT TO SET THIS TO THE MIGRATIONS PALLET WHEN YOU DEPLOY IT.
The kitchensink runtime contains an example.

The block builder is aware of the new after_inherents function and uses it to figure out the MBM state.
As comparison the old and new logic (changes with +):

Current order:
	on_runtime_upgrade
	System::initialize
	on_initialize
	apply inherents
	apply extrinsics
	on_idle
	on_finalize

New order:
	on_runtime_upgrade
	System::initialize
	on_initialize
	apply inherents
+	after_inherents:
+		progress mbms
+		poll TBD
+	If not MBM:
+		apply extrinsics
+		on_idle
	on_finalize

Runtime API

  • BlockBuilder: Added after_inherents to progress ongoing MBMs and the poll hook TBD in https://github.com/paritytech/substrate/pull/14279.

frame-support

  • SteppedMigration: A multi-step migration.
  • ExtrinsicSuspenderQuery: Implemented by the migrations pallet and used by frame-executive to check TX suspension.
  • UpgradeStatusHandler: React to upgrade start, completion and failure.
  • FailedUpgradeHandling: Decide how to handle a failed upgrade. Either ForceUnstuck or KeepStuck.

TODO

  • [ ] Consume weight correctly
  • [ ] Cleanup

ggwpez avatar May 31 '23 12:05 ggwpez

@kianenigma am I right to assume we'll want to make it easy to test these with try-runtime-cli? Ideally should be as easy as on-runtime-upgrade, maybe make it auto-mine blocks until all multi-block migrations are complete & provide a hook that runs afterwards?

liamaharon avatar May 31 '23 19:05 liamaharon

Does this pauses user tx and incoming XCM etc? because otherwise it will be bad.

xlc avatar May 31 '23 21:05 xlc

Does this pauses user tx and incoming XCM etc? because otherwise it will be bad.

It pauses non-mandatory extrinsics in frame-executive by checking a flag. For XCM we probably need a suspension and resume hook additionally.

But now that Im thinking about this again; this would make it impossible to rescue if one of the migrations errors, since governance requires things like the referenda pallet.
So maybe in the case that something errors, it could be put into SafeMode to re-allow few extrinsics.

ggwpez avatar Jun 01 '23 12:06 ggwpez

@kianenigma am I right to assume we'll want to make it easy to test these with try-runtime-cli? Ideally should be as easy as on-runtime-upgrade, maybe make it auto-mine blocks until all multi-block migrations are complete & provide a hook that runs afterwards?

Good to think about it early, and yes I think it is possible with little effort. I believe follow-chain will already do everything needed to test an MBM?

kianenigma avatar Jun 01 '23 14:06 kianenigma

But now that Im thinking about this again; this would make it impossible to rescue if one of the migrations errors, since governance requires things like the referenda pallet. So maybe in the case that something errors, it could be put into SafeMode to re-allow few extrinsics.

I sometimes wish what if setCode was Mandatory? meaning it would always surpass weight limits and so on? For a relay chain it is fine, but it could allow a parachain to stall itself by doing a bad migration that is never included on the relay chain.

Somehow detecting faulty migrations, and going into a safe-mode is def. best. Let's review and merge those as well.

As of detection, I was thinking that each migration could have a MaxDuration: Option<BN>. If set, and not finished until reached, it will raise a flag. If not set, it means it can go on for ever.

kianenigma avatar Jun 01 '23 14:06 kianenigma

Yeah a timeout could be useful.

We should have a tool to estimate how many blocks are required to complete the migration and then we can double it and make it as the timeout and if the migration did not complete, the chain enter in safe mode only allowing governance related features

xlc avatar Jun 01 '23 21:06 xlc

@kianenigma am I right to assume we'll want to make it easy to test these with try-runtime-cli? Ideally should be as easy as on-runtime-upgrade, maybe make it auto-mine blocks until all multi-block migrations are complete & provide a hook that runs afterwards?

Good to think about it early, and yes I think it is possible with little effort. I believe follow-chain will already do everything needed to test an MBM?

@kianenigma moving this discussion to a new issue: https://github.com/paritytech/substrate/issues/14291

liamaharon avatar Jun 02 '23 11:06 liamaharon

We need to suspend everything that we can; that's the point of splitting out on_initialize() from poll().

We need to move pretty much all on_initialize() stuff into poll(), deprecate on_initialize() and on_finalize() and hard-deadline hooks into frame_system::Config allowing low-level pallets (basically just a few consensus pallets) to be registered for special start and/or end of block processing. Ideally we would have some way of ensuring that neither such a low-level pallet nor one which it might call into did not accidentally use this MBM or mandatory-extrinsic functionality. Because of this potential for contagion, we need to discourage HDH and ME functionality functionality as much as possible.

XCM queues would be processed in poll() so XCM would automatically be paused when MBM stopped calling poll(). The same is true for Scheduler.

gavofyork avatar Jun 07 '23 15:06 gavofyork

While migrations are executed, only Mandatory transactions are allowed to execute. This is accomplished by implementing ExtrinsicSuspenderQuery for the pallet and querying it in frame-support before each non-mandatory TX. Concretely this means that these transactions will all error and still burn the users funds. Not ideal, but a very simple solution.

Isn't it simpler to just make the migration use all the block weight until finished by returning the all weight used in on_initialize ? if a block is full only mandatory extrinsics can go in.

gui1117 avatar Jun 11 '23 07:06 gui1117

Isn't it simpler to just make the migration use all the block weight until finished by returning the all weight used in on_initialize ? if a block is full only mandatory extrinsics can go in.

Hm yea that would be pretty simple. Do you think this would be enough @gavofyork?
Other discussions yielded that we want a runtime API that polls the MBM state and only include non-mandatory transactions into a block when there is no MBM ongoing. But that is more complicated to implement obviously.
We can also do both... just as additional security measure, since the runtime would panic on importing a block with non-mandatory otherwise.

ggwpez avatar Jun 11 '23 10:06 ggwpez

my only concern was this:

Concretely this means that these transactions will all error and still burn the users funds.

but if frame-executive mark extrinsics as invalid and they are not included in the block then it is better than absuing the weight system. Also if we implement a post inherent like this https://github.com/paritytech/substrate/pull/10128 then we can exhaust the weight correctly after the inherents are executed.

gui1117 avatar Jun 11 '23 11:06 gui1117

The CI pipeline was cancelled due to failure one of the required jobs. Job name: cargo-check-benches Logs: https://gitlab.parity.io/parity/mirrors/substrate/-/jobs/3092555

paritytech-cicd-pr avatar Jun 29 '23 15:06 paritytech-cicd-pr

The CI pipeline was cancelled due to failure one of the required jobs. Job name: cargo-check-benches Logs: https://gitlab.parity.io/parity/mirrors/substrate/-/jobs/3092556

paritytech-cicd-pr avatar Jun 29 '23 15:06 paritytech-cicd-pr

Is this more or less ready for final review?

kianenigma avatar Jul 20 '23 14:07 kianenigma

Still first need to merge https://github.com/paritytech/substrate/pull/14414. I will ping you there for review when its ready.

ggwpez avatar Jul 20 '23 14:07 ggwpez