substrate icon indicating copy to clipboard operation
substrate copied to clipboard

New pallets: `safe-mode` and `transaction-pause`

Open ggwpez opened this issue 3 years ago • 20 comments

Work on the SafeMode and TxPause pallets. Closes https://github.com/paritytech/substrate/issues/11626, related https://github.com/paritytech/substrate/issues/10033. Depends on https://github.com/paritytech/substrate/issues/13511

PS: Please rather review the concept instead of the code, since it is still subject to change.

SafeMode pallet

The safe-mode pallet provides a big STOP button to to put the chain in safe-mode and thereby only permitting a certain subset of operations. The pallet provides a SafeModeFilter which contains all calls that can be executed in safe-mode.
It can be permissionessly enabled by anyone by reserving a large stake.
The safe-mode pallet is then used by the runtime as call filter like such:

impl frame_system::Config for Runtime {
	…
	type BaseCallFilter = InsideBoth<DefaultFilter, SafeMode>;
	…
}

Extrinsics

enable: Enable the safe-mode permissionlessly for EnableDuration blocks.
Reserves an EnableStakeAmount amount of balance from the caller.
This extrinsic can be disabled by configuring EnableStakeAmount to None.

force_enable: Permissioned call that enables the safe-mode for a duration of blocks that can be configured per origin.

extend: Extend the safe-mode permissionlessly for ExtendDuration more blocks.
Reserves ExtendStakeAmount from the caller's account.
This extrinsic can be disabled by configuring ExtendStakeAmount to None.

force_extend:
Permissioned call that extends the safe-mode for a duration of blocks that can be configured per origin.

force_disable: Permissioned call to instantly disable the safe-mode.

on_initialize: Disables the safe-mode if its duration ran out in this block.

repay_stake(account, block_number): Permissioned call to repay the stake to the account that enabled the safe-mode in block block_number.

slash_stake(account, block_number): Permissioned call to slash the stake of the account that enabled the safe-mode in block block_number.

TxPause pallet

The TxPause pallet can be used to pause specific calls. Think of it as a dynamic call filter that can be controlled with extrinsics.
It currently features per-extrinsic pausing, but per-pallet pausing would also be possible.
This is what many para-chains currently have deployed on their chains.

Can also be used as call-filter by the runtime together with the SafeMode:

impl frame_system::Config for Runtime {
	…
	type BaseCallFilter = InsideBoth<DefaultFilter, InsideBoth<TxPause, SafeMode>>;
	…
}

Extrinsics

pause_call(pallet, function):
Permissioned call to pause a specific extrinsic.

unpause_call(pallet, function):
Permissioned call to unpause a specific extrinsic.

TODOS:

  • [ ] Design finished
  • [x] Q: How to make them work together?
  • [x] Intern feedback
  • [x] Parachain feedback
  • [x] Tests + Benchmarks https://github.com/paritytech/substrate/pull/12148 and https://github.com/paritytech/substrate/pull/12259
  • [ ] Maybe Traits + Impls

ggwpez avatar Aug 23 '22 11:08 ggwpez

We are missing the origin where some user can "transfer a huge amount of dot" to trigger a tx pause or safe mode for a temporary period of time.

shawntabrizi avatar Aug 25 '22 12:08 shawntabrizi

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/parachain-technical-summit-next-steps/51/8

Polkadot-Forum avatar Aug 27 '22 15:08 Polkadot-Forum

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/parachain-technical-summit-next-steps/51/1

Polkadot-Forum avatar Aug 27 '22 15:08 Polkadot-Forum

I think they're fine as two pallets which can be used to work together. There should be a means of specifying an EnsureOrigin impl which can enter safe mode for a period of time (the EnsureOrigin should have a Success value which is the number of blocks that the origin may induce safe mode).

Any deposit made for inducing safe mode should not be returned by default. A simple solution would be for it to go into the treasury and be paid back only by an explicit treasury spend. A more sophisticated solution would have it be reserved and refunded by a particular Refund EnsureOrigin impl (which would likely be configured as a Gov2 origin and probably have an associated referendum curve/collective behind it).

Beyond that, I think design-wise it's good.

gavofyork avatar Aug 30 '22 18:08 gavofyork

I think they're fine as two pallets which can be used to work together.

Then we should have some sane-defaults that the pallets dont ban each other.
In theory they can work together very well; 1) enable safe-mode, 2) pause offending transactions, 3) disable safe-mode.

There should be a means of specifying an EnsureOrigin impl which can enter safe mode for a period of time (the EnsureOrigin should have a Success value which is the number of blocks that the origin may induce safe mode).

Okay. I assume the required stake is either also returned by that or scales with a configured formula.

Any deposit made for inducing safe mode should not be returned by default.

Currently only a RepayOrigin can either repay or slash the reserved stake. But keeping it even more simple sounds tempting.

PS: I will probably rename enable/disable to enter/leave as that sounds nicer.

ggwpez avatar Aug 30 '22 18:08 ggwpez

Doing a read here, I have some thoughts I hope it's not too much to discuss here :pray:

Doubts & Questions:

  • If done on relay chain, all paras and XCMP still operate? for example, XCMP or reserve backed transfers might be blocked. What other considerations needed?
  • In XCM, would this block all or some specific calls? (tests needed) consider fallout of no callback to known this is intentionally blocked
  • Could/should external parachain (sovereign accounts) be given special ability to execute safe-mode (without stake for example, or only to bar XCM on one chain to them. )
  • Speaking for any substrate chain: is the governance origin is always root? If not, root should be "whitelisted" such that it cannot be blocked by filter even if not explicitly configured?
  • Is there ever an internal call that could be blocked by this, like an unsigned required action... or more likely a tightly coupled pallet where this might block logic progression?
    • Offchain workers that are required in some way failing? Like oracles, etc.
    • Offchain workers triggered on hooks that cannot dispatch could lead to it being triggered every block and depending on if it expected to succeed eventually and only happen infrequently could cause problems, IE very very expensive computation and/or paid API services.
    • Other systems/bridges that assume liveness and non-failure... blocks are still produced, but in effect the network is halted (at least it errors out)
  • ~Once in place, should this be in the solo and parachain templates? It's a very opinionated & powerful tool to add to a chain, and could be abused by users: for example testnets could be blocked on a whim to cause chaos for "fun" if the stake isn't too high for faucet abusers to get enough.~ I would think not appropriate.
  • Possible exploits in on and cross chain DeFi if gains by blocking tx are larger than stake: you lock the network and exploit data to front-run.
    • Timeout functions related to block height could be exploited. Like payment channels in lightning on bitcoin: lock the network long enough (but blocks are still ticking up) to stop a justice tx being submitted.
    • related to assumptions of liveness / (eventual) completion of bridge actions.

Design changes?

  • ~Should there be different safe-modes possible?~ no, tx-pause can use pre-configured settings.
  • Should the justification for safe mode be explicitly required (and placed in storage w/ impl to query from user space)?
    • Something more like a governance proposal storage item in this pallet & flow (like a proposal on Polkassembly being fired off when this is triggered) for enacting this extremely disruptive action.
      • Could be as simple, bounded, string that is a statement of "why this safe-mode was triggered" and/or enforce a CID to a statement would be available.
  • Pause pallet has bounded vecs that are not storage items... yet?
    • The pause pallet storage of names of calls could be misconfigured as there is no such limit on naming AFAIK natively. so maybe you could make a too long name and miss this in tests that check for explicitly & exhaustively checking the filters work.
    • Adding this pallet would require you truncate names of calls, or set large enough not to fail... nothing more automatic possible?
      • Would the FRAME generated storage items be a blocker here, if you didn't want to do a migration? Rename of these -> migration in most cases.
  • Can we throw a pallet specific error instead of CallFiltered? so unambiguous to everyone (end users, future XCM callbacks, APIs) that this safe mode is the reason things are halted.

nuke-web3 avatar Sep 02 '22 18:09 nuke-web3

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/parachain-technical-summit-next-steps/51/10

Polkadot-Forum avatar Sep 07 '22 09:09 Polkadot-Forum

One concern I have about how safe-mode would play out in practice relates to centralization.

If I want to engage safe-mode (regardless of my reasoning for doing so), I will have to first consider what will happen to my deposit. Basically, this means I will need to speculate on what would happen when I go through governance (etc) to request that those funds be returned to me.

What exactly this means may vary, but I suspect it will generally mean that if I am known/respected by those who will vote, I have a stronger expectation that I will have my funds returned and vise-versa. This basically introduces a weak centralization effect.

Some blatant exaggerations of this:

  • If I am well-respected, I may get away with a poor reason for engaging safe-mode
  • If I am unknown/not well-respected, I may be punished harshly for a sound reason for engaging safe-mode
  • If I have a very good reason to engage safe-mode I will hesitate for fear over losing my funds
  • If I have an extremely high confidence in getting my funds back, I may be very loose with engaging safe-mode

To some degree, this argument is really just saying that the centralization (or lack thereof) of governance would be inherited in weak ways by this pallet.

notlesh avatar Sep 07 '22 17:09 notlesh

To some degree, this argument is really just saying that the centralization (or lack thereof) of governance would be inherited in weak ways by this pallet.

I would argue that it is in the interest of the chain to treat users who reported malfunctions via a safe-mode properly, since it actually adds value to the chain to have such users monitor the chain.
But yea, in general I also think this would only amplify any underlying governance problems, not produce them.

ggwpez avatar Sep 20 '22 20:09 ggwpez

It will be great if it is possible to set a preconfigured group of calls and able to pause/unpause all of them at once. Basically similar to ProxyType to define some category and use it for filtering

Okay. So we could add a config option to the TxPause pallet to map u32 -> BoundedVec<Call> which can then be used to quickly address a filter by an index. Or even Vec<u8> as index type for easier UI.
@NukeManDan what do you think?

ggwpez avatar Sep 22 '22 12:09 ggwpez

It doesn't need to be in this PR. No need to implement everything in a single pass.

Also I am thinking something like this

https://github.com/paritytech/substrate/blob/cc4d5cc8654d280f03a13421669ba03632e14aa7/bin/node/runtime/src/lib.rs#L279-L313

So we can define

enum CallGroup {
  Transfer,
  Staking,
  Xcm,
  All
}

and can disable/enable each call group

xlc avatar Sep 22 '22 12:09 xlc

Yes, I think adding a something ProxyType would make a lot of sense for a quick set of preestablished options to use, especially considering trying to accommodate swift action to enable the transaction pause filters is a great idea.

Let me take a stab at implementing this, in this PR, as the logic shouldn't be hard to get using proxy as reference.

Do we want exclusively these enum variants to be the options available? I think we likely want both: to be able to select from that enum or set a singular call to filter. So additive/subtractive calls to this pallet with either of the options, and update a storage vector of actively paused calls.

nuke-web3 avatar Sep 22 '22 13:09 nuke-web3

Giving my 2 cents based on our current "Maintenance pallet". Would it be possible to separate the pallet in 2:

  1. The origins used to interact with the pallet (enable/disable),
  2. The logic of the pallet (handling the filters...)

This would allow us to use the logic of this pallet while keeping our origin as we do in our Maintenance pallet (which doesn't require deposit and is controlled by a collective)

crystalin avatar Sep 22 '22 19:09 crystalin

@crystalin maybe I am missing the request here, what of the two pallets do you want split here? And IIUC the many configurable origins here allow for you to use your collective origin(s) as you see fit for specific calls. Most recently logic was added to safe mode to allow for differing behaviors based on the origin

See https://github.com/paritytech/substrate/commit/379b82f0f4397fee148d9c924eeadccdda21bb84#diff-c8a277fd45ad463eff816d0cd84b815aab8ddca058f250342624ced397d4c46dR93

nuke-web3 avatar Sep 22 '22 21:09 nuke-web3

This would allow us to use the logic of this pallet while keeping our origin as we do in our Maintenance pallet (which doesn't require deposit and is controlled by a collective)

Yes this is still possible for the safe-mode pallet. You could create multiple origin and assign a "strength" to each, which then translates to how long each origin may activate the safe-mode.
This can be used as: let duration = T::ForceActivateOrigin::ensure_origin(sender)?; through the magic of EnsureOrigin<Success=T::BlockNumber>.
The staked deposit is only for permissionless activation (which can be turned off in config).

ggwpez avatar Sep 22 '22 23:09 ggwpez

~Considering changing the inputs on some incoming pallets but wanted to gauge if you all think it's an improvement or not in paritytech/substrate#12371 and would love some input there on the change to the calls~ EDIT: reviewed poorly, and not included.

		pub fn unpause_call(
-			pallet: PalletNameOf<T>,
-			call: CallNameOf<T>,
+			call: Box<<T as Config>::RuntimeCall>,
                )
  • Here for why this input is hard to use as an runtime engineer
  • The OP on the PR for the reasoning.

Open for ideas and feedback 🙏 ❤️

nuke-web3 avatar Sep 29 '22 23:09 nuke-web3

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/safe-mode-and-transaction-pause-pallets/636/1

Polkadot-Forum avatar Oct 06 '22 01:10 Polkadot-Forum

Hi! Any updates? I'm looking forward to using this.

aurexav avatar Feb 03 '23 02:02 aurexav

Hi! Any updates? I'm looking forward to using this.

Not really. I will try to get a minimal version in.

ggwpez avatar Feb 06 '23 18:02 ggwpez

A few minor comments and still need to add top level doc comments for the pallets, otherwise LGTM!

Yes I will do this in parallel to the audit. Just want to get that going soon.

ggwpez avatar Jun 05 '23 16:06 ggwpez

Would it be beneficial to automatically trigger safe_mode in some scenario / based on some condition, say, to avoid a catastrophic but detectable situation that could be effectively mitigated by running safe mode until some other condition is met (or until safe mode is manually exited)? If so, what would that look like? I'm imagining conditions you can set where safe mode will automatically trigger, and for each of those, an additional condition where it will exit, and the conditions themselves could be managed via on-chain governance.

Yea I think several para-chains have these kind of automatic switches in place. If some huge token transfer is requested, they may first enqueue it and then possibly lock down the whole pallet.
This safe-mode pallet here implements the SafeMode trait, so it could be used for that. But it still needs some logic input / monitoring pallet that triggers it on certain conditions.
I think from our perspective it is good to think about this. As long as we keep building the tools; the parachains can tell us how they use it and what they need.
If this is viable on a relay, i dont know. Probably not, since eventually we want all the logic to reside in parachains anyway.

ggwpez avatar Jun 06 '23 09:06 ggwpez

Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jul 09 '23 06:07 stale[bot]

Waiting for audit...

ggwpez avatar Jul 09 '23 10:07 ggwpez

Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Aug 08 '23 17:08 stale[bot]

Auditerino it is

ggwpez avatar Aug 08 '23 18:08 ggwpez

Happy birthday :partying_face:

Going to merge as notlive since the audit will not start within another two months or so.

ggwpez avatar Aug 24 '23 20:08 ggwpez

bot bench substrate-pallet --pallet=pallet_safe_mode

ggwpez avatar Aug 24 '23 21:08 ggwpez

@ggwpez https://gitlab.parity.io/parity/mirrors/substrate/-/jobs/3437154 was started for your command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=dev --target_dir=substrate --pallet=pallet_safe_mode. Check out https://gitlab.parity.io/parity/mirrors/substrate/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment bot cancel 6-4d486b03-ff42-4f89-a7df-2a03667e67b0 to cancel this command or bot cancel to cancel all commands in this pull request.

command-bot[bot] avatar Aug 24 '23 21:08 command-bot[bot]

bot bench substrate-pallet --pallet=pallet_tx_pause

ggwpez avatar Aug 24 '23 21:08 ggwpez

@ggwpez https://gitlab.parity.io/parity/mirrors/substrate/-/jobs/3437155 was started for your command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=dev --target_dir=substrate --pallet=pallet_tx_pause. Check out https://gitlab.parity.io/parity/mirrors/substrate/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment bot cancel 7-7fb643c2-f827-4323-a94c-78d6759dd974 to cancel this command or bot cancel to cancel all commands in this pull request.

command-bot[bot] avatar Aug 24 '23 21:08 command-bot[bot]