polkadot icon indicating copy to clipboard operation
polkadot copied to clipboard

runtime/disputes: slashing

Open ordian opened this issue 3 years ago • 6 comments

This PR implements runtime logic for dispute slashing.

  • Validator who voted that a valid (decided by a supermajority) candidate (parachain block) is invalid will be slashed 1% of their stake (including nominators). This can happen to very slow validators that timeout during a PVF execution.
  • Validator who voted that a invalid (decided by a supermajority) candidate (parachain block) is valid will be slashed 100% of their stake (including nominators) and kicked out of the validator set. This can happen to malicious validators or validators that are >3x faster than the supermajority (we have 2s timeout for backing and 6s for approval-voting).

Implementation

The implementation uses the offences pallet and looks like a hybrid of im-online and grandpa slashing impls. Meaning, we submit offences for the concluded disputes about the current session candidate directly from the runtime. If, however, the dispute is about past session, we record pending slashes on chain, without FullIdentification of the offenders. Later on, a block producer can submit an unsigned transaction with KeyOwnershipProof of an offender and submit it to the runtime to produce an offence.

The reason for this separation is that even though it's currently technically possible to get FullIdentification of the past session validators, we don't want to rely on this information to be available on chain for the past sessions because it's heavy.

Open questions:

  • [x] Initial numbers for slashing. Seems like 100% and 1% is OK.
  • [x] Do we need exponential slashing? Punt on for now. Might not be needed in the future.
  • [x] Slashing for inconclusive disputes. Punt on for now.
  • [x] How do we want disabling to work exactly. Using type Disabled from session is probably fine.

TODOs:

  • [x] resolve open questions
  • [x] fix remaining TODOs in code
  • [x] simplify the traits/generics
  • [x] zombienet test (only a check for offences)
  • [x] benchmarks/weights

Follow-up work:

  • staging APIs for disabled validators and pending slashes
  • client changes to submit unsigned transactions

Closes #3161.

ordian avatar May 16 '22 12:05 ordian

Do we need exponential slashing?

From an implementation complexity side, it would probably be easier without. 1% is already a pretty high deterrent and client bugs are not unlikely at this stage. From a theoretical perspective we'd probably want exponential slashing kicking in after a certain threshold % of validators and maxing out at 100% at around 1/3-1 of validators, because that can actually pose a risk to finality. In the near term I don't think it matters much.

How do we want disabling to work exactly?

Disabling should apply to validators who vote for invalid blocks. When a validator advocates an invalid block within an era, other validators should ignore all backing messages originating by them for the rest of the era. We should never disable more than f validators within an era and should use a ring-buffer to accomplish that. This means that validators:

  1. Get to try to attack the network only a small number of times per day (as many backed candidates as they can get in before getting slashed)
  2. Succeed only with 1/N probability, where N is a large number like 10 milliion or 1 billion.
  3. Get slashed 100% every time they fail.

rphmeier avatar May 18 '22 20:05 rphmeier

Thank you for the feedback.

From an implementation complexity side, it would probably be easier without.

It doesn't have to be exponential, we can do simply multiplier * base_slash (1%), where multiplier is capped at e.g. 5, at which point we also disable a validator. The main concern here is that we seem to slash a max and not a sum for a period of time if I'm not mistaken, so additional misbehaviors would "free" if we don't disable or escalate.

When a validator advocates an invalid block within an era, other validators should ignore all backing messages originating by them for the rest of the era.

That would require keeping track of session key rotations. For starters, I'd implement ignoring for the rest of the session. They will be kicked out of the validator set in the next session anyway.

Is it okay to use type DisableValidators from Session that would include other slashes (GRANDPA and BABE), or do we want exclusively disputes here? https://github.com/paritytech/polkadot/blob/c81fb04560c843fdc58892f663a111fcdd314b97/runtime/kusama/src/lib.rs#L270

We should never disable more than f validators within an era and should use a ring-buffer to accomplish that.

If I'm not mistaken staking/offences forces new era if enough validators are disabled: https://github.com/paritytech/substrate/pull/9448.

ordian avatar May 18 '22 20:05 ordian

For starters, I'd implement ignoring for the rest of the session. They will be kicked out of the validator set in the next session anyway.

Yes, this seems fine. It only makes the attack ~4x less expensive which still leads to gambler's ruin quite quickly.

Is it okay to use type DisableValidators from Session that would include other slashes (GRANDPA and BABE), or do we want exclusively disputes here?

Including other slashes is fine as long as the same principle applies: no more than 1/3 of validators may be disabled at any time.

rphmeier avatar May 18 '22 20:05 rphmeier

Note that in the current form it doesn't enable slashing on Kusama, only on Westend. Still waiting for reviews.

ordian avatar Jul 28 '22 16:07 ordian

I believe all feedback apart from HostConfiguration configuration and bounded cleanup is addressed. Both of these can be addressed as a follow-up. Please take another look.

ordian avatar Aug 05 '22 10:08 ordian

The rewards are implemented in #5862

ordian avatar Aug 05 '22 12:08 ordian

/cmd queue -c bench-bot $ pallet westend-dev runtime_parachains::disputes::slashing

ordian avatar Aug 31 '22 12:08 ordian

@ordian https://gitlab.parity.io/parity/mirrors/polkadot/-/jobs/1793574 was started for your command "$PIPELINE_SCRIPTS_DIR/bench-bot.sh" pallet westend-dev runtime_parachains::disputes::slashing. Check out https://gitlab.parity.io/parity/mirrors/polkadot/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment /cmd cancel 37-fdaf9945-ccd7-48e0-81b2-ba949f0ac651 to cancel this command or /cmd cancel to cancel all commands in this pull request.

command-bot[bot] avatar Aug 31 '22 12:08 command-bot[bot]

@ordian Command "$PIPELINE_SCRIPTS_DIR/bench-bot.sh" pallet westend-dev runtime_parachains::disputes::slashing has finished. Result: https://gitlab.parity.io/parity/mirrors/polkadot/-/jobs/1793574 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/polkadot/-/jobs/1793574/artifacts/download.

command-bot[bot] avatar Aug 31 '22 12:08 command-bot[bot]

/cmd queue -c bench-bot $ runtime westend-dev runtime_parachains::disputes::slashing

ordian avatar Aug 31 '22 12:08 ordian

@ordian https://gitlab.parity.io/parity/mirrors/polkadot/-/jobs/1793608 was started for your command "$PIPELINE_SCRIPTS_DIR/bench-bot.sh" runtime westend-dev runtime_parachains::disputes::slashing. Check out https://gitlab.parity.io/parity/mirrors/polkadot/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment /cmd cancel 38-be92d727-9197-4be7-8ab6-c109f696d545 to cancel this command or /cmd cancel to cancel all commands in this pull request.

command-bot[bot] avatar Aug 31 '22 12:08 command-bot[bot]

@ordian Command "$PIPELINE_SCRIPTS_DIR/bench-bot.sh" runtime westend-dev runtime_parachains::disputes::slashing has finished. Result: https://gitlab.parity.io/parity/mirrors/polkadot/-/jobs/1793608 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/polkadot/-/jobs/1793608/artifacts/download.

command-bot[bot] avatar Aug 31 '22 12:08 command-bot[bot]

/cmd queue -c bench-bot $ runtime westend-dev runtime_parachains::disputes::slashing

ordian avatar Aug 31 '22 12:08 ordian

@ordian https://gitlab.parity.io/parity/mirrors/polkadot/-/jobs/1793759 was started for your command "$PIPELINE_SCRIPTS_DIR/bench-bot.sh" runtime westend-dev runtime_parachains::disputes::slashing. Check out https://gitlab.parity.io/parity/mirrors/polkadot/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment /cmd cancel 39-5a721570-7a10-410e-a243-2379d20c1196 to cancel this command or /cmd cancel to cancel all commands in this pull request.

command-bot[bot] avatar Aug 31 '22 12:08 command-bot[bot]

@ordian Command "$PIPELINE_SCRIPTS_DIR/bench-bot.sh" runtime westend-dev runtime_parachains::disputes::slashing has finished. Result: https://gitlab.parity.io/parity/mirrors/polkadot/-/jobs/1793759 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/polkadot/-/jobs/1793759/artifacts/download.

command-bot[bot] avatar Aug 31 '22 13:08 command-bot[bot]

With regards to slash size: I am thinking more and more that a more soft start might make sense: For wasting resources, the most important thing is to disable validators so this cannot result in a DoS.

What about reducing the against valid slash to just one permille for now? For honest validators that slash would be high enough (I think any slash actually would accomplish that) to act and fix any issues on your end, but low enough to not get too mad. Given that you also lose nominators, it would already be pretty bad. So my plan would be:

  • longer timeouts
  • figuring out better what is actually happening on those disputes
  • have no disputes on Kusama for a while (at least not from validators that can be reached)
  • lower against valid slashes
  • launch (soft) -> get any remaining validators to act as they are losing money

Resoning: Rewards are around 15% per year, hitting a timeout a single time over the course of an entire year would reduce that by 1% - actually more if you lose nominators (is that really the case, I just read that on some channel) and get removed from the active set.

I think at this still early stage of Kusama and Polkadot those lowered slashes would make a lot of sense.

eskimor avatar Sep 01 '22 14:09 eskimor

Yes, we can run a while with just disabling, and then bring in the slashing very slowly. We can also just plan to refund the first say five slashes after the validator reports their hardware configuration, assuming it's a better hardware config than the previous guy who got slashed.

burdges avatar Sep 01 '22 14:09 burdges

Also, we researchers suggested significantly more than just this doubling of the delay in our analysis in https://github.com/paritytech/polkadot/pull/1656 like 6 x I think.

We should avoid the need for that once the deterministic metering works, but just fyi..

burdges avatar Sep 01 '22 14:09 burdges

What about reducing the against valid slash to just one permille for now?

Assuming you mean 0.1%, I've changed that. Any other changes you'd like to see in this PR? The plan makes sense to me :+1:

ordian avatar Sep 01 '22 14:09 ordian

With https://github.com/paritytech/polkadot/pull/5951 we will have that 6x. 2s on backing 12s in approvals.

eskimor avatar Sep 01 '22 14:09 eskimor

Another change: 7bf295a cc @eskimor.

ordian avatar Sep 05 '22 10:09 ordian

due to lacking locks-review this will slip to the next release

coderobe avatar Sep 07 '22 13:09 coderobe

We've this intermediate "valid but too slow" mode discussed in https://github.com/paritytech/polkadot/pull/1656

I presume the 6x in https://github.com/paritytech/polkadot/pull/5951 does not bring this distinction in?

@AlistairStewart and I were recently leaning towards still accepting "valid but too slow" blocks and not forking the relay chain to kill them. Imho, similar reasoning suggests any "valid but too slow" logic could be omitted for a while, in hopes Sergey's deterministic metering pans out, but if our current problems are bad enough then maybe we need something of this "valid but too slow" logic.

I just wanted to bring up this subtlety..

burdges avatar Sep 08 '22 13:09 burdges

We've this intermediate "valid but too slow" mode discussed in #1656

I presume the 6x in #5951 does not bring this distinction in?

@AlistairStewart and I were recently leaning towards still accepting "valid but too slow" blocks and not forking the relay chain to kill them. Imho, similar reasoning suggests any "valid but too slow" logic could be omitted for a while, in hopes Sergey's deterministic metering pans out, but if our current problems are bad enough then maybe we need something of this "valid but too slow" logic.

I just wanted to bring up this subtlety..

If by "valid but too slow" you mean candidates that takes to validate somewhere in between 2s and 12s, we currently accept them. Backers who fail to validate a candidate won't initiate a dispute.

ordian avatar Sep 08 '22 13:09 ordian

If by "valid but too slow" you mean candidates that takes to validate somewhere in between 2s and 12s, we currently accept them. Backers who fail to validate a candidate won't initiate a dispute.

Yes exactly. I'd think backers can simply give up after 2s without ever initiating disputes.

I asked more about approval checkers, who presumably run the full 12s but what do they do if it takes more than 2s?

burdges avatar Sep 08 '22 14:09 burdges

I asked more about approval checkers, who presumably run the full 12s but what do they do if it takes more than 2s?

Yes, the approval checkers approve every candidate that validates under 12s (after #5951, so after most validators upgrade, 6s as of now).

ordian avatar Sep 08 '22 14:09 ordian

Cool thanks. We can discuss "valid by slow" messages by approval checkers if https://github.com/paritytech/polkadot/pull/5951 does not suffice for the current problems, but no point worrying just yet. :)

burdges avatar Sep 08 '22 14:09 burdges

bot merge

ordian avatar Sep 20 '22 08:09 ordian

Waiting for commit status.

Merge cancelled due to error. Error: Statuses failed for b453b1af5abb6504e0f8eb95fbbf549e8a9f6418