substrate icon indicating copy to clipboard operation
substrate copied to clipboard

make approval chilling upon slash explicit

Open kianenigma opened this issue 1 year ago • 1 comments

related to https://github.com/paritytech/substrate/issues/6835 and https://github.com/paritytech/polkadot/pull/5974 and https://github.com/paritytech/polkadot/pull/5978

Context

The objective is to achieve this statement of the slashing spec:

We first post a slashing transaction to the chain, which drops the offending validator from the active validator list by invalidating their controller key, or maybe just their session keys. In consequence, all nodes ignore for the remainder of the era. It invalidates any future blocks that do not ignore too. We also remove all nomination approval votes by any nominator for , even those who currently allocate zero stake.

Historically, @rphmeier implemented this by reading the slashing spans upon the staking election. Over the years, the staking system has grown and the number of nominators has grown a lot, ergo the election process has become more weight/memory intense.

I don't see any reason for to think that the original approach had a particular advantage, but I might be missing something.

Moreover, the SlashingSpans of a staker are only removed ONLY when the staker is fully removed. This means the size of this storage item is potentially very large (see below for ) and it is being read every era upon staking election.

This changes the logic such that this chilling happens upon slash, and it is explicit, meaning that we actually remove the nomination. We still record SlashingSpans::last_nonzoro_slash and Nominations::submitted_in, which are no longer used onchain, but a UI can still use it to speculate if/when you need to re-send your nominations to a slashed validator (cc @jacogr @rossbulat).

This will basically make the slashing and npos election part decoupled, and reduces our worries about memory issues when creating the snapshot.

Regarding https://github.com/paritytech/substrate/issues/6835, now it is pretty easy to move this chilling from compute_slash (when slash is reported) to do_slash (when slash is applied), so that validators who get slashed and are alter forgiven via governance don't lose nominations.

Review notes

The diff is bit noisy because I fixed some comments according to https://github.com/paritytech/substrate/discussions/12404 and a few other random things. The main diff is only in slashing.rs and pallet/impls.rs

I also added an event for when the slash is reported. Might help UIs. cc @wpank

Current slashing spans

// polkadot: 
Len = 455 / encoded_size = 22537
// kusama
Len = 2477 / encoded_size = 124903

As you see, we also have a problem of storing too much slashing spans, which I will formulate in a separate issue.

#12409 Would come very handy here.

kianenigma avatar Oct 04 '22 14:10 kianenigma

We should likely remove the slashing spans after the unbonding period, perhaps by making stakers remove them whenever they touch their account, like unbonding, etc. We could make their controller key similarly remove the old spans.

We (Rob & I) designed the slashing spans system to make slashing more fair. We never implemented the whole scheme We threw this out the window for 100% soundness slashes, if not before then once I noticed that gambler's ruin gives us an actual security proof for the whole system. We also now know how to make it much harder to mistakenly triggering these non-soundness slashes. We could therefore discuss more radical simplifications if anyone dares. ;)

burdges avatar Oct 04 '22 14:10 burdges

@ordian @rphmeier have either of you had the time to look at this?

kianenigma avatar Oct 19 '22 20:10 kianenigma

I'll claim naivety/idealism on that one. We really did not know all the practical concerns when designing this.

We ideally do want to chill nominators like this eventually, but only after considerable damage. As discussed, we'll presumably do slashes immediately but accrue "slashed balance" in nominators' accounts, and then disable them when this grows above 5% or something.

We should choose if the slashing spans computation stays, which provides similar defenses for retrospective slashing, or long range attacks. I'm think retrospective slashing is either a strange bug or governance or a real long range attack. We trust governance to be fair in the first two, and the third is pretty bad, so I'm tempted to say fuck it and forget about the spans..

burdges avatar Nov 01 '22 16:11 burdges

The decision with @burdges and @ordian and @eskimor was to remove this chilling for the time being.

kianenigma avatar Nov 06 '22 16:11 kianenigma

IIUC, not chilling nominators at all on slashes is be a big user-facing change that needs to be communicated.

Totally agree, I will post about this in the polkadot forum once it is finalized.

kianenigma avatar Nov 08 '22 11:11 kianenigma

Added a few comments in the tests. If i'm understanding the flow correctly there are still a few assertions that expect the nominators to be chilled right after the offence.

I'm slightly concerned that validators with applied slashes (not reverted) will be able to start validating with the same supports when the chilling period ends. Perhaps we could drop the supports for slashed validators in fn apply_slash? The list of nominators that supported the validator are passed in the struct UnappliedSlash, so we could mutate the supports of the slashed nominators there.

I see a few advantages of removing the supports as the slashed is applied:

  • Bad actors will have their support dropped;
  • Increases even further the punishment for bad actors;
  • Avoids the issue with validators being slashed in a loop, which is a valid concern IMO.

And the costs/complexity of implementing it in fn apply_slash seem low.

This was basically the state of the PR as of a few commits ago. Given the timeliness I would suggest we move forward with the current state, and re-introduce any kind of chilling later, in a more planned manner.

To recall, the reason that we thought chilling upon apply_slash is not all that useful is that between a slash happens and when it is applied, the "bad validator" have plenty of time to continue doing bad things. Moreover, if they are actually "bad", all of this is pointless since they will immediately re-nominate. So all in all, chilling nominators for the sake of chain security is a lost cause. They will re-nominate. But, recall that they will get slashed, so their "effectiveness" declines.

Chilling nominators is mostly to protect nominators, which we are dropping now entirely. Nominators are an active role and should manually react to slashes. The new event I added in this PR should help with that.

kianenigma avatar Dec 10 '22 14:12 kianenigma

bot merge

kianenigma avatar Dec 12 '22 11:12 kianenigma

Error: Statuses failed for 99396f3677ec40686e1c5445296c06a47880cd1e

bot merge

kianenigma avatar Dec 12 '22 12:12 kianenigma

Waiting for commit status.

Merge cancelled due to error. Error: Statuses failed for df93a9480dca85a2453bb5e34c0964981e747943

/cmd queue -c bench-bot $ pallet dev pallet_staking

kianenigma avatar Dec 12 '22 13:12 kianenigma

/cmd queue -v RUST_LOG=debug -c bench-bot $ pallet pallet_staking

kianenigma avatar Dec 12 '22 15:12 kianenigma

@kianenigma https://gitlab.parity.io/parity/mirrors/substrate/-/jobs/2147733 was started for your command "$PIPELINE_SCRIPTS_DIR/bench-bot.sh" pallet pallet_staking. Check out https://gitlab.parity.io/parity/mirrors/substrate/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment /cmd cancel 54-32e5b5f0-5f94-4173-829f-21ef4ac1306b to cancel this command or /cmd cancel to cancel all commands in this pull request.

command-bot[bot] avatar Dec 12 '22 15:12 command-bot[bot]

@kianenigma Command "$PIPELINE_SCRIPTS_DIR/bench-bot.sh" pallet pallet_staking has finished. Result: https://gitlab.parity.io/parity/mirrors/substrate/-/jobs/2147733 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/substrate/-/jobs/2147733/artifacts/download.

command-bot[bot] avatar Dec 12 '22 15:12 command-bot[bot]

/cmd queue -c bench-bot $ pallet pallet_staking

kianenigma avatar Dec 12 '22 15:12 kianenigma

@kianenigma https://gitlab.parity.io/parity/mirrors/substrate/-/jobs/2147802 was started for your command "$PIPELINE_SCRIPTS_DIR/bench-bot.sh" pallet pallet_staking. Check out https://gitlab.parity.io/parity/mirrors/substrate/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment /cmd cancel 55-ccf2e6bb-8481-4c11-a6d6-f260c6526ca0 to cancel this command or /cmd cancel to cancel all commands in this pull request.

command-bot[bot] avatar Dec 12 '22 15:12 command-bot[bot]

@kianenigma Command "$PIPELINE_SCRIPTS_DIR/bench-bot.sh" pallet pallet_staking has finished. Result: https://gitlab.parity.io/parity/mirrors/substrate/-/jobs/2147802 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/substrate/-/jobs/2147802/artifacts/download.

command-bot[bot] avatar Dec 12 '22 15:12 command-bot[bot]

/cmd queue -c bench-bot $ pallet dev pallet_staking

kianenigma avatar Dec 12 '22 15:12 kianenigma

@kianenigma https://gitlab.parity.io/parity/mirrors/substrate/-/jobs/2147939 was started for your command "$PIPELINE_SCRIPTS_DIR/bench-bot.sh" pallet dev pallet_staking. Check out https://gitlab.parity.io/parity/mirrors/substrate/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment /cmd cancel 56-353cc009-9d45-4c95-9e42-8e86090d03d2 to cancel this command or /cmd cancel to cancel all commands in this pull request.

command-bot[bot] avatar Dec 12 '22 15:12 command-bot[bot]

@kianenigma Command "$PIPELINE_SCRIPTS_DIR/bench-bot.sh" pallet dev pallet_staking has finished. Result: https://gitlab.parity.io/parity/mirrors/substrate/-/jobs/2147939 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/substrate/-/jobs/2147939/artifacts/download.

command-bot[bot] avatar Dec 12 '22 16:12 command-bot[bot]

bot merge

kianenigma avatar Dec 12 '22 16:12 kianenigma

Error: "Check reviews" status is not passing for https://github.com/paritytech/polkadot/pull/6424

bot merge

kianenigma avatar Dec 12 '22 17:12 kianenigma

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/kusama-era-4543-slashing/1410/1

Polkadot-Forum avatar Dec 12 '22 17:12 Polkadot-Forum

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/kusama-era-4543-slashing/1410/2

Polkadot-Forum avatar Dec 12 '22 20:12 Polkadot-Forum

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/polkadot-release-analysis-v0-9-35/1509/1

Polkadot-Forum avatar Dec 20 '22 13:12 Polkadot-Forum