PVF Validation checks for slashing
- [ ] Is execution timeout actually taken into account in all the right places?
- [x] Do we have metrics on validation execution time?
- [ ] Double check that this metric is measuring the right thing (the time that, if exceeded, will cause validation to fail).
- [ ] Monitor that metric on our validators over a long period of time (weeks) and see how much it fluctuates on a single validator and across our validators. It is important not to average this out: we are interested in maximums here.
- [ ] Check that we have precise logging on the actual cause of a validation error.
- [ ] Check that the time a validation took is logged.
- [ ] Examine those logs (might reveal something that gets lost in metrics due to averaging)
- [ ] Get those logs from validators that are being slashed.
- [ ] Add an alert if validation time gets anywhere close to the timeout in approval checking.
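
To illustrate why the checklist asks for maximums rather than averages, here is a minimal Rust sketch. Everything in it is hypothetical: `Summary`, `summarize`, and the `threshold` parameter are illustration names, not Polkadot types or metrics, and the timeout is whatever the caller passes in. It only shows how a single slow validation can sit close to the timeout while the mean still looks healthy.

```rust
use std::time::Duration;

/// Hypothetical summary of observed validation times (not a Polkadot type).
#[derive(Debug, PartialEq)]
struct Summary {
    /// Arithmetic mean of all samples; can hide rare slow validations.
    mean: Duration,
    /// Slowest observed validation; this is what decides whether a timeout is hit.
    max: Duration,
    /// True when `max` is at or above `threshold * timeout`.
    near_timeout: bool,
}

/// Summarize `samples` against `timeout`.
/// `threshold` is the fraction of the timeout at which we want to be alerted
/// (an assumed knob, e.g. 0.5 for "half of the timeout").
/// Returns `None` when there are no samples.
fn summarize(samples: &[Duration], timeout: Duration, threshold: f64) -> Option<Summary> {
    let max = *samples.iter().max()?;
    let total: Duration = samples.iter().sum();
    let mean = total / samples.len() as u32;
    let near_timeout = max.as_secs_f64() >= timeout.as_secs_f64() * threshold;
    Some(Summary { mean, max, near_timeout })
}
```

Calling `summarize` with samples of 1 s, 2 s, and 6 s, a placeholder timeout of 12 s, and a threshold of 0.5 yields a mean of 3 s but a maximum of 6 s, so `near_timeout` is set even though the average is far from the limit.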
What logs should validators have enabled, is WARN/INFO enough to get useful info?
Parity's Kusama validators rarely go above 2s. Within the last two weeks this happened only once, and even then validation stayed below 3s:
https://grafana.parity-mgmt.parity.io/goto/F3xTfeZ4k?orgId=1

Thanks @ordian!
> What logs should validators have enabled, is WARN/INFO enough to get useful info?

`parachain::candidate-validation=debug` would be useful.
It might be a bit early to have definite results, but it looks like doubling the approval voting timeout did not entirely fix the problem.

Which is expected, as we know of at least two other reasons for disputes by now:
#6041 #6057