nearcore

[ProjectTracking] Chunk validator rewards

Open Longarithm opened this issue 1 year ago • 4 comments

Goal

Make chunk validator kickouts and rewards depend fairly on the number of endorsements each validator actually produced, not on whether the chunk was included.

References

Design Doc

Roadmap

With approximate timelines

  • [ ] Implement EpochConfigStore
    • [x] Extract EpochConfig to EpochConfigStore from GenesisConfig (#11896 and so on). We are going to change EpochConfig reward/kickout thresholds, and adding patches to genesis for that has proven to be very annoying. (1w)
    • [ ] Remove the hard-coded setting of the EpochConfig and use EpochConfigStore only.
  • [ ] Implement and test Easy mode for new protocol version with low thresholds (2w, also depends on experience)
    • [x] Add chunk endorsements bitmap to BlockHeaderInfo and BlockInfo. #11940.
    • [x] Remove BlockHeaderInfo as simplification. #11971
    • [x] Propagate chunk endorsements bitmap to add_validator_proposals and calculate rewards and kickouts using it. #11940
    • [x] For block downloaders, validate chunk endorsement bitmap in BlockHeader with BlockBodyV3::chunk_endorsements in Chain::validate_block_impl.
    • [x] Add validation to check that the bits at positions beyond the number of endorsements are all set to 0.
    • [x] Update the code that calculates exemptions (compute_exempted_kickout). #11982
    • [x] Adjust the reward rate to re-map chunk validation online ratio from 0 to 1. @tayfunelmas
    • [x] Fix the nondeterminism issue discussed in this thread.
  • [ ] Validate the fix in the environment with multiple clients. Consider:
    • [x] Add command to replay block headers from mainnet, run rewards/kickouts calculation logic, and print the differences to validate the fix.
    • [x] Add Nayduck test for kicking out offline validators.
    • [ ] TestLoop with simulated delays in chunk application. Much faster and more flexible, but never used for simulations before. Testnet may not help because it doesn’t have a heavy load. 2w. Better to start preparing in advance.
    • [x] Start using EpochConfigStore in TestLoop to bypass the hard-coded epoch config configuration. @Longarithm
  • [ ] If the fix doesn’t work, e.g. results in too many kickouts of honest CVs:
    • [ ] Try different ideas, e.g. make BP wait until 90% of chunk endorsements are received. But then one needs to determine the exact percentage and how it impacts block production time and TPS. BP could also wait specifically for small CVs, but the exact logic is unclear.
    • [ ] Iterate on these ideas as well. 2w
  • [ ] If simple fixes don’t work, implement Hard mode and iterate on it as well (3w)
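A minimal sketch of the bitmap check from the roadmap item above (bits past the number of endorsements must be 0). The function name and the bit layout (bit i stored in byte i / 8, least-significant bit first) are assumptions for illustration and may not match the actual nearcore encoding.

```rust
/// Returns true if every bit at index >= `num_endorsements` is zero.
///
/// Assumed layout (hypothetical, for this sketch only): bit `i` lives in
/// byte `i / 8`, at position `i % 8` counting from the least-significant bit.
fn trailing_bits_are_zero(bitmap: &[u8], num_endorsements: usize) -> bool {
    let total_bits = bitmap.len() * 8;
    // If `num_endorsements >= total_bits` the range is empty and the check passes.
    (num_endorsements..total_bits).all(|i| bitmap[i / 8] & (1u8 << (i % 8)) == 0)
}
```

A header whose bitmap fails this check would be rejected, so unused padding bits cannot carry arbitrary data that differs between nodes.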

Longarithm avatar Aug 07 '24 11:08 Longarithm

Notes after meeting today:

  • Tayfun - to look into e2e impl, primarily find what is needed to write tool to analyse new algo on mainnet endorsement data
  • Alex - to look into needed TestLoop improvements to test new algo + simulate kickouts with synthetic delays

Longarithm avatar Aug 07 '24 12:08 Longarithm

I did some simulation of mainnet epochs (last 5 epochs) using the easy mode algorithm. Did not change the kickout threshold (80%) or rewards rate. Results are in this document, where diffs are between the original run of the network and the simulated run.

tayfunelmas avatar Aug 22 '24 14:08 tayfunelmas

Aug 30th report

  • Removed BlockHeaderInfo to simplify the future changes (#11971).
  • Fixed the issue that the logic for calculating validators exempted from kickout does not consider the chunk endorsement rate (so it almost never kicks them out) (#11982). Note that this will be packaged into the same protocol feature as other changes in this category.
  • Started implementing the part where we add the chunk endorsement bitmap to the BlockHeader (previously added to BlockInfo). Mostly test fixes left (#12024).
  • Identified a plan to recover the diff in validator rewards if we start using the chunk endorsement ratio (instead of the chunk production ratio). Will implement this next and identify the optimal values for min/max online ratios and kickout ratios.
    • Detailed proposal and discussion can be found here
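For readers unfamiliar with min/max online ratios, here is a rough sketch of a linear remap of an online ratio onto the range 0 to 1. The function name, the basis-point representation, and the 90%/99% values in the note below are illustrative assumptions, not values taken from this thread or from the implementation.

```rust
/// Linearly remaps an online ratio (in basis points, 10_000 = 100%) onto
/// 0..=10_000: anything at or below `min_bp` maps to 0, anything at or above
/// `max_bp` maps to 10_000, and values in between scale linearly.
/// Hypothetical helper for illustration only.
fn remap_online_ratio(ratio_bp: u32, min_bp: u32, max_bp: u32) -> u32 {
    if ratio_bp <= min_bp {
        0
    } else if ratio_bp >= max_bp {
        10_000
    } else {
        // Here min_bp < ratio_bp < max_bp, so the divisor is non-zero.
        ((ratio_bp - min_bp) as u64 * 10_000 / (max_bp - min_bp) as u64) as u32
    }
}
```

With example thresholds of 90% and 99%, a validator that is online 94.5% of the time would get half of the maximum: `remap_online_ratio(9_450, 9_000, 9_900)` is 5_000.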

walnut-the-cat avatar Sep 03 '24 13:09 walnut-the-cat

Sept 2-6

  • Discussions on how to make the chunk endorsement ratio contribute to rewards, ending up with a simple algorithm that uses a cutoff threshold.
  • Implemented changes for various parts of chunk validator rewards, including adding the bitmap to BlockHeaderV5 (#12024), unblocking moving the feature to nightly (#12043), introducing a new endorsement ratio cutoff threshold (#12047), and updating tools to run experiments (#12048).
  • Experiments on mainnet historical data with the overall algorithm using received chunk endorsements ratio for deciding on kickout and rewards.
  • Somewhat complex implementation of min/max ratios for endorsement, experimented but gave up after discussions (#12034).
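One possible reading of the cutoff-threshold idea, as a sketch: a validator whose endorsement ratio reaches the cutoff is treated as fully online, and one below it as fully offline. The type and function names are hypothetical, and the exact semantics in #12047 may differ.

```rust
/// Hypothetical per-validator endorsement counters for one epoch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct EndorsementStats {
    produced: u64,
    expected: u64,
}

/// Collapses the endorsement ratio to all-or-nothing around `cutoff_percent`.
/// This is an assumed interpretation of "cutoff threshold", not the confirmed
/// behaviour of the real implementation.
fn apply_cutoff(stats: EndorsementStats, cutoff_percent: u8) -> EndorsementStats {
    // produced / expected >= cutoff / 100, compared without division or floats.
    let meets_cutoff =
        (stats.produced as u128) * 100 >= (stats.expected as u128) * (cutoff_percent as u128);
    EndorsementStats {
        produced: if meets_cutoff { stats.expected } else { 0 },
        expected: stats.expected,
    }
}
```

The appeal of this shape is that it needs a single parameter instead of separate min/max ratios for endorsements.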

Sept 9-13

  • Moved the validator rewards feature to Nightly by fixing tests (#12065). Fixed Nayduck tests broken by moving the feature to Nightly (#12077, #12087).
  • Identified an issue with sorting chunk validators with same uptime ratio and prepared a change to alleviate the problem (#12092).
  • Prepared PR to stabilize the feature for production (#12089).
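To illustrate the sorting issue: when two validators have the same uptime ratio, the order has to be decided by something every node computes identically. A minimal sketch, assuming ties are broken by account ID (the actual change in #12092 may use a different rule; all names here are hypothetical):

```rust
use std::cmp::Ordering;

/// Hypothetical uptime record; `expected` is assumed to be non-zero.
#[derive(Clone, Debug, PartialEq, Eq)]
struct ValidatorUptime {
    account_id: String,
    produced: u64,
    expected: u64,
}

/// Orders by uptime ratio (ascending), then by account ID so that validators
/// with equal ratios always end up in the same order.
fn cmp_uptime(a: &ValidatorUptime, b: &ValidatorUptime) -> Ordering {
    // Compare a.produced / a.expected with b.produced / b.expected by
    // cross-multiplication, avoiding floating-point rounding.
    let lhs = (a.produced as u128) * (b.expected as u128);
    let rhs = (b.produced as u128) * (a.expected as u128);
    lhs.cmp(&rhs).then_with(|| a.account_id.cmp(&b.account_id))
}

fn sort_by_uptime(validators: &mut [ValidatorUptime]) {
    validators.sort_by(cmp_uptime);
}
```

With an explicit tie-breaker the result no longer depends on the input order, so 1/2 and 2/4 sort the same way on every node.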

walnut-the-cat avatar Sep 16 '24 14:09 walnut-the-cat

Released in 2.3.0 to testnet and mainnet.

tayfunelmas avatar Nov 12 '24 14:11 tayfunelmas