nearcore

Generalized Workload Change

Open null-ref-ex opened this issue 2 years ago • 4 comments

Describe the bug
Since the release of 1.28.0, more validators in the set seem to fall below 100% uptime on a daily basis. On our node in particular, we have seen heavy disk read activity, which seems to peak during the periods when we miss blocks.

To Reproduce
Run a validator on mainnet.

Expected behavior
More validators at 100% uptime, as was the case with 1.27.0.


Version:

  • 1.28.0
  • mainnet

Additional context
This is anecdotal, but see https://near-overview.genesislab.net/

Six of the top 10 validators are below 100% uptime. Before 1.28.0, this did not happen on a regular basis.

null-ref-ex, Aug 02 '22 19:08

Do you see only missed blocks, with all chunks still produced at 100%? We have not seen an increase in reads compared to the 30 days before that.

Mic92, Aug 04 '22 09:08

No, the misses appear to include both blocks and chunks at the same time.

The increase in read activity seems to be continuing. Validator uptime across the set generally looks more stable than it did, but still less stable than before the current release. We still regularly see validators in the top 10 falling short of 100%, which was rare before.

null-ref-ex, Aug 04 '22 16:08

@matklad @mm-near @mzhangmzz could this be related to the client actor refactoring (applying chunks in separate threads) introduced in 1.28.0?

bowenwang1996, Aug 04 '22 22:08

No, that code isn't part of 1.28.0.

The code in question is here:

https://github.com/near/nearcore/blob/f7289f939584652d8c9c246898a1fa89bb61b92d/chain/chain/src/chain.rs#L1966-L1976

It is not in 1.28:

https://github.com/near/nearcore/blob/1.28.0/chain/chain/src/chain.rs

matklad, Aug 05 '22 11:08