nearcore
Feature request: unsafe fast start of nearcore
After the recent incident with Indexer for Explorer, we started the ["Improvement uptime" discussion](https://github.com/near/near-indexer-for-explorer/discussions/161).
One important requirement for an indexer is the ability to start very quickly when needed, even if doing so is not entirely safe.
We want to be able to restart indexer-for-explorer quickly even on testnet (where the genesis file is huge), perhaps at the cost of adding an --UNSAFE-fast-start flag that would somehow disable genesis validation in nearcore (I believe validation is the main contributor to the start time).
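To make the proposal concrete, here is a minimal sketch of an opt-in flag that gates genesis validation. StartOptions, Genesis, validate_genesis and check_genesis are all hypothetical names, not nearcore's API, and the "balances must add up to total supply" rule is only a toy stand-in for real validation. The point is that validation is skipped, with a warning, only when the operator explicitly asks for it.

```rust
/// Hypothetical start-up options. `unsafe_fast_start` mirrors the proposed
/// --UNSAFE-fast-start flag; it is NOT an existing nearcore option.
#[derive(Debug, Default)]
struct StartOptions {
    unsafe_fast_start: bool,
}

/// Hypothetical, heavily simplified genesis (the real one is far richer).
#[derive(Debug)]
struct Genesis {
    total_supply: u128,
    account_balances: Vec<u128>,
}

/// Toy validation rule for illustration: balances must sum to total supply.
fn validate_genesis(genesis: &Genesis) -> Result<(), String> {
    let sum: u128 = genesis.account_balances.iter().sum();
    if sum == genesis.total_supply {
        Ok(())
    } else {
        Err(format!(
            "total supply {} != sum of balances {}",
            genesis.total_supply, sum
        ))
    }
}

/// Runs validation unless the operator explicitly opted out.
/// Returns Ok(true) if validation ran and passed, Ok(false) if it was skipped.
fn check_genesis(genesis: &Genesis, opts: &StartOptions) -> Result<bool, String> {
    if opts.unsafe_fast_start {
        eprintln!("WARN: --UNSAFE-fast-start set, skipping genesis validation");
        return Ok(false);
    }
    validate_genesis(genesis)?;
    Ok(true)
}

fn main() {
    let genesis = Genesis { total_supply: 100, account_balances: vec![60, 40] };
    let safe = check_genesis(&genesis, &StartOptions::default());
    let fast = check_genesis(&genesis, &StartOptions { unsafe_fast_start: true });
    println!("safe start validated: {:?}, fast start validated: {:?}", safe, fast);
}
```

Note that with the flag set, even an inconsistent genesis is accepted; that is exactly the "unsafe" trade-off being requested.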
/cc @frol
@janewang @bowenwang1996 I would like to ask the Node Experience team to consider implementing this feature.
Currently, the Indexer Framework uses nearcore::config::load_config_without_genesis_records, but even so it takes tens of minutes to boot on testnet:
https://github.com/near/nearcore/blob/b84b3fcdca7e3c221fa4adee35efeac6c2170637/chain/indexer/src/lib.rs#L104-L105
I feel we might want the same feature for neard. What do you think?
@frol how urgent is this request? @posvyatokum could you take a look?
This feature is not urgent at the moment, but it would help a lot in the face of incidents that may happen any day.
I ran some experiments with genesis validation commented out on an archival testnet node. Disabling validation reduces the time between launching neard and the point where it opens the DB by about 1 minute.
Without validation: 59 sec and 47 sec
Nov 29 19:23:43.235 INFO neard: Version: 1.22.0, Build: 5a6fb2bd2-modified, Latest Protocol: 48
Nov 29 19:24:42.306 WARN near_chain_configs::genesis_validate: Skipping validation of genesis
Nov 29 19:24:42.306 INFO near: Opening store database at "/home/ubuntu/.near/data"
Nov 29 19:31:51.720 INFO neard: Version: 1.22.0, Build: 5a6fb2bd2-modified, Latest Protocol: 48
Nov 29 19:32:38.803 WARN near_chain_configs::genesis_validate: Skipping validation of genesis
Nov 29 19:32:38.803 INFO near: Opening store database at "/home/ubuntu/.near/data"
Nov 29 19:34:33.722 INFO stats: Server listening at
With validation: 1m51s and 1m51s
Nov 29 19:25:04.670 INFO neard: Version: 1.22.0-rc.4, Build: 25b000ae4, Latest Protocol: 48
Nov 29 19:26:55.681 INFO near: Opening store database at "/home/ubuntu/.near/data"
Nov 29 19:28:50.131 INFO stats: Server listening at
Nov 29 19:34:39.732 INFO neard: Version: 1.22.0-rc.4, Build: 25b000ae4, Latest Protocol: 48
Nov 29 19:36:30.826 INFO near: Opening store database at "/home/ubuntu/.near/data"
Nov 29 19:38:24.992 INFO stats: Server listening at
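To make the comparison above easy to reproduce, here is a small sketch that computes the elapsed time between the "Version" line and the "Opening store database" line of each run. It assumes the HH:MM:SS.mmm time format shown in these logs and that both timestamps fall on the same day; parse_millis and elapsed_millis are illustrative helpers, not part of nearcore.

```rust
/// Parses a time of day such as "19:23:43.235" into milliseconds since
/// midnight. Returns None on malformed input.
fn parse_millis(ts: &str) -> Option<u64> {
    let (hms, frac) = ts.split_once('.')?;
    let mut parts = hms.split(':');
    let h: u64 = parts.next()?.parse().ok()?;
    let m: u64 = parts.next()?.parse().ok()?;
    let s: u64 = parts.next()?.parse().ok()?;
    // Reject extra fields and anything other than exactly three fractional digits.
    if parts.next().is_some() || frac.len() != 3 {
        return None;
    }
    let ms: u64 = frac.parse().ok()?;
    Some(((h * 60 + m) * 60 + s) * 1000 + ms)
}

/// Elapsed milliseconds from `start` to `end` (same day, end >= start).
fn elapsed_millis(start: &str, end: &str) -> Option<u64> {
    parse_millis(end)?.checked_sub(parse_millis(start)?)
}

fn main() {
    // (label, "Version" timestamp, "Opening store database" timestamp) from the logs above.
    let runs = [
        ("without validation", "19:23:43.235", "19:24:42.306"),
        ("without validation", "19:31:51.720", "19:32:38.803"),
        ("with validation", "19:25:04.670", "19:26:55.681"),
        ("with validation", "19:34:39.732", "19:36:30.826"),
    ];
    for (label, start, end) in runs {
        let ms = elapsed_millis(start, end).expect("well-formed timestamps");
        println!("{label}: {}.{:03} s", ms / 1000, ms % 1000);
    }
}
```

It reports 59.071 s and 47.083 s without validation versus 111.011 s and 111.094 s with validation, consistent with the "59 sec and 47 sec" and "1m51s" figures quoted above.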
@nikurt Hmm, that is odd! A mainnet node starts in 4 seconds:
Nov 24 08:33:20.509 INFO indexer_for_explorer: NEAR Indexer for Explorer v0.10.3 starting...
Nov 24 08:33:20.511 INFO indexer_for_explorer: construct_near_indexer_config
Nov 24 08:33:20.511 INFO indexer: Load config from /home/ubuntu/.near...
Nov 24 08:33:24.982 INFO indexer_for_explorer: Stream has started
Nov 24 08:33:24.982 INFO indexer: Starting Streamer...
Nov 24 08:33:24.985 INFO stats: Server listening at ed25519:[email protected]:24567
I also remember older experiments where I switched validation off in a rather ad-hoc way and got an "instant" boot on testnet, even in debug mode. It cannot take a minute just to open the DB :thinking:
@frol I tried with a non-archival node, and opening the DB takes 4 minutes.
ubuntu@nikurt-3:~$ ./neard run
Nov 30 13:07:10.501 INFO neard: Version: 1.23.0-rc.1, Build: crates-0.10.0-70-g93e8521c9, Latest Protocol: 49
Nov 30 13:08:58.264 INFO near: Opening store database at "/home/ubuntu/.near/data"
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 98, kind: AddrInUse, message: "Address already in use" }', chain/jsonrpc/src/lib.rs:1374:6
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Nov 30 13:12:48.497 INFO stats: Server listening at ed25519:[email protected]:24567
ubuntu@nikurt-3:~$ ./neard.no_validation run
Nov 30 13:13:10.540 INFO neard: Version: trunk, Build: crates-0.10.0-84-g5011d288c-modified, Latest Protocol: 49
Nov 30 13:13:58.887 INFO near: Opening store database at "/home/ubuntu/.near/data"
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 98, kind: AddrInUse, message: "Address already in use" }', chain/jsonrpc/src/lib.rs:1374:6
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Nov 30 13:17:50.874 INFO stats: Server listening at ed25519:[email protected]:24567
This is counter-intuitive to me. Is it a debug or a release build that you are testing there?
I'm experimenting with binaries built by make neard
@nikurt we should probably avoid loading the genesis file at all during an unsafe start, given that the testnet genesis is quite large. We may even want to make this the default behavior whenever it is not the node's first start, since genesis very rarely changes. If it does change, requiring manual intervention seems fine to me.
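A minimal sketch of the start-up policy suggested above. It assumes, hypothetically, that "not the first start" can be detected by the store directory already existing (the data path is the one seen in the logs). GenesisMode and genesis_mode are illustrative names, not nearcore code. On a first start the genesis is still needed to initialize the store, so the unsafe flag can only skip validation there; on later starts the file is not read at all.

```rust
use std::path::Path;

/// How genesis is handled at start-up under this hypothetical policy.
#[derive(Debug, PartialEq, Eq)]
enum GenesisMode {
    /// First start, safe mode: read, parse and validate the genesis file.
    LoadAndValidate,
    /// First start with --UNSAFE-fast-start: read and parse, skip validation.
    LoadWithoutValidation,
    /// Store already initialized: do not read genesis at all; a genesis change
    /// would then require manual intervention.
    Skip,
}

/// Hypothetical policy: skipping is the default on every start after the first.
fn genesis_mode(unsafe_fast_start: bool, store_exists: bool) -> GenesisMode {
    match (store_exists, unsafe_fast_start) {
        (true, _) => GenesisMode::Skip,
        (false, true) => GenesisMode::LoadWithoutValidation,
        (false, false) => GenesisMode::LoadAndValidate,
    }
}

fn main() {
    // Assumption: an existing data directory means the node has started before.
    let data_dir = Path::new("/home/ubuntu/.near/data");
    let mode = genesis_mode(false, data_dir.is_dir());
    println!("genesis mode: {:?}", mode);
}
```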
About 50 seconds are spent reading genesis here: https://github.com/near/nearcore/blob/29a8ae2e7a745d2d2d48c1b28f626fd91671064a/core/chain-configs/src/genesis_config.rs#L303
Also, why do we even need to read genesis twice? :)
I think we always compute the genesis hash to verify that it has not changed, but you are right -- there is no good reason to do that. The assumption should be that genesis does not change instead of the other way around.
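One way to avoid reading genesis twice is to load the file once per process and let both the hashing step and the parsing step reuse the same in-memory copy. The sketch below shows that load-once pattern with std::sync::OnceLock (Rust 1.70+). read_genesis_from_disk, genesis_fingerprint and genesis_text are hypothetical helpers, and DefaultHasher is used only for illustration: it is not a stable or cryptographic hash, so a real genesis hash would need a proper one.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the (expensive) loader actually runs.
static LOAD_COUNT: AtomicUsize = AtomicUsize::new(0);
/// Genesis contents, loaded at most once per process.
static GENESIS_BYTES: OnceLock<Vec<u8>> = OnceLock::new();

/// Hypothetical stand-in for reading a large genesis.json from disk.
fn read_genesis_from_disk() -> Vec<u8> {
    LOAD_COUNT.fetch_add(1, Ordering::SeqCst);
    br#"{"chain_id":"testnet"}"#.to_vec()
}

/// Returns the cached genesis bytes, reading them on first use only.
fn genesis_bytes() -> &'static [u8] {
    GENESIS_BYTES.get_or_init(read_genesis_from_disk)
}

/// Illustrative fingerprint (NOT a real genesis hash) over the cached bytes.
fn genesis_fingerprint() -> u64 {
    let mut hasher = DefaultHasher::new();
    genesis_bytes().hash(&mut hasher);
    hasher.finish()
}

/// Illustrative "parse" step that reuses the same cached bytes.
fn genesis_text() -> &'static str {
    std::str::from_utf8(genesis_bytes()).expect("genesis should be UTF-8")
}

fn main() {
    let fingerprint = genesis_fingerprint();
    let text = genesis_text();
    println!(
        "fingerprint {fingerprint:x}, {} bytes, loader ran {} time(s)",
        text.len(),
        LOAD_COUNT.load(Ordering::SeqCst)
    );
}
```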
This issue has been automatically marked as stale because it has not had recent activity in the last 2 months. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
@nikurt Was it completely resolved in #5888?
@frol No, this needs more investigation.
George Milescu commented:
Closing this issue in favour of https://pagodaplatform.atlassian.net/browse/ND-231 since both of them target the same improvements.
George Milescu commented:
Reopening this issue to offer external visibility over the progress we are making.