nimbus-eth1
Huge FC.deserialize (2+ hours) time for long non-finality / LC sync
The issue came up during a mainnet sync.
The situation can be explained as follows: the syncer was working on something like this (B = base, F = finalized, H = head):
B . . . . . . . . F . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H
^
|
we are here
Once the syncer has imported up to F, everything works fine. After we cross the F point, we keep importing. Then I stopped the node (a user can, since FC.serialize is there).
The progress is serialized and dumped to disk, so it can be loaded again when the node restarts.
But when we start the node again, it just keeps loading the data, sometimes for more than 2 hours. For example, in this run I let it load for about 20 minutes before killing it manually:
./build/nimbus_execution_client --network=mainnet --engine-api --data-dir=/Volumes/ETH/ethereum/mainnet/20241204-f034af4 --jwt-secret=/Volumes/ETH/ethereum/mainnet/20241204-f034af4/jwt.hex --log-level=DEBUG
libbacktrace error: no debugging symbols available. Compile with '--debugger:native'.
INF 2025-04-29 03:07:40.107+05:30 Launching execution client version=v0.1.0-8ba70652 conf="(dataDir: /Volumes/ETH/ethereum/mainnet/20241204-f034af4, era1DirOpt: none(OutDir), eraDirOpt: none(OutDir), keyStore: /Volumes/ETH/ethereum/mainnet/20241204-f034af4/keystore, importKey: , trustedSetupFile: none(string), extraData: \"nimbus-eth1/v0.1.0-8ba70652\", gasLimit: 36000000, network: \"mainnet\", customNetwork: none(NetworkParams), networkId: 1, networkParams: (config: ..., genesis: ...), logLevel: \"DEBUG\", logStdout: auto, logMetricsEnabled: false, logMetricsInterval: 10, metricsEnabled: false, metricsPort: 9093, metricsAddress: 127.0.0.1, bootstrapNodes: @[], bootstrapFile: , bootstrapEnrs: @[], staticPeers: @[], staticPeersFile: , staticPeersEnrs: @[], reconnectMaxRetry: 0, reconnectInterval: 15, listenAddress: ::, tcpPort: 30303, udpPort: 30303, maxPeers: 25, nat: (hasExtIp: false, nat: NatAny), discovery: V4, netKey: \"random\", agentString: \"nimbus-eth1/v0.1.0-8ba70652/macosx-arm64/Nim-2.2.4\", numThreads: 0, beaconSyncTargetFile: none(InputFile), beaconSyncBlocksQueueHwm: 0, rocksdbMaxOpenFiles: 2048, rocksdbWriteBufferSize: 67108864, rocksdbRowCacheSize: 0, rocksdbBlockCacheSize: 2147483648, rdbVtxCacheSize: 536870912, rdbKeyCacheSize: 1342177280, rdbBranchCacheSize: 1073741824, rdbPrintStats: false, rewriteDatadirId: false, cmd: noCommand, httpPort: 8545, httpAddress: 127.0.0.1, rpcEnabled: false, rpcApi: @[], wsEnabled: false, wsApi: @[], historyExpiry: false, historyExpiryLimit: none(GeneralizedIndex), portalUrl: \"\", engineApiEnabled: true, engineApiPort: 8551, engineApiAddress: 127.0.0.1, engineApiWsEnabled: false, allowedOrigins: @[], jwtSecret: some(/Volumes/ETH/ethereum/mainnet/20241204-f034af4/jwt.hex))"
INF 2025-04-29 03:07:40.193+05:30 Threadpool started numThreads=14
DBG 2025-04-29 03:07:40.260+05:30 Reverting to base err="no finalized block hash and number"
INF 2025-04-29 03:07:40.261+05:30 Database initialized base="(0xc01426b43f56e578a93c3ac7456d5647cda169ae0050fb6dd20c2426c7832a24, 21778642)" finalized="(0xc01426b43f56e578a93c3ac7456d5647cda169ae0050fb6dd20c2426c7832a24, 21778642)" head="(0xc01426b43f56e578a93c3ac7456d5647cda169ae0050fb6dd20c2426c7832a24, 21778642)"
^CNTC 2025-04-29 03:08:45.762+05:30 Ctrl+C pressed. Waiting for a graceful shutdown.
^CNTC 2025-04-29 03:27:33.078+05:30 Ctrl+C pressed. Waiting for a graceful shutdown.
I don't have exact logs for the 2+ hour run: it was running in TRACE mode and, being flooded with VM computation messages, my system kept pruning the logs.
As per a discussion with @jangko over Discord, this might be an issue with the aristo snapshot.
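To spell out that suspicion, here is a minimal hypothetical sketch (not the actual nimbus-eth1 code; `FcState`, `Blk`, `replayBlock` and `deserialize` are invented names) of why the load time would scale with the number of blocks imported after F: if the in-memory state has to be rebuilt block by block on startup, a long non-finality period turns FC.deserialize into a multi-hour loop.

```nim
# Hypothetical sketch only -- not the actual nimbus-eth1 code.
# FcState, Blk, replayBlock and deserialize are invented names.
type
  Blk = object
    number: uint64
  FcState = object
    baseNumber: uint64   # last finalized/persisted block (F)
    pending: seq[Blk]    # every block imported after F, dumped on shutdown

proc replayBlock(b: Blk) =
  # stand-in for rebuilding the in-memory chain/state for one block
  discard

proc deserialize(state: FcState) =
  # if startup has to rebuild state block by block, this loop runs once
  # per block between F and H -- huge during a long non-finality period
  for b in state.pending:
    replayBlock(b)

when isMainModule:
  var st = FcState(baseNumber: 21778642'u64)
  for i in 1 .. 1000:
    st.pending.add Blk(number: st.baseNumber + uint64(i))
  deserialize(st)
  echo "replayed ", st.pending.len, " blocks"
```

Under that assumption, the 2+ hour load above and the ~1 hour load with the smaller FC store below would just be the same loop over different numbers of pending blocks.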
Here is a smaller run of around 1 hour, with a smaller FC store:
INF 2025-04-29 10:47:17.653+05:30 Threadpool started numThreads=14
INF 2025-04-29 10:47:17.722+05:30 Database initialized base="(0xc01426b43f56e578a93c3ac7456d5647cda169ae0050fb6dd20c2426c7832a24, 21778642)" finalized="(0xc01426b43f56e578a93c3ac7456d5647cda169ae0050fb6dd20c2426c7832a24, 21778642)" head="(0xc01426b43f56e578a93c3ac7456d5647cda169ae0050fb6dd20c2426c7832a24, 21778642)"
^CNTC 2025-04-29 10:55:04.452+05:30 Ctrl+C pressed. Waiting for a graceful shutdown.
INF 2025-04-29 11:50:13.184+05:30 UPnP: added port mapping topics="eth net nat" externalPort=30303 internalPort=30303 protocol=TCP
INF 2025-04-29 11:50:13.200+05:30 UPnP: added port mapping topics="eth net nat" externalPort=30303 internalPort=30303 protocol=UDP
INF 2025-04-29 11:50:13.205+05:30 RLPx listener up topics="eth p2p" self=enode://e668e1e8d3493fe6044c408d4b7accfd85ef06baccdc9cc2c6465dd4e92c93cdfa3bc000f44665689df74191e03f781f8719a5df8f08222a87737249badd8d4f@172.21.179.35:30303
INF 2025-04-29 11:50:13.206+05:30 JWT secret loaded topics="Jwt/HS256 auth" jwtSecretPath=/Volumes/ETH/ethereum/mainnet/20241204-f034af4/jwt.hex
INF 2025-04-29 11:50:13.206+05:30 Engine API enabled url=http://127.0.0.1:8551
Started -> 10:47:17, Ended -> 11:50:13, Taken -> ~1 hour
Indeed, fixing this will require persisting the aristo txframe snapshot to disk. However, it's certainly unusual that F is not updating for this long, even in LC mode.
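For illustration, a rough sketch of what persisting the snapshot could look like, with invented names (`persistSnapshot`/`loadSnapshot` are not real nimbus-eth1 or aristo APIs): write the accumulated txframe deltas to disk on shutdown and load them back on startup, so the startup cost tracks snapshot size rather than how long finality has stalled.

```nim
# Hypothetical sketch only -- persistSnapshot/loadSnapshot are invented names,
# not actual nimbus-eth1 or aristo APIs.
import std/[tables, streams, strutils]

type
  Snapshot = Table[string, string]   # stand-in for the accumulated txframe deltas

proc persistSnapshot(snap: Snapshot, path: string) =
  # on shutdown: write the deltas next to the serialized FC store
  let s = newFileStream(path, fmWrite)
  defer: s.close()
  for k, v in snap:
    s.writeLine(k, "=", v)

proc loadSnapshot(path: string): Snapshot =
  # on startup: read the deltas back directly; no block replay needed, so the
  # cost depends on snapshot size, not on the number of non-finalized blocks
  let s = newFileStream(path, fmRead)
  defer: s.close()
  for line in s.lines:
    let parts = line.split('=', maxsplit = 1)
    if parts.len == 2:
      result[parts[0]] = parts[1]

when isMainModule:
  var snap: Snapshot
  snap["vtx:0x01"] = "branch-node-blob"
  snap["key:0x02"] = "hash-key-blob"
  persistSnapshot(snap, "fc_snapshot.txt")
  echo loadSnapshot("fc_snapshot.txt").len, " entries restored"
```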
> After we cross the F point, we keep importing

That is another problem, described in #3207: when the CL sees the EL syncing faster than itself, it will not update F. That still needs to be solved too.