Unable to load a 1GB genesis file in 40 seconds in version 1.28.0.
Description
Our custom network uses a large 1GB genesis.json file, and it worked fine with versions before 1.28.0, such as 1.27.x.
However, after upgrading to version 1.28.0, our Nethermind node fails to start with this error:
26 Aug 02:44:18 | Snap serving enabled, but PruningBoundary is less than 128. Setting to 128.
26 Aug 02:45:39 | Step LoadGenesisBlock failed after 80976ms System.TimeoutException: Genesis block was not processed after 40 seconds
at Nethermind.Init.Steps.LoadGenesisBlock.Load(IWorldState worldState) in /src/Nethermind/Nethermind.Init/Steps/LoadGenesisBlock.cs:line 88
at Nethermind.Init.Steps.LoadGenesisBlock.Execute(CancellationToken _) in /src/Nethermind/Nethermind.Init/Steps/LoadGenesisBlock.cs:line 46
at Nethermind.Init.Steps.EthereumStepsManager.ExecuteStep(IStep step, StepInfo stepInfo, CancellationToken cancellationToken) in /src/Nethermind/Nethermind.Init/Steps/EthereumStepsManager.cs:line 153
at Nethermind.Init.Steps.EthereumStepsManager.InitializeAll(CancellationToken cancellationToken) in /src/Nethermind/Nethermind.Init/Steps/EthereumStepsManager.cs:line 95
at Nethermind.Runner.Ethereum.EthereumRunner.Start(CancellationToken cancellationToken) in /src/Nethermind/Nethermind.Runner/Ethereum/EthereumRunner.cs:line 36
at Nethermind.Runner.Program.<>c__DisplayClass8_0.<<Run>b__1>d.MoveNext() in /src/Nethermind/Nethermind.Runner/Program.cs:line 213
26 Aug 02:45:39 | Error during ethereum runner start System.TimeoutException: Genesis block was not processed after 40 seconds
at Nethermind.Init.Steps.LoadGenesisBlock.Load(IWorldState worldState) in /src/Nethermind/Nethermind.Init/Steps/LoadGenesisBlock.cs:line 88
at Nethermind.Init.Steps.LoadGenesisBlock.Execute(CancellationToken _) in /src/Nethermind/Nethermind.Init/Steps/LoadGenesisBlock.cs:line 46
at Nethermind.Init.Steps.EthereumStepsManager.ExecuteStep(IStep step, StepInfo stepInfo, CancellationToken cancellationToken) in /src/Nethermind/Nethermind.Init/Steps/EthereumStepsManager.cs:line 153
at Nethermind.Init.Steps.EthereumStepsManager.InitializeAll(CancellationToken cancellationToken) in /src/Nethermind/Nethermind.Init/Steps/EthereumStepsManager.cs:line 95
at Nethermind.Runner.Ethereum.EthereumRunner.Start(CancellationToken cancellationToken) in /src/Nethermind/Nethermind.Runner/Ethereum/EthereumRunner.cs:line 36
at Nethermind.Runner.Program.<>c__DisplayClass8_0.<<Run>b__1>d.MoveNext() in /src/Nethermind/Nethermind.Runner/Program.cs:line 213
Steps to Reproduce
- Generate a large genesis file of 1GB.
- Use this large genesis file to initialize and start the node.
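The generator used for this report is not included here. As a minimal sketch, assuming the bulk of a large genesis file is an object mapping addresses to balances, something like the following could produce a file of arbitrary size. The function name, the fixed `0x1` balance, and writing only the accounts object (not a complete chainspec) are assumptions for illustration.

```python
import json
import os
import random


def write_accounts_fragment(path, target_bytes, seed=0):
    """Stream a JSON object of random accounts to `path`.

    Stops once at least `target_bytes` characters have been written,
    so the whole object is never held in memory.
    Returns the number of accounts written.
    """
    rng = random.Random(seed)
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        out.write("{")
        written = 1
        while written < target_bytes:
            # 20 random bytes rendered as a hex address (hypothetical accounts).
            address = "0x" + rng.getrandbits(160).to_bytes(20, "big").hex()
            entry = json.dumps(address) + ":" + json.dumps({"balance": "0x1"})
            if count:
                entry = "," + entry
            out.write(entry)
            written += len(entry)
            count += 1
        out.write("}")
    return count


if __name__ == "__main__":
    # Small demo size; raise target_bytes to approach the sizes discussed here.
    n = write_accounts_fragment("accounts.json", 1_000_000)
    print(n, "accounts,", os.path.getsize("accounts.json"), "bytes")
```

The resulting object would still have to be merged into a valid chainspec before it can be passed to `--Init.ChainSpecPath`.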
Actual behavior
The node can't start and logs a timeout of 40 seconds.
By the way, why is the 40s timeout hardcoded? https://github.com/NethermindEth/nethermind/blob/e856de5a33259ea0e54c40c28db37631bf56c2c0/src/Nethermind/Nethermind.Init/Steps/LoadGenesisBlock.cs#L23
Expected behavior
The node can start normally, just like in version 1.27.x.
Screenshots
Desktop (please complete the following information): Please provide the following information regarding your setup:
- Operating System: Linux
- Version: 1.28.0
- Installation Method: Docker
- Consensus Client: none
Additional context
In more precise testing, if the genesis file size exceeds 256MB, the node fails to start and times out while loading the genesis file.
My startup parameters:
version: "3.9"
services:
  execution:
    tty: true
    environment:
      - TERM=xterm-256color
      - COLORTERM=truecolor
    stop_grace_period: 30s
    container_name: gas-execution-client
    image: ${EC_IMAGE_VERSION}
    networks:
      - gas
    volumes:
      - ${EC_DATA_DIR}:/nethermind/data
      - ${EC_JWT_SECRET_PATH}:/tmp/jwt/jwtsecret
      - ${CHAINSPEC_PATH}:/tmp/chainspec/chainspec.json
    ports:
      - "30304:30304/tcp"
      - "30304:30304/udp"
      - "8009:8009"
      - "8545:8545"
      - "8551:8551"
    expose:
      - 8545
      - 8551
    command:
      - --config=none.cfg
      - --Init.ChainSpecPath=/tmp/chainspec/chainspec.json
      - --datadir=/nethermind/data
      - --log=INFO
      - --JsonRpc.Enabled=true
      - --JsonRpc.Host=0.0.0.0
      - --JsonRpc.Port=8545
      - --JsonRpc.JwtSecretFile=/tmp/jwt/jwtsecret
      - --JsonRpc.EngineHost=0.0.0.0
      - --JsonRpc.EnginePort=8551
      - --Network.DiscoveryPort=30304
      - --HealthChecks.Enabled=true
      - --Metrics.Enabled=true
      - --Metrics.ExposePort=8009
      - --Sync.MaxAttemptsToUpdatePivot=0
    logging:
      driver: json-file
      options:
        max-size: 10m
        max-file: "10"
networks:
  gas:
    name: gas-network
Logs
Can you share the genesis file you are using?
In my test environment, I generate a random genesis file every time using this script, which creates many accounts in one genesis file.
For a quick test, you could also try the genesis file of Endurance's mainnet, which is larger than 800MB: https://github.com/OpenFusionist/network_config (I haven't checked whether this file reproduces the error; my error comes from the script method mentioned above.)
Hi @LukaszRozmej, I investigated the performance regression mentioned above further and have some conclusions and points I'd like to discuss.
Regarding the performance issue: PR https://github.com/NethermindEth/nethermind/pull/7215 was a performance optimization that replaced LruCache with ClockCache to reduce lock granularity. However, due to implementation details, it caused a regression that led to timeouts when initializing from large genesis files (>800MB). Based on our tests, the latest commit (60159fb448d5b7fd53565aa7b15942a8c68614ba) appears to have fixed this issue.
Issue identification method:
- Compared commits between two releases (1.27.1...1.28.0)
- Locally compiled Nethermind and attempted to start it with a large genesis file
- Used git bisect to gradually locate the problematic commit
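The search that git bisect performs in the last step can be sketched as a binary search over the ordered commits between the two releases. This is only an illustration: `first_bad` and `is_bad` are hypothetical names, and in practice `is_bad` would mean building that commit and checking whether the node times out on the large genesis file.

```python
def first_bad(commits, is_bad):
    """Return the index of the first bad commit in `commits`.

    Assumes commits[0] is known good, commits[-1] is known bad,
    and every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return hi


if __name__ == "__main__":
    # Hypothetical history of 100 commits where the regression lands at index 37.
    history = list(range(100))
    print(first_bad(history, lambda c: c >= 37))
```

Each round halves the remaining range, so about 100 commits need only around 7 build-and-start checks.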
Regarding the 40s hard-coded timeout: this has been discussed previously. Related PR: https://github.com/NethermindEth/nethermind/pull/6160. We can discuss this further:
It's up for discussion if we want to increase the timeout from 40 seconds (current default, hard-coded value) to something different.
Let me know if you need any additional information or clarification on this matter.
@ohko4711 thank you for the analysis. #7215 might have had some unplanned effect, though https://github.com/NethermindEth/nethermind/commit/60159fb448d5b7fd53565aa7b15942a8c68614ba shouldn't affect genesis based on the code, so I'm not sure that is what fixed it. @benaadams can you check? Both are your changes.
I will move the timeout to config though.
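As a rough illustration of what a configurable timeout means here (not Nethermind's code, which is C#), the wait for genesis processing can take its limit as a parameter with 40 seconds as the default. The names `wait_for_genesis` and `DEFAULT_GENESIS_TIMEOUT_S` are hypothetical.

```python
import threading

# Mirrors the 40-second value discussed in this issue; used only as a default.
DEFAULT_GENESIS_TIMEOUT_S = 40.0


def wait_for_genesis(processed, timeout_s=DEFAULT_GENESIS_TIMEOUT_S):
    """Block until `processed` (a threading.Event) is set.

    Raises TimeoutError if the event is not set within `timeout_s` seconds.
    """
    if not processed.wait(timeout=timeout_s):
        raise TimeoutError(
            f"Genesis block was not processed after {timeout_s:g} seconds"
        )


if __name__ == "__main__":
    done = threading.Event()
    # Simulate genesis processing finishing after 0.1 seconds.
    threading.Timer(0.1, done.set).start()
    # A network with a very large genesis file could pass a larger limit.
    wait_for_genesis(done, timeout_s=5.0)
    print("genesis processed")
```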
@LukaszRozmej Anything more planned for this issue?
We could explore some additional optimizations that would avoid deserializing the genesis file into an object, but that is very low priority now, so I think we can close.