
Getting "nonce too low" after restarting l2geth

Open sanbotto opened this issue 2 years ago • 8 comments

Describe the bug: l2geth breaks after being restarted. It runs perfectly fine; I restart it, and then this shows up in the logs:

INFO [12-15|16:59:29.673] Syncing transaction batch range          start=43668 end=48186
ERROR[12-15|16:59:29.681] Mismatched transaction                   index=5769274
ERROR[12-15|16:59:29.681] Mismatched transaction                   index=5769275
ERROR[12-15|16:59:29.681] Mismatched transaction                   index=5769276
ERROR[12-15|16:59:29.681] Mismatched transaction                   index=5769277
ERROR[12-15|16:59:29.681] Mismatched transaction                   index=5769278
ERROR[12-15|16:59:29.681] Mismatched transaction                   index=5769279
ERROR[12-15|16:59:29.681] Mismatched transaction                   index=5769280
ERROR[12-15|16:59:29.681] Mismatched transaction                   index=5769281
ERROR[12-15|16:59:29.681] Mismatched transaction                   index=5769282
ERROR[12-15|16:59:29.682] Mismatched transaction                   index=5769283
ERROR[12-15|16:59:29.682] Mismatched transaction                   index=5769284
ERROR[12-15|16:59:29.682] Mismatched transaction                   index=5769285
ERROR[12-15|16:59:29.682] Mismatched transaction                   index=5769286
ERROR[12-15|16:59:29.682] Mismatched transaction                   index=5769287
ERROR[12-15|16:59:29.682] Mismatched transaction                   index=5769288
ERROR[12-15|16:59:29.682] Mismatched transaction                   index=5769289
ERROR[12-15|16:59:29.682] Mismatched transaction                   index=5769290
ERROR[12-15|16:59:29.682] Mismatched transaction                   index=5769291
ERROR[12-15|16:59:29.682] Mismatched transaction                   index=5769292
ERROR[12-15|16:59:29.682] Mismatched transaction                   index=5769293
ERROR[12-15|16:59:29.682] Problem committing transaction           msg="nonce too low"
ERROR[12-15|16:59:29.682] Got error waiting for transaction to be added to chain msg="nonce too low"
ERROR[12-15|16:59:29.682] Could not verify                         error="Verifier cannot sync transaction batches to tip: Cannot sync transaction batches to tip: Cannot sync batches: cannot apply batched transaction: Cannot apply batched transaction: nonce too low"

To Reproduce

  1. Create a new node following the non-Docker configuration described here.
  2. Create unit files for DTL and l2geth (see the unit files below).
  3. Start DTL and l2geth by running the following:
    systemctl daemon-reload
    systemctl start optimism-dtl
    systemctl start optimism-l2geth
    
  4. Restart l2geth with:
    systemctl restart optimism-l2geth
    

Expected behavior: l2geth should continue working properly after a restart.
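For reference, one way to check whether the verifier actually resumed syncing after a restart is roughly the following sketch (assuming the RPC settings from the l2geth unit file below, i.e. JSON-RPC over HTTP on port 8545 with the rollup namespace enabled; rollup_getInfo is, to the best of my knowledge, the sync-status method in l2geth's rollup API):

# Current L2 block height as seen by l2geth
curl -s -X POST http://localhost:8545 \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'

# Verifier/rollup sync status
curl -s -X POST http://localhost:8545 \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","method":"rollup_getInfo","params":[],"id":1}'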

System Specs:

  • OS: Ubuntu 20.04.5 LTS
  • Kernel: Linux 5.15.0-1026-aws
  • vCPUs: 4
  • RAM: 16GB
  • Disk: 1500GiB
  • Commit hash: af2cf449d1081e85ba346565d343ed7737871e75

Additional context: Every now and then, my Optimism nodes break completely and I have to recreate them from scratch and wait about six days for them to sync again. Now I realize that I can't even safely restart the node without breaking it. I'm not sure whether this is a bug or whether I'm doing something wrong, but I'm fairly sure this is not the expected behavior, because it's highly unstable.

If this instability can't be fixed, I'd welcome a way to recover nodes from backups. I've tried all sorts of DR solutions, but since restoring implies a service restart, and a service restart breaks l2geth, restoring never works.
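To make that concrete, the kind of backup procedure I've been attempting looks roughly like the sketch below (the /backups path is a placeholder; the data paths match the env and unit files that follow, and stopping both services first is an assumption about what a consistent backup needs, not something I've confirmed is sufficient):

# Stop both services so the databases are quiescent
systemctl stop optimism-l2geth optimism-dtl

# Archive the DTL database and the l2geth datadir (/backups is a placeholder)
tar -czf /backups/optimism-dtl-db.tar.gz -C /chaindata/optimism/packages/data-transport-layer db
tar -czf /backups/optimism-l2geth-data.tar.gz -C /chaindata/optimism/l2geth gethData

# Bring everything back up
systemctl start optimism-dtl optimism-l2geth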

Env and config files

DTL's .env:

DATA_TRANSPORT_LAYER__NODE_ENV=production
DATA_TRANSPORT_LAYER__ETH_NETWORK_NAME=mainnet
DATA_TRANSPORT_LAYER__DB_PATH=/chaindata/optimism/packages/data-transport-layer/db
DATA_TRANSPORT_LAYER__ADDRESS_MANAGER=0xdE1FCfB0851916CA5101820A69b13a4E276bd81F
DATA_TRANSPORT_LAYER__POLLING_INTERVAL=5000
DATA_TRANSPORT_LAYER__DANGEROUSLY_CATCH_ALL_ERRORS=true
DATA_TRANSPORT_LAYER__CONFIRMATIONS=12
DATA_TRANSPORT_LAYER__SERVER_HOSTNAME=localhost
DATA_TRANSPORT_LAYER__SERVER_PORT=7878
DATA_TRANSPORT_LAYER__SYNC_FROM_L1=true
DATA_TRANSPORT_LAYER__DEFAULT_BACKEND=l1
DATA_TRANSPORT_LAYER__L1_GAS_PRICE_BACKEND=l1
DATA_TRANSPORT_LAYER__L1_RPC_ENDPOINT=https://mainnet.infura.io/v3/REDACTED_KEY
DATA_TRANSPORT_LAYER__LOGS_PER_POLLING_INTERVAL=2000
DATA_TRANSPORT_LAYER__SYNC_FROM_L2=false
DATA_TRANSPORT_LAYER__L2_RPC_ENDPOINT=https://optimism-mainnet.infura.io/v3/REDACTED_KEY
DATA_TRANSPORT_LAYER__TRANSACTIONS_PER_POLLING_INTERVAL=1000
DATA_TRANSPORT_LAYER__L2_CHAIN_ID=10
DATA_TRANSPORT_LAYER__LEGACY_SEQUENCER_COMPATIBILITY=false

DTL's unit file (optimism-dtl.service):

[Unit]
Description = Optimism's Data Transport Layer
After       = network.target
Requires    = network.target

[Service]
User             = root
WorkingDirectory = /chaindata/optimism/packages/data-transport-layer
ExecStart        = yarn start
StandardOutput   = file:/var/log/optimism/dtl.log
StandardError    = file:/var/log/optimism/dtl-error.log
Restart          = on-failure
RestartSec       = 5
TimeoutStartSec  = 20
TimeoutStopSec   = 20

[Install]
WantedBy = multi-user.target

l2geth's unit file (optimism-l2geth.service):


[Unit]
Description = Optimism's L2 Geth
After       = network.target
Requires    = network.target

[Service]
Environment=CHAIN_ID=10
Environment=DATADIR=/chaindata/optimism/l2geth/gethData
Environment=NETWORK_ID=10
Environment=NO_DISCOVER=true
Environment=NO_USB=true
Environment=GASPRICE=0
Environment=GCMODE=full
Environment=BLOCK_SIGNER_ADDRESS=0x00000398232E2064F896018496b4b44b3D62751F
Environment=BLOCK_SIGNER_PRIVATE_KEY=6587ae678cf4fc9a33000cdbf9f35226b71dcc6a4684a31203241f9bcfd55d27
Environment=ETH1_CTC_DEPLOYMENT_HEIGHT=13596466
Environment=ETH1_SYNC_SERVICE_ENABLE=true
Environment=ROLLUP_ADDRESS_MANAGER_OWNER_ADDRESS=0x9BA6e03D8B90dE867373Db8cF1A58d2F7F006b3A
Environment=ROLLUP_BACKEND=l1
Environment=ROLLUP_CLIENT_HTTP=http://localhost:7878
Environment=ROLLUP_DISABLE_TRANSFERS=false
Environment=ROLLUP_ENABLE_L2_GAS_POLLING=false
Environment=ROLLUP_GAS_PRICE_ORACLE_OWNER_ADDRESS=0x648E3e8101BFaB7bf5997Bd007Fb473786019159
Environment=ROLLUP_MAX_CALLDATA_SIZE=40000
Environment=ROLLUP_POLL_INTERVAL_FLAG=1s
Environment=ROLLUP_SYNC_SERVICE_ENABLE=true
Environment=ROLLUP_TIMESTAMP_REFRESH=5m
Environment=ROLLUP_VERIFIER_ENABLE=true
Environment=RPC_ADDR=0.0.0.0
Environment=RPC_API=eth,rollup,net,web3,debug
Environment=RPC_CORS_DOMAIN=*
Environment=RPC_ENABLE=true
Environment=RPC_PORT=8545
Environment=RPC_VHOSTS=*
Environment=SEQUENCER_CLIENT_HTTP=https://optimism-mainnet.infura.io/v3/REDACTED_KEY
Environment=TARGET_GAS_LIMIT=15000000
Environment=USING_OVM=true
Environment=WS_ADDR=0.0.0.0
Environment=WS_API=eth,rollup,net,web3,debug
Environment=WS_ORIGINS=*
Environment=WS=true

User      = root
ExecStart = /chaindata/optimism/l2geth/build/bin/geth \
  --datadir=$DATADIR \
  --password=$DATADIR/password \
  --allow-insecure-unlock \
  --unlock=$BLOCK_SIGNER_ADDRESS \
  --mine \
  --miner.etherbase=$BLOCK_SIGNER_ADDRESS

StandardOutput  = file:/var/log/optimism/l2geth.log
StandardError   = file:/var/log/optimism/l2geth-error.log
Restart         = on-failure
RestartSec      = 5
TimeoutStartSec = 20
TimeoutStopSec  = 20

[Install]
WantedBy = multi-user.target
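For completeness, after writing both unit files the units can also be enabled so they come back up on boot (standard systemd usage, nothing Optimism-specific):

systemctl daemon-reload
systemctl enable optimism-dtl optimism-l2geth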

sanbotto avatar Dec 15 '22 18:12 sanbotto

I set up a new node today, started syncing it from scratch, and I'm getting the same error.

@sanbotto have you found a solution or workaround???

ManInWeb3 avatar Jan 12 '23 06:01 ManInWeb3

I set up a new node today, started syncing it from scratch, and I'm getting the same error.

@sanbotto have you found a solution or workaround???

Sadly, I haven't.

sanbotto avatar Jan 12 '23 20:01 sanbotto

Having the same issue here, resynced twice already as a result. Some feedback from the team would be great.

dnldd avatar Jan 17 '23 12:01 dnldd

I'll just add that I've been having the same issue. I have redundant nodes and have frequently needed to restore from snapshots when this occurs. It seems random and related to ungraceful shutdowns and the node's inability to recover gracefully. Hoping the op-geth and Bedrock upgrade will resolve many of the sync/restart issues that have plagued me on l2geth 🤞

kaladinlight avatar Jan 20 '23 19:01 kaladinlight

snapshot to restore

How are you doing that? I haven't been able to recover a node using any form of backup, neither with full volume snapshots nor with these publicly available snapshots of just dtl and l2geth.

sanbotto avatar Jan 20 '23 20:01 sanbotto

snapshot to restore

How are you doing that? I haven't been able to recover a node using any form of backup, neither with full volume snapshots nor with these publicly available snapshots of just dtl and l2geth.

My infrastructure is deployed on AWS, so I'm just using EBS volume snapshots after shutting down the node, hoping the shutdown was graceful and I got a healthy snapshot. With a healthy snapshot, I've been able to successfully recreate a volume from it and mount it on my running instance in order to spin back up, roughly the flow sketched below. I have not tried any of the public snapshot images myself.
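In rough terms (all IDs, the availability zone, and the device name here are placeholders, not my actual values):

# Stop the node first and hope the shutdown is graceful
systemctl stop optimism-l2geth optimism-dtl

# Snapshot the EBS data volume
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "optimism node data"

# To restore: create a volume from a known-good snapshot and attach it to the instance
aws ec2 create-volume --snapshot-id snap-0123456789abcdef0 --availability-zone us-east-1a --volume-type gp3
aws ec2 attach-volume --volume-id vol-0fedcba9876543210 --instance-id i-0123456789abcdef0 --device /dev/sdf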

Do you see this error condition when spinning back up from a full volume snapshot? What version of l2geth are you running? I feel like there was a marginal stability improvement in l2geth v0.5.30, whose release notes mentioned some graceful-shutdown improvements, but it obviously didn't fix the problem.

kaladinlight avatar Jan 20 '23 23:01 kaladinlight

Sorry for the delay here. The team has been heads-down on the Bedrock release, and some l2geth maintenance has slipped through the cracks. "Mismatched transaction" is an obvious indicator that something is going wrong. Going to ask around tomorrow to figure out what might be happening here and see if we can get a release out.

Apologies again for the delay and very much appreciate your patience 🙏

smartcontracts avatar Feb 07 '23 04:02 smartcontracts

Hi, any news please? We are stuck with this problem too :(

Ekzer avatar Feb 21 '23 11:02 Ekzer

@smartcontracts, is it solved?

bimlas avatar Apr 07 '23 09:04 bimlas

The Bedrock upgrade fixes this.

tynes avatar Jun 16 '23 19:06 tynes