I got "Head state missing, repairing" and the node does sync blocks any more
Hi, when I restart my geth node, I got the warn message and the node does not sync any more. It has been many hours.
WARN [05-30|13:04:36.330] Head state missing, repairing number=14,463,337 hash=4f01d4..959961 snaproot=980390..e6434b
The full log after the node starts:
INFO [05-30|13:04:14.312] Starting Geth on Ethereum mainnet...
INFO [05-30|13:04:14.314] Maximum peer count ETH=50 LES=0 total=50
INFO [05-30|13:04:14.314] Smartcard socket not found, disabling err="stat /run/pcscd/pcscd.comm: no such file or directory"
INFO [05-30|13:04:14.316] Set global gas cap cap=50,000,000
INFO [05-30|13:04:14.330] Allocated trie memory caches clean=307.00MiB dirty=512.00MiB
INFO [05-30|13:04:14.330] Allocated cache and file handles database=/data/eth_main/geth/chaindata cache=1024.00MiB handles=32767
INFO [05-30|13:04:23.073] Opened ancient database database=/data/eth_main/geth/chaindata/ancient readonly=false
INFO [05-30|13:04:23.249] Persisted trie from memory database nodes=12356 size=1.78MiB time=25.823995ms gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [05-30|13:04:23.250] Initialised chain configuration config="{ChainID: 1 Homestead: 1150000 DAO: 1920000 DAOSupport: true EIP150: 2463000 EIP155: 2675000 EIP158: 2675000 Byzantium: 4370000 Constantinople: 7280000 Petersburg: 7280000 Istanbul: 9069000, Muir Glacier: 9200000, Berlin: 12244000, London: 12965000, Arrow Glacier: 13773000, MergeFork: <nil>, Terminal TD: <nil>, Engine: ethash}"
INFO [05-30|13:04:35.570] Disk storage enabled for ethash caches dir=/data/eth_main/geth/ethash count=3
INFO [05-30|13:04:35.570] Disk storage enabled for ethash DAGs dir=/home/java-tron/.ethash count=2
INFO [05-30|13:04:35.574] Initialising Ethereum protocol network=1 dbversion=8
INFO [05-30|13:04:36.327] Loaded most recent local header number=14,463,337 hash=4f01d4..959961 td=44,767,534,003,696,042,413,376 age=2mo4d11h
INFO [05-30|13:04:36.327] Loaded most recent local full block number=14,463,337 hash=4f01d4..959961 td=44,767,534,003,696,042,413,376 age=2mo4d11h
INFO [05-30|13:04:36.327] Loaded most recent local fast block number=14,463,337 hash=4f01d4..959961 td=44,767,534,003,696,042,413,376 age=2mo4d11h
WARN [05-30|13:04:36.330] Head state missing, repairing number=14,463,337 hash=4f01d4..959961 snaproot=980390..e6434b
When I stopped the node using kill <pid>, I got the following messages:
INFO [05-30|11:45:18.054] Got interrupt, shutting down...
INFO [05-30|11:45:18.059] HTTP server stopped endpoint=[::]:8545
INFO [05-30|11:45:18.059] IPC endpoint closed url=/data/eth_main/geth.ipc
INFO [05-30|11:45:18.747] Downloader queue stats receiptTasks=0 blockTasks=33407 itemSize=82.43KiB throttle=3181
WARN [05-30|11:45:18.747] Synchronisation failed, retrying err="content processing canceled (requested)"
INFO [05-30|11:45:18.749] Ethereum protocol stopped
INFO [05-30|11:45:18.749] Transaction pool stopped
INFO [05-30|11:45:18.749] Waiting background transaction indexer to exit
WARN [05-30|11:45:18.768] Already shutting down, interrupt more to panic. times=9
ERROR[05-30|11:45:18.791] Failed to journal state snapshot err="snapshot [0xaae82e4b2b4223051bfa8f75f11434d712e0c1227d6c2bbd395637682fe03833] missing"
INFO [05-30|11:45:18.792] Writing cached state to disk block=14,463,337 hash=4f01d4..959961 root=aae82e..e03833
WARN [05-30|11:45:19.292] Already shutting down, interrupt more to panic. times=8
WARN [05-30|11:45:25.001] Already shutting down, interrupt more to panic. times=7
WARN [05-30|11:45:25.815] Already shutting down, interrupt more to panic. times=6
WARN [05-30|11:45:26.867] Already shutting down, interrupt more to panic. times=5
WARN [05-30|11:45:27.410] Already shutting down, interrupt more to panic. times=4
WARN [05-30|11:45:27.929] Already shutting down, interrupt more to panic. times=3
WARN [05-30|11:45:28.428] Already shutting down, interrupt more to panic. times=2
WARN [05-30|11:45:28.883] Already shutting down, interrupt more to panic. times=1
INFO [05-30|11:45:29.346] Looking for peers peercount=0 tried=114 static=0
panic: boom
Before this restart, the node also showed the repairing log, but that time it only ran for a few minutes and then carried on.
WARN [05-26|17:17:29.280] Head state missing, repairing number=13,773,036 hash=feb273..d9e13d snaproot=a1e813..4813c0
INFO [05-26|17:20:04.406] Loaded most recent local header number=13,773,036 hash=feb273..d9e13d td=36,206,751,599,115,524,359,527 age=5mo2w2d
INFO [05-26|17:20:04.406] Loaded most recent local full block number=13,743,348 hash=17974a..337279 td=35,855,860,751,110,670,126,484 age=5mo3w1d
INFO [05-26|17:20:04.406] Loaded most recent local fast block number=13,773,036 hash=feb273..d9e13d td=36,206,751,599,115,524,359,527 age=5mo2w2d
WARN [05-26|17:20:04.407] Enabling snapshot recovery chainhead=13,743,348 diskbase=13,772,860
WARN [05-26|17:20:04.546] Snapshot is not continuous with chain snaproot=52c356..2fac0d chainroot=a3aecc..32d9d7
INFO [05-26|17:20:04.546] Loaded local transaction journal transactions=0 dropped=0
INFO [05-26|17:20:04.546] Regenerated local transaction journal transactions=0 accounts=0
INFO [05-26|17:20:04.550] Gasprice oracle is ignoring threshold set threshold=2
WARN [05-26|17:20:04.553] Unclean shutdown detected booted=2021-11-14T17:37:46+0800 age=6mo1w5d
INFO [05-26|17:20:04.553] Starting peer-to-peer node instance=Geth/ddblock/v1.10.9-stable-eae3b194/linux-amd64/go1.17
INFO [05-26|17:20:04.835] New local node record seq=1,634,206,267,554 id=e541d3a5d36734f6 ip=127.0.0.1 udp=30303 tcp=30303
So, how can I get the node to sync again? And why did I get this error?
The start command is:
nohup geth --mainnet --syncmode "full" --datadir /data/eth_main --cache 2048 --identity "ddblock" --http --http.api eth,net,web3 --http.corsdomain "*" --http.port 8545 --port 30303 --http.addr 0.0.0.0 --http.vhosts "*" >> eth_main.out 2>&1 &
I used geth-linux-amd64-1.10.17-25c9b49f and also tried updating to geth-linux-amd64-1.10.18-de23cf91, but that did not help.
After about 20 hours, the node started syncing again from block 0.
WARN [05-30|11:45:32.608] Head state missing, repairing number=14,767,004 hash=6371c1..2fb12a snaproot=1c3393..736d02
INFO [05-31|08:15:42.363] Loaded most recent local header number=0 hash=d4e567..cb8fa3 td=17,179,869,184 age=53y2mo3d
INFO [05-31|08:15:42.363] Loaded most recent local full block number=0 hash=d4e567..cb8fa3 td=17,179,869,184 age=53y2mo3d
INFO [05-31|08:15:42.363] Loaded most recent local fast block number=0 hash=d4e567..cb8fa3 td=17,179,869,184 age=53y2mo3d
WARN [05-31|08:15:42.748] Enabling snapshot recovery chainhead=0 diskbase=14,771,503
WARN [05-31|08:15:42.969] Snapshot is not continuous with chain snaproot=60290b..bb096e chainroot=d7f897..0f0544
INFO [05-31|08:15:42.977] Loaded local transaction journal transactions=0 dropped=0
INFO [05-31|08:15:42.978] Regenerated local transaction journal transactions=0 accounts=0
INFO [05-31|08:15:42.979] Gasprice oracle is ignoring threshold set threshold=2
WARN [05-31|08:15:42.980] Unclean shutdown detected booted=2021-10-14T18:14:34+0800 age=7mo2w4d
WARN [05-31|08:15:42.980] Unclean shutdown detected booted=2022-05-30T11:25:13+0800 age=20h50m29s
WARN [05-31|08:15:42.980] Unclean shutdown detected booted=2022-05-30T11:43:15+0800 age=20h32m27s
INFO [05-31|08:15:42.983] Starting peer-to-peer node instance=Geth/ddblock/v1.10.17-stable-25c9b49f/linux-amd64/go1.18
INFO [05-31|08:15:43.173] New local node record seq=1,634,206,474,731 id=9bb93ce38e8e2e21 ip=127.0.0.1 udp=30303 tcp=30303
INFO [05-31|08:15:43.174] Started P2P networking self=enode://e39a1c7cccece4511de8962277fb4bbd9fdc55bf8cc75b23521846633d7fc320065e5a4d1bb85fb12073c510c7e6c9e0f84ad988cb3a84063e32f992ca8fa09b@127.0.0.1:30303
INFO [05-31|08:15:43.176] IPC endpoint opened url=/data/eth_main/geth.ipc
INFO [05-31|08:15:43.176] HTTP server started endpoint=[::]:8545 auth=false prefix= cors=* vhosts=*
INFO [05-31|08:15:47.023] New local node record seq=1,634,206,474,732 id=9bb93ce38e8e2e21 ip=47.253.61.22 udp=30303 tcp=30303
INFO [05-31|08:15:54.015] Looking for peers peercount=0 tried=12 static=0
INFO [05-31|08:16:04.021] Looking for peers peercount=0 tried=24 static=0
INFO [05-31|08:16:14.682] Looking for peers peercount=0 tried=29 static=0
INFO [05-31|08:16:24.682] Looking for peers peercount=0 tried=38 static=0
INFO [05-31|08:16:24.730] Block synchronisation started
INFO [05-31|08:16:25.166] Downloader queue stats receiptTasks=0 blockTasks=189 itemSize=111.09B throttle=8192
INFO [05-31|08:16:25.983] Imported new chain segment blocks=2 txs=0 mgas=0.000 elapsed=817.163ms mgasps=0.000 number=2 hash=b495a1..4698c9 age=6y11mo6d dirty=4.65KiB
INFO [05-31|08:16:25.986] Indexed transactions blocks=3 txs=0 tail=0 elapsed="109.398µs"
INFO [05-31|08:16:30.086] Imported new chain segment blocks=2048 txs=0 mgas=0.000 elapsed=4.088s mgasps=0.000 number=2050 hash=2c83bc..497041 age=6y11mo6d dirty=485.39KiB
INFO [05-31|08:16:32.645] Imported new chain segment blocks=2048 txs=0 mgas=0.000 elapsed=2.549s mgasps=0.000 number=4098 hash=3a497e..040fe3 age=6y11mo6d dirty=576.57KiB
INFO [05-31|08:16:34.710] Imported new chain segment blocks=2048 txs=0 mgas=0.000 elapsed=2.055s mgasps=0.000 number=6146 hash=1d6dfe..771ede age=6y11mo6d dirty=593.28KiB
These logs are a bit strange: they are not in order, and they seem to be missing Geth runs in between. It's extremely hard to see what's wrong without the full picture of what you've been doing.
All in all, Geth really hates being killed forcefully, as happened in one of your logs. Such terminations lead to state losses, which are generally recoverable, but if you kill it again during recovery, things only get worse.
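To avoid forceful kills, stop geth with an interrupt and then wait for the process to actually exit before doing anything else, since the "Writing cached state to disk" phase can take a while. A minimal sketch of such a helper, assuming a single geth process; the function name and the 300-second default timeout are my own choices, not anything built into geth:

```shell
# stop_geth: send geth an interrupt and wait up to $1 seconds (default 300)
# for it to flush state and exit. Never escalate to SIGKILL yourself.
stop_geth() {
  timeout="${1:-300}"
  pid=$(pgrep -x geth) || return 0           # no geth running, nothing to do
  kill -INT "$pid"                           # same path as Ctrl-C: "Got interrupt, shutting down..."
  while [ "$timeout" -gt 0 ]; do
    kill -0 "$pid" 2>/dev/null || return 0   # process gone: clean shutdown finished
    sleep 1
    timeout=$((timeout - 1))
  done
  echo "geth still running after the timeout" >&2
  return 1
}
```

Calling something like stop_geth 600 before an upgrade, instead of a bare kill followed by repeated interrupts, avoids the "interrupt more to panic" path shown in the shutdown log above.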
If you have all the logs, starting from a healthy point going all the way down to the last failure of a rewind to genesis, we might be able to make heads or tails of it, but in the current case we're missing too much information.
I am having the same problem when the Kubernetes pod needs to restart. I can't even upgrade versions without the node having to repair the chain.
I tried increasing the grace period for the pod, but it still gets killed before shutdown finishes and has to come up in repair mode.
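For the Kubernetes case, the knob that matters is the pod's termination grace period: on deletion, Kubernetes sends the container SIGTERM (which geth handles like an interrupt and shuts down gracefully), and only sends SIGKILL after the grace period expires. A hedged pod-spec sketch, where the field names are standard Kubernetes fields but the container name and the 600-second value are placeholders to adjust for your disk speed:

```yaml
spec:
  # Give geth up to 10 minutes to write cached state to disk before SIGKILL.
  terminationGracePeriodSeconds: 600
  containers:
    - name: geth   # placeholder container name
      # No preStop hook is strictly needed: the SIGTERM Kubernetes sends on
      # pod deletion already triggers geth's "Got interrupt, shutting down..." path.
```

If the pod still comes up in repair mode, it usually means the grace period expired before the state flush finished, so the value needs to be larger than the longest shutdown you observe in the logs.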