go-ethereum
go-ethereum copied to clipboard
`Failed to truncate extra state histories` with Sepolia history expiry
System information
Tested with Geth 1.15.11 and a latest build. Easier to reproduce on 1.15.11
Geth
Version: 1.15.12-unstable
Git Commit: 7ec493f66b5c18a75505718247dad404beadaaac
Git Commit Date: 20250609
Architecture: amd64
Go Version: go1.24.4
Operating System: linux
Issue
Geth logs Failed to truncate extra state histories on a fresh-synced, history-expired Sepolia.
This was easy to reproduce on 1.15.11, showed up pretty much immediately. latest looked like it was stable, at first, but eventually failed in the same way.
I have considered DB corruption caused by hardware failure. I hold this to be unlikely as the same host runs a mainnet Reth, which is showing no issues. Also a Sepolia Geth without history expiry appears to be stable.
Steps to reproduce the behaviour
Fresh sync Sepolia with --history.chain postmerge
Sample logs
execution-1 | Geth minimal node with pre-merge history expiry
execution-1 | INFO [06-09|14:53:11.814] Starting pprof server addr=http://0.0.0.0:6060/debug/pprof
execution-1 | INFO [06-09|14:53:11.822] Starting Geth on Sepolia testnet...
execution-1 | INFO [06-09|14:53:11.824] Maximum peer count ETH=50 total=50
execution-1 | INFO [06-09|14:53:11.825] Smartcard socket not found, disabling err="stat /run/pcscd/pcscd.comm: no such file or directory"
execution-1 | INFO [06-09|14:53:11.827] Set global gas cap cap=50,000,000
execution-1 | INFO [06-09|14:53:11.828] Initializing the KZG library backend=gokzg
execution-1 | INFO [06-09|14:53:11.838] Enabling metrics collection
execution-1 | INFO [06-09|14:53:11.838] Enabling stand-alone metrics HTTP endpoint address=127.0.0.1:6060
execution-1 | INFO [06-09|14:53:11.838] Starting metrics server addr=http://127.0.0.1:6060/debug/metrics
execution-1 | INFO [06-09|14:53:11.838] Allocated trie memory caches clean=154.00MiB dirty=256.00MiB
execution-1 | ERROR[06-09|14:53:11.839] Failure in running metrics server err="listen tcp 127.0.0.1:6060: bind: address already in use"
execution-1 | INFO [06-09|14:53:11.843] Using pebble as the backing database
execution-1 | INFO [06-09|14:53:11.843] Allocated cache and file handles database=/var/lib/geth/geth/chaindata cache=512.00MiB handles=524,288
execution-1 | INFO [06-09|14:53:12.741] Opened ancient database database=/var/lib/geth/geth/chaindata/ancient/chain readonly=false
execution-1 | INFO [06-09|14:53:12.741] Opened Era store datadir=/var/lib/geth/geth/chaindata/ancient/chain/era
execution-1 | INFO [06-09|14:53:12.743] State scheme set to already existing scheme=path
execution-1 | INFO [06-09|14:53:12.746] Initialising Ethereum protocol network=11,155,111 dbversion=9
execution-1 | INFO [06-09|14:53:14.408] Opened ancient database database=/var/lib/geth/geth/chaindata/ancient/state readonly=false
execution-1 | CRIT [06-09|14:53:14.408] Failed to truncate extra state histories err="out of range, tail: 0, head: 36822, target: 36823"
execution-1 exited with code 0
Just to confirm that Fresh sync Sepolia with --history.chain postmerge is sufficient to reproduce the issue right?
Afaik yes. I am not entirely certain as this node is used for RPC, and it’s possible an RPC call changes the timing internally somewhere such that this bug triggers.
If you cannot reproduce this, I can sync a node on 1.15.11 where it was easy to reproduce while getting debug or trace logs.
I have not been able to reproduce on a node that doesn’t get rpc queries. Trying on one that does. Knowing my luck, the issue goes away once debug is on … will try a few ways. This reproduced 3 to 4 times in a row, until I tried to reproduce it: Then it worked as designed. Ain’t it always that way.
Vexing. After reproducing several times in a row, now I can't reproduce at all. Maybe it's related to network state? I'll close this for now. I still think it's real, but I don't know how to make it happen.