go-ethereum icon indicating copy to clipboard operation
go-ethereum copied to clipboard

`Failed to truncate extra state histories` with Sepolia history expiry

Open yorickdowne opened this issue 6 months ago • 2 comments

System information

Tested with Geth 1.15.11 and a latest build. Easier to reproduce on 1.15.11

Geth
Version: 1.15.12-unstable
Git Commit: 7ec493f66b5c18a75505718247dad404beadaaac
Git Commit Date: 20250609
Architecture: amd64
Go Version: go1.24.4
Operating System: linux

Issue

Geth logs Failed to truncate extra state histories on a fresh-synced, history-expired Sepolia.

This was easy to reproduce on 1.15.11, showed up pretty much immediately. latest looked like it was stable, at first, but eventually failed in the same way.

I have considered DB corruption caused by hardware failure. I hold this to be unlikely as the same host runs a mainnet Reth, which is showing no issues. Also a Sepolia Geth without history expiry appears to be stable.

Steps to reproduce the behaviour

Fresh sync Sepolia with --history.chain postmerge

Sample logs

execution-1  | Geth minimal node with pre-merge history expiry
execution-1  | INFO [06-09|14:53:11.814] Starting pprof server                    addr=http://0.0.0.0:6060/debug/pprof
execution-1  | INFO [06-09|14:53:11.822] Starting Geth on Sepolia testnet...
execution-1  | INFO [06-09|14:53:11.824] Maximum peer count                       ETH=50 total=50
execution-1  | INFO [06-09|14:53:11.825] Smartcard socket not found, disabling    err="stat /run/pcscd/pcscd.comm: no such file or directory"
execution-1  | INFO [06-09|14:53:11.827] Set global gas cap                       cap=50,000,000
execution-1  | INFO [06-09|14:53:11.828] Initializing the KZG library             backend=gokzg
execution-1  | INFO [06-09|14:53:11.838] Enabling metrics collection
execution-1  | INFO [06-09|14:53:11.838] Enabling stand-alone metrics HTTP endpoint address=127.0.0.1:6060
execution-1  | INFO [06-09|14:53:11.838] Starting metrics server                  addr=http://127.0.0.1:6060/debug/metrics
execution-1  | INFO [06-09|14:53:11.838] Allocated trie memory caches             clean=154.00MiB dirty=256.00MiB
execution-1  | ERROR[06-09|14:53:11.839] Failure in running metrics server        err="listen tcp 127.0.0.1:6060: bind: address already in use"
execution-1  | INFO [06-09|14:53:11.843] Using pebble as the backing database
execution-1  | INFO [06-09|14:53:11.843] Allocated cache and file handles         database=/var/lib/geth/geth/chaindata cache=512.00MiB handles=524,288
execution-1  | INFO [06-09|14:53:12.741] Opened ancient database                  database=/var/lib/geth/geth/chaindata/ancient/chain readonly=false
execution-1  | INFO [06-09|14:53:12.741] Opened Era store                         datadir=/var/lib/geth/geth/chaindata/ancient/chain/era
execution-1  | INFO [06-09|14:53:12.743] State scheme set to already existing     scheme=path
execution-1  | INFO [06-09|14:53:12.746] Initialising Ethereum protocol           network=11,155,111 dbversion=9
execution-1  | INFO [06-09|14:53:14.408] Opened ancient database                  database=/var/lib/geth/geth/chaindata/ancient/state readonly=false
execution-1  | CRIT [06-09|14:53:14.408] Failed to truncate extra state histories err="out of range, tail: 0, head: 36822, target: 36823"
execution-1 exited with code 0

yorickdowne avatar Jun 10 '25 07:06 yorickdowne

Just to confirm that Fresh sync Sepolia with --history.chain postmerge is sufficient to reproduce the issue right?

rjl493456442 avatar Jun 10 '25 09:06 rjl493456442

Afaik yes. I am not entirely certain as this node is used for RPC, and it’s possible an RPC call changes the timing internally somewhere such that this bug triggers.

If you cannot reproduce this, I can sync a node on 1.15.11 where it was easy to reproduce while getting debug or trace logs.

yorickdowne avatar Jun 10 '25 13:06 yorickdowne

I have not been able to reproduce on a node that doesn’t get rpc queries. Trying on one that does. Knowing my luck, the issue goes away once debug is on … will try a few ways. This reproduced 3 to 4 times in a row, until I tried to reproduce it: Then it worked as designed. Ain’t it always that way.

yorickdowne avatar Jun 19 '25 08:06 yorickdowne

Vexing. After reproducing several times in a row, now I can't reproduce at all. Maybe it's related to network state? I'll close this for now. I still think it's real, but I don't know how to make it happen.

yorickdowne avatar Jun 30 '25 17:06 yorickdowne