go-ethereum icon indicating copy to clipboard operation
go-ethereum copied to clipboard

SIGSEGV with $ geth --syncmode "fast"

Open zoltan-fedor opened this issue 4 years ago • 10 comments

System information

Geth version: geth version Geth Version: 1.9.25-stable Git Commit: e7872729012a4871397307b12cc3f4772ffcbec6 Architecture: amd64 Protocol Versions: [65 64 63] Go Version: go1.15.6 Operating System: linux GOPATH= GOROOT=go

OS & Version: Linux Commit hash : (if develop)

Expected behaviour

Not segfaulting.

Actual behaviour

I am running $ geth --syncmode "fast" on Linux Mint inside a VirtualBox VM and I repeatedly get a segfault:

INFO [02-07|19:29:34.845] Looking for peers                        peercount=1 tried=4 static=0
INFO [02-07|19:29:34.850] Imported new block receipts              count=190  elapsed=44.408ms    number=376235   hash="ed4407…7e41a2" age=5y4mo3w   size=151.17KiB
WARN [02-07|19:29:34.917] Invalid header encountered               number=376479   hash="ffbb90…fe947a" parent="1fa79e…2a570b" err="invalid mix digest"
WARN [02-07|19:29:34.918] Rewinding blockchain                     target=374378
INFO [02-07|19:29:34.972] Imported new state entries               count=768  elapsed=2.313ms     processed=874087 pending=28420 trieretry=0 coderetry=0 duplicate=0 unexpected=0
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78ec7a]

goroutine 833 [running]:
github.com/ethereum/go-ethereum/core/types.(*Block).Header(...)
	github.com/ethereum/go-ethereum/core/types/block.go:343
github.com/ethereum/go-ethereum/core/rawdb.WriteAncientBlock(0x7f4648ffa6d8, 0xc000174a20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	github.com/ethereum/go-ethereum/core/rawdb/accessors_chain.go:660 +0x3a
github.com/ethereum/go-ethereum/core.(*BlockChain).InsertReceiptChain.func2(0xc00b141000, 0xc0, 0x100, 0xc0071c2800, 0xc0, 0x100, 0x0, 0x0, 0x0)
	github.com/ethereum/go-ethereum/core/blockchain.go:1236 +0x4ae
github.com/ethereum/go-ethereum/core.(*BlockChain).InsertReceiptChain(0xc0000a5900, 0xc00b25a000, 0xc0, 0xc0, 0xc000154c00, 0xc0, 0xc0, 0xacffff, 0x0, 0x0, ...)
	github.com/ethereum/go-ethereum/core/blockchain.go:1407 +0x1cca
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).commitFastSyncData(0xc0000c1dc0, 0xc007c93800, 0xc0, 0xc0, 0xc0007ae280, 0xc007c93800, 0xc0)
	github.com/ethereum/go-ethereum/eth/downloader/downloader.go:1880 +0x663
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).processFastSyncContent(0xc0000c1dc0, 0x0, 0x0)
	github.com/ethereum/go-ethereum/eth/downloader/downloader.go:1796 +0x785
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).syncWithPeer.func7(0x1, 0x0)
	github.com/ethereum/go-ethereum/eth/downloader/downloader.go:554 +0x2a
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).spawnSync.func1(0xc0000c1dc0, 0xc0007ad440, 0xc0002ec3f0)
	github.com/ethereum/go-ethereum/eth/downloader/downloader.go:568 +0x5b
created by github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).spawnSync
	github.com/ethereum/go-ethereum/eth/downloader/downloader.go:568 +0xaf

Memory test shows everything is okey:

$ sudo memtester 6800M 2
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 6800MB (7130316800 bytes)
got  6800MB (7130316800 bytes), trying mlock ...locked.
Loop 1/2:
  Stuck Address       : ok         
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok         
  Block Sequential    : ok         
  Checkerboard        : ok         
  Bit Spread          : ok         
  Bit Flip            : ok         
  Walking Ones        : ok         
  Walking Zeroes      : ok         
  8-bit Writes        : ok

Steps to reproduce the behaviour

Run: $ geth --syncmode "fast"

Backtrace

goroutine 833 [running]:
github.com/ethereum/go-ethereum/core/types.(*Block).Header(...)
	github.com/ethereum/go-ethereum/core/types/block.go:343
github.com/ethereum/go-ethereum/core/rawdb.WriteAncientBlock(0x7f4648ffa6d8, 0xc000174a20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	github.com/ethereum/go-ethereum/core/rawdb/accessors_chain.go:660 +0x3a
github.com/ethereum/go-ethereum/core.(*BlockChain).InsertReceiptChain.func2(0xc00b141000, 0xc0, 0x100, 0xc0071c2800, 0xc0, 0x100, 0x0, 0x0, 0x0)
	github.com/ethereum/go-ethereum/core/blockchain.go:1236 +0x4ae
github.com/ethereum/go-ethereum/core.(*BlockChain).InsertReceiptChain(0xc0000a5900, 0xc00b25a000, 0xc0, 0xc0, 0xc000154c00, 0xc0, 0xc0, 0xacffff, 0x0, 0x0, ...)
	github.com/ethereum/go-ethereum/core/blockchain.go:1407 +0x1cca
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).commitFastSyncData(0xc0000c1dc0, 0xc007c93800, 0xc0, 0xc0, 0xc0007ae280, 0xc007c93800, 0xc0)
	github.com/ethereum/go-ethereum/eth/downloader/downloader.go:1880 +0x663
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).processFastSyncContent(0xc0000c1dc0, 0x0, 0x0)
	github.com/ethereum/go-ethereum/eth/downloader/downloader.go:1796 +0x785
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).syncWithPeer.func7(0x1, 0x0)
	github.com/ethereum/go-ethereum/eth/downloader/downloader.go:554 +0x2a
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).spawnSync.func1(0xc0000c1dc0, 0xc0007ad440, 0xc0002ec3f0)
	github.com/ethereum/go-ethereum/eth/downloader/downloader.go:568 +0x5b
created by github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).spawn

zoltan-fedor avatar Feb 08 '21 01:02 zoltan-fedor

Interesting. The code path which panics is this:

			// Migrate all ancient blocks. This can happen if someone upgrades from Geth
			// 1.8.x to 1.9.x mid-fast-sync. Perhaps we can get rid of this path in the
			// long term.
			for {
				// We can ignore the error here since light client won't hit this code path.
				frozen, _ := bc.db.Ancients()
				if frozen >= block.NumberU64() {
					break
				}
				h := rawdb.ReadCanonicalHash(bc.db, frozen)
				b := rawdb.ReadBlock(bc.db, h, frozen)
				size += rawdb.WriteAncientBlock(bc.db, b, rawdb.ReadReceipts(bc.db, h, frozen, bc.chainConfig), rawdb.ReadTd(bc.db, h, frozen))

And just prior to the crash, we have this:

WARN [02-07|19:29:34.917] Invalid header encountered               number=376479   hash="ffbb90…fe947a" parent="1fa79e…2a570b" err="invalid mix digest"
WARN [02-07|19:29:34.918] Rewinding blockchain                     target=374378

So it looks to me like a rollback during the receipt-sync races with the background freezer, on a path which is not really intended to be used in normal mode.

holiman avatar Feb 08 '21 09:02 holiman

Yes, and I keep getting this at different points in the sync. See the same error below at a later point in the sync compared to the one above.

INFO [02-08|01:26:34.197] Imported new block receipts              count=56   elapsed=140.396ms   number=4647129  hash="13999e…8e9155" age=3y2mo3w   size=1.66MiB
WARN [02-08|01:26:35.198] Invalid header encountered               number=4650064  hash="1f5928…4a6a0d" parent="b879e4…273f01" err="invalid mix digest"
WARN [02-08|01:26:35.199] Rewinding blockchain                     target=4645967
INFO [02-08|01:26:35.848] Imported new state entries               count=768  elapsed=4.232ms     processed=17600224 pending=18943  trieretry=0   coderetry=0 duplicate=741 unexpected=5381
INFO [02-08|01:26:36.372] Rewind ancient data                      number=4646711
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78ec7a]

goroutine 195860 [running]:
github.com/ethereum/go-ethereum/core/types.(*Block).Header(...)
	github.com/ethereum/go-ethereum/core/types/block.go:343
github.com/ethereum/go-ethereum/core/rawdb.WriteAncientBlock(0x7f8d61f22d80, 0xc0001f6300, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	github.com/ethereum/go-ethereum/core/rawdb/accessors_chain.go:660 +0x3a
github.com/ethereum/go-ethereum/core.(*BlockChain).InsertReceiptChain.func2(0xc085177200, 0x39, 0x40, 0xc002bdcc00, 0x39, 0x40, 0x0, 0x0, 0x0)
	github.com/ethereum/go-ethereum/core/blockchain.go:1236 +0x4ae
github.com/ethereum/go-ethereum/core.(*BlockChain).InsertReceiptChain(0xc0000a5900, 0xc0a2ef2000, 0x39, 0x39, 0xc099feb700, 0x39, 0x39, 0xacffff, 0x0, 0x0, ...)
	github.com/ethereum/go-ethereum/core/blockchain.go:1407 +0x1cca
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).commitFastSyncData(0xc0009a61c0, 0xc0a76ede00, 0x39, 0x39, 0xc0886eb0e0, 0xc0a76ede00, 0x39)
	github.com/ethereum/go-ethereum/eth/downloader/downloader.go:1880 +0x663
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).processFastSyncContent(0xc0009a61c0, 0x0, 0x0)
	github.com/ethereum/go-ethereum/eth/downloader/downloader.go:1796 +0x785
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).syncWithPeer.func7(0xc097fec870, 0x0)
	github.com/ethereum/go-ethereum/eth/downloader/downloader.go:554 +0x2a
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).spawnSync.func1(0xc0009a61c0, 0xc088b1d620, 0xc01df5a540)
	github.com/ethereum/go-ethereum/eth/downloader/downloader.go:568 +0x5b
created by github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).spawnSync
	github.com/ethereum/go-ethereum/eth/downloader/downloader.go:568 +0xaf

zoltan-fedor avatar Feb 08 '21 22:02 zoltan-fedor

There seems to be two problems here. First of all,

number=376479   hash="ffbb90…fe947a" parent="1fa79e…2a570b" err="invalid mix digest"

This is a canon block:https://etherscan.io/block/376479 , so something is wrong on your machine. I would assume it's related to disk errors, and that the dag caches are somehow corrupt. I would recommend you to run some disk verification utility.

The other problem is that a rollback during fast-sync is causing panic. However, it seems to be happening at different codepaths, so might actually be caused by the same root issue, if there are disk errors.

holiman avatar Feb 09 '21 08:02 holiman

I just run fsck and no issue was found. But it is true, that this a a VirtualBox VM where the disk is hosted on a LUKS-encrypted volume. Maybe that's why the occasional errors?

zoltan-fedor avatar Feb 09 '21 14:02 zoltan-fedor

Unfortunately it is still happening.

INFO [02-09|09:40:24.896] Looking for peers                        peercount=1 tried=5 static=0
INFO [02-09|09:40:26.325] Imported new state entries               count=768  elapsed=3.753ms    processed=17711203 pending=7656  trieretry=0   coderetry=0 duplicate=0 unexpected=0
WARN [02-09|09:40:28.099] Invalid header encountered               number=4650125  hash="3e83c2…c6e984" parent="bb3f80…c1f88e" err="invalid mix digest"
WARN [02-09|09:40:28.100] Rewinding blockchain                     target=4647383
WARN [02-09|09:40:28.199] Truncating freezer table                 database=/home/user/.ethereum/geth/chaindata/ancient table=hashes items=4648706  limit=4648697
WARN [02-09|09:40:28.199] Truncating freezer table                 database=/home/user/.ethereum/geth/chaindata/ancient table=bodies items=4648706  limit=4648697
WARN [02-09|09:40:28.199] Truncating freezer table                 database=/home/user/.ethereum/geth/chaindata/ancient table=receipts items=4648706  limit=4648697
WARN [02-09|09:40:28.200] Truncating freezer table                 database=/home/user/.ethereum/geth/chaindata/ancient table=diffs    items=4648706  limit=4648697
WARN [02-09|09:40:28.200] Truncating freezer table                 database=/home/user/.ethereum/geth/chaindata/ancient table=headers  items=4648706  limit=4648697
INFO [02-09|09:40:28.201] Rewind ancient data                      number=4648696
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78ec7a]

goroutine 4816 [running]:
github.com/ethereum/go-ethereum/core/types.(*Block).Header(...)
	github.com/ethereum/go-ethereum/core/types/block.go:343
github.com/ethereum/go-ethereum/core/rawdb.WriteAncientBlock(0x7f9c36ef6e70, 0xc000406980, 0x0, 0x0, 0x0, 0x0, 0xc04884ebe0, 0xc04884ebe0)
	github.com/ethereum/go-ethereum/core/rawdb/accessors_chain.go:660 +0x3a
github.com/ethereum/go-ethereum/core.(*BlockChain).InsertReceiptChain.func2(0xc01316ec00, 0x2e, 0x40, 0xc047918600, 0x2e, 0x40, 0x0, 0x0, 0x0)
	github.com/ethereum/go-ethereum/core/blockchain.go:1236 +0x4ae
github.com/ethereum/go-ethereum/core.(*BlockChain).InsertReceiptChain(0xc0000a6580, 0xc002420900, 0x2e, 0x2e, 0xc0133c8900, 0x2e, 0x2e, 0xacffff, 0x0, 0x0, ...)
	github.com/ethereum/go-ethereum/core/blockchain.go:1407 +0x1cca
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).commitFastSyncData(0xc0002f01c0, 0xc002420780, 0x2e, 0x2e, 0xc0001b3900, 0xc002420780, 0x2e)
	github.com/ethereum/go-ethereum/eth/downloader/downloader.go:1880 +0x663
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).processFastSyncContent(0xc0002f01c0, 0x0, 0x0)
	github.com/ethereum/go-ethereum/eth/downloader/downloader.go:1796 +0x785
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).syncWithPeer.func7(0xc05c57ad80, 0xc0020d6420)
	github.com/ethereum/go-ethereum/eth/downloader/downloader.go:554 +0x2a
github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).spawnSync.func1(0xc0002f01c0, 0xc0020a6480, 0xc066f4df50)
	github.com/ethereum/go-ethereum/eth/downloader/downloader.go:568 +0x5b
created by github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).spawnSync
	github.com/ethereum/go-ethereum/eth/downloader/downloader.go:568 +0xaf

zoltan-fedor avatar Feb 09 '21 14:02 zoltan-fedor

This loop needs to be deleted or protected by the chain insert lock at least: https://github.com/ethereum/go-ethereum/blob/1bbc8a1944312b91ca72a298c2438133906e9503/core/blockchain.go#L1225

fjl avatar Feb 11 '21 10:02 fjl

This seems to be caused by a race between the rewinding of the chain (because of invalid mix digest error) and the writing of ancient data during sync. These two things try to manipulate the database at the same time, which may invalidate some assumptions that the code is making.

We should figure out why this invalid mix digest error is happening, and also remove the loop or protect it.

fjl avatar Feb 11 '21 10:02 fjl

I kept repeatedly running into this after resetting the data from geth and restarting the syncing and loaded about the first 25 Gb of the chain. I have ended up completely deleting the .ethereum folder and starting the syncing again and then the error went away.

In summary it seems there is a problem - the race condition mentioned by @fjl, which probably would need to be addressed in general, even if my particular issue is no longer reproducible now.

zoltan-fedor avatar Feb 11 '21 13:02 zoltan-fedor

It happens weekly on my terminal. It also happens if you delete the .ethereum folder.

I am running the latest version (version: 1.9.25-stable) on Ubuntu 16.04.7 LTS

h6dworker avatar Feb 23 '21 01:02 h6dworker

i have something like this :S

geth_1                | WARN [09-16|15:53:04.065] Invalid header encountered               number=15,547,189 hash=4a80de..4e285a parent=1409da..843cd6 err="unknown ancestor"
geth_1                | WARN [09-16|15:53:04.138] Rewinding blockchain                     target=15,545,139
geth_1                | INFO [09-16|15:53:07.345] State heal in progress                   accounts=211,[email protected]  slots=430,[email protected] [email protected] nodes=15,935,[email protected] pending=27538
geth_1                | INFO [09-16|15:53:08.665] Loaded most recent local header          number=15,545,139 hash=92d170..f73065 td=58,750,003,716,598,352,816,469 age=6h55m45s
geth_1                | INFO [09-16|15:53:08.667] Loaded most recent local full block      number=0          hash=d4e567..cb8fa3 td=17,179,869,184                 age=53y5mo3w
geth_1                | INFO [09-16|15:53:08.667] Loaded most recent local fast block      number=15,545,139 hash=92d170..f73065 td=58,750,003,716,598,352,816,469 age=6h55m45s
geth_1                | INFO [09-16|15:53:08.668] Loaded last fast-sync pivot marker       number=15,547,120
geth_1                | WARN [09-16|15:53:08.668] Rolled back chain segment                header=15547188->15545139 snap=15547119->15545139 block=0->0 reason="unknown ancestor"

victorelec14 avatar Sep 16 '22 15:09 victorelec14

This is stale, closing

holiman avatar Feb 21 '23 09:02 holiman