PlatON-Go

Node panic: get unknown epoch

Open EchoLavender opened this issue 3 years ago • 5 comments

Hi there,

please note that this is an issue tracker reserved for bug reports and feature requests.

System information

Version: 0.15.0-unstable
Go Version: go1.13.4
OS: linux
uname -sr: Linux 4.15.0-118-generic
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.5 LTS
Release:	18.04
Codename:	bionic

Expected behaviour

The node runs normally.

Actual behaviour

[epoch error]
panic: get unknown epoch, current:39435, request:39436

Steps to reproduce the behaviour

I don't know how to reproduce it.

Backtrace

[backtrace]
INFO [03-06|10:52:22.457] Write a StateDB instance to the cache    sealHash=2d01ad…0c798b blockNum=9858744
INFO [03-06|10:52:22.471] Recover journal message from wal         epoch=39435 view=24 msgType=*protocols.ConfirmedViewChange
INFO [03-06|10:52:22.471] Recover journal message from wal         epoch=39435 view=24 msgType=*protocols.ConfirmedViewChange
WARN [03-06|10:52:22.471] Reset rollback block                     hash=29bfc2…584450 number=9858750 rollback=0
INFO [03-06|10:52:22.471] Success to change view, current view deadline epoch=39436 view=0  deadline=2021-03-06T10:52:42+0000
INFO [03-06|10:52:22.471] Recover journal message from wal         epoch=39436 view=0  msgType=*protocols.ConfirmedViewChange
WARN [03-06|10:52:22.471] Reset rollback block                     hash=f561d4…bf5675 number=9858760 rollback=0
INFO [03-06|10:52:22.471] Success to change view, current view deadline epoch=39436 view=1  deadline=2021-03-06T10:52:42+0000
INFO [03-06|10:52:22.471] Recover journal message from wal         epoch=39436 view=1  msgType=*protocols.ConfirmedViewChange
WARN [03-06|10:52:22.471] Reset rollback block                     hash=50233d…403844 number=9858770 rollback=0
panic: get unknown epoch, current:39435, request:39436

goroutine 159 [running]:
github.com/PlatONnetwork/PlatON-Go/consensus/cbft/validator.(*ValidatorPool).epochToBlockNumber(0xc00038e1b0, 0x9a0c, 0xc0001ec87c)
	/opt/jenkins/workspace/PlatON/build_ubuntu/consensus/cbft/validator/validator.go:670 +0x125
github.com/PlatONnetwork/PlatON-Go/consensus/cbft/validator.(*ValidatorPool).validatorList(0xc00038e1b0, 0x9a0c, 0x1fb66b7, 0x949, 0x156500)
	/opt/jenkins/workspace/PlatON/build_ubuntu/consensus/cbft/validator/validator.go:546 +0x39
github.com/PlatONnetwork/PlatON-Go/consensus/cbft/validator.(*ValidatorPool).ValidatorList(0xc00038e1b0, 0x9a0c, 0x0, 0x0, 0x0)
	/opt/jenkins/workspace/PlatON/build_ubuntu/consensus/cbft/validator/validator.go:542 +0x98
github.com/PlatONnetwork/PlatON-Go/consensus/cbft.(*Cbft).ConsensusNodes(0xc001cf9800, 0x1fb66b7, 0x1fb66b7, 0x4b, 0xc0001eca48, 0x487197)
	/opt/jenkins/workspace/PlatON/build_ubuntu/consensus/cbft/cbft.go:1093 +0x94
github.com/PlatONnetwork/PlatON-Go/consensus/cbft/network.(*EngineManager).ConsensusNodes(...)
	/opt/jenkins/workspace/PlatON/build_ubuntu/consensus/cbft/network/handler.go:338
github.com/PlatONnetwork/PlatON-Go/consensus/cbft/network.(*router).kMixingRandomNodes(0xc0002011c0, 0xb34b6388f2aac107, 0x86cf1a072160323c, 0x54d8d0ed28cb49fa, 0x5e915cb8fb4bdded, 0x0, 0xc000c06100, 0x13eada0, 0xc001308200, 0x1e7ce88, ...)
	/opt/jenkins/workspace/PlatON/build_ubuntu/consensus/cbft/network/router.go:204 +0x43
github.com/PlatONnetwork/PlatON-Go/consensus/cbft/network.(*router).filteredPeers(0xc0002011c0, 0x4, 0xb34b6388f2aac107, 0x86cf1a072160323c, 0x54d8d0ed28cb49fa, 0x5e915cb8fb4bdded, 0x0, 0x0, 0x0, 0x0, ...)
	/opt/jenkins/workspace/PlatON/build_ubuntu/consensus/cbft/network/router.go:159 +0x188
github.com/PlatONnetwork/PlatON-Go/consensus/cbft/network.(*router).Gossip(0xc0002011c0, 0xc000f2f350)
	/opt/jenkins/workspace/PlatON/build_ubuntu/consensus/cbft/network/router.go:94 +0xdc
github.com/PlatONnetwork/PlatON-Go/consensus/cbft/network.(*EngineManager).broadcast(...)
	/opt/jenkins/workspace/PlatON/build_ubuntu/consensus/cbft/network/handler.go:149
github.com/PlatONnetwork/PlatON-Go/consensus/cbft/network.(*EngineManager).sendLoop(0xc001c92a00)
	/opt/jenkins/workspace/PlatON/build_ubuntu/consensus/cbft/network/handler.go:136 +0xf3
created by github.com/PlatONnetwork/PlatON-Go/consensus/cbft/network.(*EngineManager).Start
	/opt/jenkins/workspace/PlatON/build_ubuntu/consensus/cbft/network/handler.go:116 +0x3f
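The trace shows the panic raised in ValidatorPool.epochToBlockNumber (validator.go:670), reached via ValidatorList and Cbft.ConsensusNodes while the network send loop was gossiping. The epoch argument 0x9a0c is 39436, the requested epoch, while the pool's current epoch was still 39435. The following is a minimal sketch, not PlatON-Go's code: validatorPool, epochInfo and their fields are hypothetical, and returning an error instead of panicking is just one possible way to handle an epoch the pool does not yet know.

```go
package main

import "fmt"

// epochInfo is a hypothetical record of one epoch and the block at which its
// validator set takes effect. Names are illustrative, not PlatON-Go's.
type epochInfo struct {
	epoch      uint64
	startBlock uint64
}

// validatorPool is a hypothetical, simplified stand-in for the pool in
// consensus/cbft/validator: it knows the current epoch and, optionally, the next.
type validatorPool struct {
	current epochInfo
	next    *epochInfo // nil until the next validator set is known
}

// epochToBlockNumber maps an epoch to the block where its validator set applies.
// Instead of panicking on an unknown epoch, it returns an error the caller can handle.
func (vp *validatorPool) epochToBlockNumber(epoch uint64) (uint64, error) {
	switch {
	case epoch == vp.current.epoch:
		return vp.current.startBlock, nil
	case vp.next != nil && epoch == vp.next.epoch:
		return vp.next.startBlock, nil
	default:
		return 0, fmt.Errorf("get unknown epoch, current:%d, request:%d", vp.current.epoch, epoch)
	}
}

func main() {
	// State resembling the log: the pool is still at epoch 39435, but consensus
	// already asks for epoch 39436. startBlock 1000 is an arbitrary placeholder.
	vp := &validatorPool{current: epochInfo{epoch: 39435, startBlock: 1000}}
	if _, err := vp.epochToBlockNumber(39436); err != nil {
		fmt.Println("error:", err)
	}
}
```

Running it prints: error: get unknown epoch, current:39435, request:39436. A caller such as a gossip loop could then skip the round or retry later rather than crash the node.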

EchoLavender · Mar 06 '21

Did you ever kill the process manually with kill -9?

benbaley · Mar 10 '21

The only log provided shows a node restart. Was the process forcibly terminated before the restart, for example by a power loss or by kill -9? We need the logs from the last exit to analyze this.
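The restart log is consistent with this: WAL recovery replayed ConfirmedViewChange messages that moved the view from epoch 39435 into epoch 39436 while the validator pool was still at epoch 39435, and the next lookup for epoch 39436 panicked. The sketch below models a replay guard under that assumption. walMsg and replayWAL are hypothetical names, each message is simplified to the (epoch, view) it carries, and stopping replay early is only one possible policy, not the project's fix.

```go
package main

import "fmt"

// walMsg is a hypothetical recovered journal entry, reduced to its (epoch, view).
type walMsg struct {
	epoch uint64
	view  uint64
}

// replayWAL applies recovered messages in order but stops at the first message
// whose epoch is newer than the highest epoch the validator state knows about.
// It returns how many messages were applied and the (epoch, view) reached.
func replayWAL(msgs []walMsg, knownEpoch uint64) (applied int, epoch, view uint64, err error) {
	for _, m := range msgs {
		if m.epoch > knownEpoch {
			return applied, epoch, view, fmt.Errorf(
				"wal message for epoch %d is ahead of validator state (epoch %d)", m.epoch, knownEpoch)
		}
		epoch, view = m.epoch, m.view
		applied++
	}
	return applied, epoch, view, nil
}

func main() {
	// A message sequence loosely resembling the restart log.
	msgs := []walMsg{{39435, 24}, {39435, 24}, {39436, 0}, {39436, 1}}
	n, e, v, err := replayWAL(msgs, 39435)
	fmt.Println(n, e, v, err)
}
```

Running it prints: 2 39435 24 wal message for epoch 39436 is ahead of validator state (epoch 39435). The node would then resume at epoch 39435, view 24, and let normal synchronization bring the validator set forward before entering epoch 39436.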

niuxiaojie81 · Mar 10 '21

@EchoLavender Has the problem been solved?

biganxin · Mar 20 '21

@niuxiaojie81 Yes, the error occurred after I stopped PlatON with kill -9.

@biganxin
Recently the disk filled up, so I tried to migrate to another server built from a snapshot, and the same error occurred there too. I generated a new blskey and resynced.

EchoLavender · Apr 15 '21

Syncing to the latest block took 6 to 7 hours. Perhaps PlatONnetwork could offer compressed LevelDB data for download to speed up synchronization, as Polygon (Matic) does.

[Disk usage]

before migration:

65G ./platon/chaindata

after migration:

7.7G ./platon/chaindata

EchoLavender · Apr 15 '21

To be optimized.

benbaley · Oct 14 '22

Closed due to prolonged lack of response.

benbaley · Aug 31 '23