
panic: tocommit(25) is out of range [lastIndex(16)]. Was the raft log corrupted, truncated, or lost?

Open · techyangj opened this issue 2 years ago · 5 comments

I created different organizations on multiple servers; each organization has multiple peer nodes and an orderer node. Everything on one of the servers was wiped, and I later redeployed that server's organization from its configuration files. The orderer container now fails with the problem below, while the containers on the other servers keep running normally. Orderer logs:

[orderer.consensus.etcdraft] commitTo -> PANI bda tocommit(25) is out of range [lastIndex(16)]. Was the raft log corrupted, truncated, or lost? channel=fabric-channel node=16
panic: tocommit(25) is out of range [lastIndex(16)]. Was the raft log corrupted, truncated, or lost?

goroutine 60 [running]:
github.com/hyperledger/fabric/vendor/go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc00027e000, 0x0, 0x0, 0x0)
    /go/src/github.com/hyperledger/fabric/vendor/go.uber.org/zap/zapcore/entry.go:229 +0x546
github.com/hyperledger/fabric/vendor/go.uber.org/zap.(*SugaredLogger).log(0xc0001302e8, 0x4, 0x1085297, 0x5d, 0xc0005a5aa0, 0x2, 0x2, 0x0, 0x0, 0x0)
    /go/src/github.com/hyperledger/fabric/vendor/go.uber.org/zap/sugar.go:234 +0x100
github.com/hyperledger/fabric/vendor/go.uber.org/zap.(*SugaredLogger).Panicf(...)
    /go/src/github.com/hyperledger/fabric/vendor/go.uber.org/zap/sugar.go:159
github.com/hyperledger/fabric/common/flogging.(*FabricLogger).Panicf(0xc0001302f0, 0x1085297, 0x5d, 0xc0005a5aa0, 0x2, 0x2)
    /go/src/github.com/hyperledger/fabric/common/flogging/zap.go:74 +0x7c
github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft.(*raftLog).commitTo(0xc000012bd0, 0x19)
    /go/src/github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft/log.go:203 +0x131
github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft.(*raft).handleHeartbeat(0xc00034f7c0, 0x8, 0x10, 0x7, 0x8, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /go/src/github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft/raft.go:1324 +0x54
github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft.stepCandidate(0xc00034f7c0, 0x8, 0x10, 0x7, 0x8, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /go/src/github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft/raft.go:1224 +0x7f0
github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft.(*raft).Step(0xc00034f7c0, 0x8, 0x10, 0x7, 0x8, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /go/src/github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft/raft.go:971 +0x1398
github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft.(*node).run(0xc0007c9380, 0xc00034f7c0)
    /go/src/github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft/node.go:357 +0x10d0
created by github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft.RestartNode
    /go/src/github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft/node.go:246 +0x31b
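For context, the panic comes from a bounds check inside the etcd raft library that Fabric vendors (raftLog.commitTo, which the trace shows at raft/log.go:203): the node is told to mark index 25 as committed while its local log only reaches index 16, so the library refuses to continue rather than silently diverge. The following is a simplified, runnable Go sketch of that check, not the vendored code itself; the field names and numbers are illustrative.

```go
// Simplified sketch of the bounds check behind the "tocommit ... is out of
// range [lastIndex(...)]" panic. The real etcd/raft code calls logger.Panicf;
// returning an error here keeps the sketch runnable.
package main

import "fmt"

// raftLogSketch stands in for etcd/raft's raftLog; only the two fields
// relevant to this panic are modeled.
type raftLogSketch struct {
	committed uint64 // highest index known to be committed locally
	lastIndex uint64 // highest index actually present in the local log/WAL
}

func (l *raftLogSketch) commitTo(tocommit uint64) error {
	if l.committed < tocommit {
		if l.lastIndex < tocommit {
			return fmt.Errorf("tocommit(%d) is out of range [lastIndex(%d)]. "+
				"Was the raft log corrupted, truncated, or lost?", tocommit, l.lastIndex)
		}
		l.committed = tocommit
	}
	return nil
}

func main() {
	// A wiped orderer restarts with a short local log (lastIndex 16 in the
	// report), while the rest of the cluster has already committed index 25.
	wiped := &raftLogSketch{committed: 16, lastIndex: 16}
	fmt.Println(wiped.commitTo(25)) // reproduces the message from the orderer log
}
```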

techyangj avatar Sep 15 '22 13:09 techyangj

How did you redeploy? Did you back up the files of the orderer's file system or something?

yacovm avatar Sep 15 '22 14:09 yacovm

Thank you for your reply. I restarted all the containers from the docker-compose file and added the previously prepared channel file to the peer node. That works, but there is a problem when installing the chaincode. Moreover, the orderer container is in a broken state to begin with, and the previous data can no longer be found. I want this organization to resynchronize its data from the other organizations; what is a good way to do that?

techyangj avatar Sep 15 '22 14:09 techyangj

I'm sorry but I don't understand what steps you followed.

Can you be more clear and accurately describe everything you did?

yacovm avatar Sep 15 '22 15:09 yacovm

Here is everything I did:

  1. I deployed the Fabric network on multiple servers, each with an orderer node and multiple peer nodes.
  2. I set up the channel files, installed the chaincode, approved the chaincode definition, and initialized the chaincode. Peer nodes on all machines could synchronize data and operate through the chaincode. These two steps were a normal deployment, and I saved the configuration files from it, including the docker-compose files, channel files, and so on.
  3. The operating system on one of the servers was reinstalled and all of its data was lost. I used the saved files to restore that machine's organization and restarted the orderer and peer containers through the docker-compose file, but the orderer container was in the exited state. Its logs show the panic above, and nothing else happens.

I just want to restore this organization and bring it back into the network.

techyangj avatar Sep 15 '22 15:09 techyangj

I suspect that when the cleaned-up/reset orderer node rejoined the ordering service network, the current leader (orderer) tried to send blocks/messages based on the last index it had recorded for that ordering node. But the node had restarted with an empty/initial state, which no longer matched the match index the leader was tracking for it in its raft log.
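For illustration, here is a minimal, self-contained Go sketch of that interaction, assuming the behaviour of the etcd raft library that Fabric vendors: the leader's heartbeat carries min(Progress.Match, leader committed index), and the follower's heartbeat handler passes that value straight to raftLog.commitTo. The types and the numbers 25/16 are illustrative only, not the real library API.

```go
// Sketch of the leader/follower mismatch after one orderer loses its raft
// state: the leader still remembers a stale Match for the node, so the very
// first heartbeat asks the node to commit past the end of its wiped log.
package main

import "fmt"

type leaderView struct {
	committed uint64 // leader's committed index
	match     uint64 // last index the leader believes this follower has (stale after the wipe)
}

type followerView struct {
	lastIndex uint64 // what actually survives on the follower's disk after the reset
}

// heartbeatCommit mirrors the idea behind sendHeartbeat:
// commit = min(progress.Match, leader.committed).
func (l leaderView) heartbeatCommit() uint64 {
	if l.match < l.committed {
		return l.match
	}
	return l.committed
}

// handleHeartbeat mirrors the follower side: commitTo(m.Commit) panics when
// the requested commit index is beyond the local log.
func (f followerView) handleHeartbeat(commit uint64) {
	if commit > f.lastIndex {
		fmt.Printf("PANIC: tocommit(%d) is out of range [lastIndex(%d)]\n", commit, f.lastIndex)
		return
	}
	fmt.Printf("committed up to %d\n", commit)
}

func main() {
	// Before the wipe the leader had replicated up to index 25 to this node;
	// the wiped node now only has 16 entries.
	leader := leaderView{committed: 25, match: 25}
	wipedFollower := followerView{lastIndex: 16}

	wipedFollower.handleHeartbeat(leader.heartbeatCommit()) // -> out-of-range panic
}
```

Because the leader keeps its pre-wipe view of the node, this mismatch shows up immediately after the restart, which matches the handleHeartbeat -> commitTo path in the stack trace above.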

I was able to reproduce the problem with the fabric-samples test-network-nano-bash using the following steps:

  1. Start orderers 1 and 2
  2. Submit a couple of transactions
  3. Start orderer 3 (which will join the quorum and replicate the blocks/raft log)
  4. Stop orderer 3 and delete all of its ledger and raft logs
  5. Start orderer 3 again

Restart failed with the same panic:

2022-09-22 00:31:39.974 PDT 0325 INFO [orderer.consensus.etcdraft] becomeFollower -> 3 became follower at term 2 channel=test-system-channel-name node=3
2022-09-22 00:31:39.974 PDT 0326 PANI [orderer.consensus.etcdraft] commitTo -> tocommit(6) is out of range [lastIndex(3)]. Was the raft log corrupted, truncated, or lost? channel=test-system-channel-name node=3

[unrecovered-panic] runtime.fatalpanic() /usr/local/go/src/runtime/panic.go:1065 (hits goroutine(43):1 total:1) (PC: 0x43d8a0)

To resolve the issue, either:

  1. From the orderer logs, find the current leader orderer node and restart it. This triggers the election of a new leader, which initializes its match index for every follower from scratch, so the reset orderer node should be able to rejoin the ordering network (see the sketch after this list); or
  2. Restart all the orderer nodes.
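To see why option 1 helps, here is a small Go sketch, assuming etcd raft's leadership-reset behaviour: a newly elected leader re-initialises every follower's Progress to Match=0 and Next=lastIndex+1 and probes from there, so the commit index it advertises to the wiped node starts low and the node is caught up by ordinary log replication (or a snapshot) instead of panicking. The struct and the numbers are illustrative, not the library's actual types.

```go
// Why a leadership change clears the condition: a fresh leader forgets the
// stale Match it held for the wiped node and starts probing from scratch.
package main

import "fmt"

type progress struct {
	Match uint64 // highest index the leader knows the follower has stored
	Next  uint64 // next index the leader will try to send
}

func main() {
	const leaderLastIndex, leaderCommitted = 25, 25

	// Stale view the old leader holds about the wiped orderer.
	stale := progress{Match: 25, Next: 26}

	// What a freshly elected leader assumes about every peer
	// (roughly Match=0, Next=lastIndex+1) before probing.
	fresh := progress{Match: 0, Next: leaderLastIndex + 1}

	// Heartbeats advertise min(Match, committed) as the commit index.
	heartbeatCommit := func(p progress) uint64 {
		if p.Match < leaderCommitted {
			return p.Match
		}
		return leaderCommitted
	}

	fmt.Println("old leader asks the wiped node to commit:", heartbeatCommit(stale)) // 25 -> out-of-range panic
	fmt.Println("new leader asks the wiped node to commit:", heartbeatCommit(fresh)) // 0  -> safe, then re-replicate
}
```

Restarting all orderer nodes (option 2) has the same effect, since it also forces a fresh election and fresh progress tracking for every follower.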

Param-S avatar Sep 22 '22 07:09 Param-S

Thank you very much. Following your reply, this problem is solved.

techyangj avatar Sep 23 '22 06:09 techyangj

Thank you. This comment solved my problem.

davidfdr avatar Feb 15 '24 23:02 davidfdr