
Single orderer network: if the orderer is restarted, it becomes a follower and does not start an election to become leader

Open Vbhaskar125 opened this issue 1 year ago • 9 comments

Description

SETUP

  1. Hyperledger Fabric version: 2.2
  2. Consensus: RAFT
  3. Blockchain network: 1
  4. Organizations:
  • 2 [each org is on a different Azure Kubernetes cluster]
  • Each org is on a separate channel (so 2 channels, named cefiuschannel and ibnpublic)
  • Each org has 1 peer
  5. Orderers: 1

ISSUE ON HAND

  1. We were trying to update the blockchain certificate before its expiry, for which we did the following steps:
     a. We increased the validity of the new Orderer certificates from 1 year to 3 years.
     b. Restarted the Certificate Authority (CA).
     c. Restarted the Orderer pod.

Post this we expected the following to happen:

  • Orderer should have successfully restarted and started an election which would make it the Leader (as there is only 1 Orderer).
  • Post this the certificates update commands should've been successfully executed.

However, we are experiencing the following:

  • Orderer does restart successfully BUT it immediately becomes a Follower without even attempting an election
  • The logs show the following:

2023-04-10 05:29:46.377 UTC 076a INFO [orderer.common.cluster] Configure -> Entering, channel: cefiuschannel, nodes: []
2023-04-10 05:29:46.377 UTC 076b INFO [orderer.common.cluster] Configure -> Exiting
2023-04-10 05:29:46.377 UTC 076c DEBU [orderer.consensus.etcdraft] start -> Starting raft node: #peers: 1 channel=cefiuschannel node=1
2023-04-10 05:29:46.377 UTC 076d INFO [orderer.consensus.etcdraft] start -> Starting raft node to join an existing channel channel=cefiuschannel node=1
2023-04-10 05:29:46.377 UTC 076e INFO [orderer.consensus.etcdraft] becomeFollower -> 1 became follower at term 0 channel=cefiuschannel node=1
2023-04-10 05:29:46.377 UTC 076f INFO [orderer.consensus.etcdraft] newRaft -> newRaft 1 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0] channel=cefiuschannel node=1
2023-04-10 05:29:46.377 UTC 0770 INFO [orderer.consensus.etcdraft] becomeFollower -> 1 became follower at term 1 channel=cefiuschannel node=1
2023-04-10 05:29:46.377 UTC 0771 INFO [orderer.consensus.etcdraft] Start -> Starting Raft node channel=ibnpublic node=1
2023-04-10 05:29:46.377 UTC 0772 INFO [orderer.common.cluster] Configure -> Entering, channel: ibnpublic, nodes: []
2023-04-10 05:29:46.377 UTC 0773 INFO [orderer.common.cluster] Configure -> Exiting
2023-04-10 05:29:46.377 UTC 0774 DEBU [orderer.consensus.etcdraft] start -> Starting raft node: #peers: 1 channel=ibnpublic node=1
2023-04-10 05:29:46.378 UTC 0775 INFO [orderer.consensus.etcdraft] start -> Starting raft node to join an existing channel channel=ibnpublic node=1
2023-04-10 05:29:46.378 UTC 0776 INFO [orderer.consensus.etcdraft] becomeFollower -> 1 became follower at term 0 channel=ibnpublic node=1
2023-04-10 05:29:46.378 UTC 0777 INFO [orderer.consensus.etcdraft] newRaft -> newRaft 1 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0] channel=ibnpublic node=1
2023-04-10 05:29:46.378 UTC 0778 INFO [orderer.consensus.etcdraft] becomeFollower -> 1 became follower at term 1 channel=ibnpublic node=1
2023-04-10 05:29:46.428 UTC 0779 INFO [orderer.common.server] Main -> Starting orderer:
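
For a longer log such as the attached file, the role transitions can be summarised per channel with a few lines of Python. This is only a sketch: it assumes that leader and candidate transitions are logged in the same "became <role> at term <n> channel=<name>" shape as the follower lines above, and the names `raft_roles_by_channel` and `channels_without_leader` are made up for this illustration.

```python
import re
from collections import defaultdict

# Matches fragments such as:
#   "1 became follower at term 1 channel=ibnpublic node=1"
# Assumption: candidate/leader lines use the same wording as the follower
# lines shown in the issue.
TRANSITION = re.compile(
    r"became (?P<role>follower|pre-candidate|candidate|leader)"
    r" at term (?P<term>\d+)\s+channel=(?P<channel>\S+)"
)

# Four lines copied from the log in the issue.
SAMPLE_LOG = (
    "2023-04-10 05:29:46.377 UTC 076e INFO [orderer.consensus.etcdraft] "
    "becomeFollower -> 1 became follower at term 0 channel=cefiuschannel node=1\n"
    "2023-04-10 05:29:46.377 UTC 0770 INFO [orderer.consensus.etcdraft] "
    "becomeFollower -> 1 became follower at term 1 channel=cefiuschannel node=1\n"
    "2023-04-10 05:29:46.378 UTC 0776 INFO [orderer.consensus.etcdraft] "
    "becomeFollower -> 1 became follower at term 0 channel=ibnpublic node=1\n"
    "2023-04-10 05:29:46.378 UTC 0778 INFO [orderer.consensus.etcdraft] "
    "becomeFollower -> 1 became follower at term 1 channel=ibnpublic node=1\n"
)


def raft_roles_by_channel(log_text):
    """Return {channel: [(role, term), ...]} in the order the lines appear."""
    roles = defaultdict(list)
    for match in TRANSITION.finditer(log_text):
        roles[match["channel"]].append((match["role"], int(match["term"])))
    return dict(roles)


def channels_without_leader(log_text):
    """Return the sorted channels for which no 'became leader' line was seen."""
    roles = raft_roles_by_channel(log_text)
    return sorted(
        channel
        for channel, seen in roles.items()
        if all(role != "leader" for role, _ in seen)
    )
```

Calling `channels_without_leader(SAMPLE_LOG)` lists both cefiuschannel and ibnpublic, since the excerpt contains only follower transitions for them.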

Steps to reproduce

  1. Bring up a standard HLF network (version 2.2 with a system channel) with a single orderer on a Kubernetes cluster.
  2. Once the network is up, restart the orderer pod, either by deleting it or by restarting its deployment.
  3. The orderer becomes a follower in all the channels and fails to start an election.

Attaching the orderer log after the restart.

orderer0-deployment-7b67b8d496-ncccs.log

Vbhaskar125, Apr 11 '23 07:04

Hyperledger Fabric version: 2.2

Why 2.2? Can you try if it works on 2.5?

yacovm, Apr 11 '23 15:04

When you re-issue a certificate of an orderer without a config update it needs to have the same public key, did you use the same public key?

yacovm, Apr 11 '23 21:04

Hyperledger Fabric version: 2.2

Why 2.2? Can you try if it works on 2.5?

Our network was created a year ago with version 2.2. When we restarted the orderer, it did not become a leader. Then we tried using orderer version 2.4. Please correct me if my understanding is wrong, the advantage of using 2.5 versus 2.2 is that we can join application channel without using a system channel.

what we tried (before the certificate expired)

  • changed the orderer to version 2.4 (Here also it was a follower in all the channels)
  • we tried to add 2 more orderers (v2.4) to application channels using channel join
  • here the new orderers were able to sync the data

When you re-issue a certificate of an orderer without a config update it needs to have the same public key, did you use the same public key?

We were not able to reach this step, as we weren't able to download the config block itself.

Vbhaskar125, Apr 12 '23 06:04

Please correct me if my understanding is wrong, the advantage of using 2.5 versus 2.2 is that we can join application channel without using a system channel.

But I don't think we backport bug fixes to 2.2 at this time, @denyeart am I wrong?

yacovm, Apr 12 '23 07:04

what we tried (before the certificate expired)

changed the orderer to version 2.4 (Here also it was a follower in all the channels)
we tried to add 2 more orderers (v2.4) to application channels using channel join
here the new orderers were able to sync the data

I don't understand. You had 1 orderer and you added 2 orderers and it worked and now you removed them again?

yacovm, Apr 12 '23 07:04

what we tried (before the certificate expired)

changed the orderer to version 2.4 (Here also it was a follower in all the channels)
we tried to add 2 more orderers (v2.4) to application channels using channel join
here the new orderers were able to sync the data

I don't understand. You had 1 orderer and you added 2 orderers and it worked and now you removed them again?

  • We had a 1-orderer network. After the orderer was restarted, it failed to become a leader.
  • So at this point we have a network with no Raft leader, and hence no system/config updates are possible.
  • So we brought up 2 more orderers (orderer1 and orderer2, with the version 2.4 image) just to have a copy of the application chains, by joining them to the application channels. (At this point the new orderers are still followers and are not part of the consenters in the channel config.)
  • Then we also upgraded the orderer0 image to version 2.4 (hoping that at least in the application channels it would become a leader).

But the orderer failed to start an election.

Vbhaskar125, Apr 12 '23 07:04

Please correct me if my understanding is wrong, the advantage of using 2.5 versus 2.2 is that we can join application channel without using a system channel.

But I don't think we backport bug fixes to 2.2 at this time, @denyeart am I wrong?

Critical fixes will be backported to v2.2 through end of 2023. Users should upgrade to v2.5 this year so that when maintenance of v2.2 ends they will be able to get the future updates and fixes that will be targeted for v2.5.x only.

denyeart, Apr 12 '23 13:04

When you re-issue a certificate of an orderer without a config update it needs to have the same public key, did you use the same public key?

@yacovm Even if we use the same public key, we still need to do a configuration update for the orderer TLS certificate, right?

adhavpavan, Apr 18 '23 17:04

Of course not. That's the entire idea of using the same public key - no need for a config change!

https://github.com/hyperledger/fabric/pull/1771

yacovm, Apr 18 '23 18:04
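
Before swapping in a re-issued certificate, it may help to confirm that it really carries the same public key as the one it replaces. The following is a minimal sketch using only the Python standard library: it walks the DER structure of each certificate just far enough to pull out the SubjectPublicKeyInfo element and compares the raw bytes. The function names are made up for this example, the parser handles only the plain X.509 layout, and a dedicated X.509 library would be a more robust choice for real use.

```python
import ssl


def _read_tlv(data, offset):
    """Decode one DER element header; return (tag, content_start, content_end)."""
    tag = data[offset]
    length = data[offset + 1]
    start = offset + 2
    if length & 0x80:  # long form: low 7 bits give the number of length bytes
        n = length & 0x7F
        length = int.from_bytes(data[start:start + n], "big")
        start += n
    return tag, start, start + length


def spki_der(cert_der):
    """Return the raw SubjectPublicKeyInfo element of a DER-encoded certificate."""
    _, cert_start, _ = _read_tlv(cert_der, 0)        # outer Certificate SEQUENCE
    _, pos, _ = _read_tlv(cert_der, cert_start)      # tbsCertificate SEQUENCE
    tag, _, end = _read_tlv(cert_der, pos)
    if tag == 0xA0:                                  # optional explicit [0] version
        pos = end
    # Skip serialNumber, signature, issuer, validity and subject.
    for _ in range(5):
        _, _, pos = _read_tlv(cert_der, pos)
    _, _, spki_end = _read_tlv(cert_der, pos)
    return cert_der[pos:spki_end]


def same_public_key(old_pem, new_pem):
    """True if two PEM certificates contain byte-identical public key info."""
    old = spki_der(ssl.PEM_cert_to_DER_cert(old_pem))
    new = spki_der(ssl.PEM_cert_to_DER_cert(new_pem))
    return old == new
```

Each argument is expected to be the text of a single PEM certificate, for example the old and new orderer TLS certificates read from disk. A `False` result would suggest the CA generated a new key pair, in which case the no-config-update path described above would not apply.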