consul-on-kubernetes

[ERR] agent: Coordinate update error: No cluster leader

Open duke-lv opened this issue 6 years ago • 12 comments

I have deployed the latest version of Consul on Kubernetes v1.10.0, but the Consul pod's log shows these error messages:

2018/07/20 11:26:11 [WARN] agent: Check "service:ribbon-consumer" HTTP request failed: Get http://DESKTOP-MCQSJ49:8504/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018/07/20 11:26:15 [ERR] agent: failed to sync remote state: No cluster leader
2018/07/20 11:26:16 [ERR] agent: Coordinate update error: No cluster leader

The cluster doesn't work correctly.

duke-lv avatar Jul 20 '18 11:07 duke-lv

It's because one of the Consul replicas must boot with the -bootstrap option. Since this is a single-file StatefulSet, add the option -bootstrap-expect=3.

That value assumes you are running 3 Consul replicas; otherwise, change it to the number of replicas you are using.
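
To illustrate why the server count matters: Consul servers elect a leader through Raft, which needs a majority (quorum) of the servers to agree. The following is only a sketch of that arithmetic in Python; the function names are made up for illustration and are not part of Consul.

```python
def quorum(servers: int) -> int:
    """Smallest majority of a Raft cluster with `servers` voting members."""
    if servers < 1:
        raise ValueError("need at least one server")
    return servers // 2 + 1


def failures_tolerated(servers: int) -> int:
    """How many servers can be lost while a leader can still be elected."""
    return servers - quorum(servers)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"servers={n} quorum={quorum(n)} tolerated={failures_tolerated(n)}")
```

With 3 servers the quorum is 2, so the cluster keeps a leader with one server down, but it cannot elect one if the servers never find each other in the first place.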

gabrielfsousa avatar Jul 25 '18 12:07 gabrielfsousa

Getting the same error:

2018/11/15 10:29:08 [INFO] agent: Discovered LAN servers:
2018/11/15 10:29:08 [WARN] agent: Join LAN failed: No servers to join, retrying in 30s
2018/11/15 10:29:15 [WARN] raft: no known peers, aborting election
2018/11/15 10:29:15 [ERR] agent: failed to sync remote state: No cluster leader
2018/11/15 10:29:23 [ERR] http: Request GET /v1/kv/config/gateway-prod/?recurse&token=<hidden>, error: No cluster leader from=10.233.68.72:32798

==> Newer Consul version available: 1.4.0 (currently running: 1.4.0)

karthikeayan avatar Nov 15 '18 10:11 karthikeayan

I also have this error. I have everything running in a namespace. Would that affect the label-based discovery, perhaps? I can see pods are running if I select with labels:

kubectl -n consul get po -l app=consul,component=server
NAME       READY   STATUS    RESTARTS   AGE
consul-0   1/1     Running   0          6m
consul-1   1/1     Running   0          7m
consul-2   1/1     Running   0          7m

I've updated to 1.4.2 of consul, and I'm running on GKE: 1.11.6-gke.3

My consul logs indicate no discovered servers:

2019/02/01 16:58:45 [ERR] agent: Coordinate update error: No cluster leader
2019/02/01 16:58:48 [ERR] agent: failed to sync remote state: No cluster leader
2019/02/01 16:58:49 [INFO] agent: Discovered LAN servers:
2019/02/01 16:58:49 [WARN] agent: Join LAN failed: No servers to join, retrying in 30s

I'm not sure what to check at this point. I have the -bootstrap-expect=3 enabled, but I wouldn't expect that to trigger anything if no other servers can be discovered...

micksear avatar Feb 01 '19 17:02 micksear

I had the same error with a Docker-hosted Consul cluster (not on Kubernetes, though), and it turned out that all of my instances had auto-generated the same node ID. As soon as I manually set a different node ID on each instance (using the -node-id argument), all was fine. Perhaps something to try.

goughlee avatar Feb 18 '19 14:02 goughlee

@micksear

I had the same issue when running in a different namespace with Consul 1.5.1. Editing server.json fixed it:

  "retry_join": [
    "provider=k8s namespace=customnamespace label_selector=\"app=consul,component=server\""
  ]
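
If the config is generated rather than edited by hand, the inner quotes around the label selector are easy to get wrong. A minimal Python sketch, assuming you only need the namespace and label_selector keys shown above (the helper name k8s_retry_join is hypothetical), lets the json module handle the escaping:

```python
import json


def k8s_retry_join(namespace: str, label_selector: str) -> str:
    """Build a retry_join entry for the k8s provider (hypothetical helper)."""
    return f'provider=k8s namespace={namespace} label_selector="{label_selector}"'


# Values taken from the snippet above; replace with your own namespace and labels.
config = {
    "retry_join": [
        k8s_retry_join("customnamespace", "app=consul,component=server"),
    ]
}

if __name__ == "__main__":
    # json.dumps escapes the inner double quotes as \" automatically.
    print(json.dumps(config, indent=2))
```

The printed fragment can then be merged into server.json.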

e100 avatar May 24 '19 16:05 e100

Got the same error with bootstrap-expect=3 in my consul.yaml. All pods are in the same namespace.

itsecforu avatar Dec 05 '19 08:12 itsecforu

Did somebody solve it?

itsecforu avatar Dec 11 '19 08:12 itsecforu

Bumped into this issue today. It is caused by the affinity settings. By default there are 3 replicas, and if you have fewer than 3 nodes (e.g. 2), one pod won't come up and you will get the error above. So make sure you have at least as many nodes as replicas.
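
As a rough sketch of that reasoning, assuming the affinity rule allows at most one Consul server pod per node (an assumption about the chart's settings, and the function names are made up), the number of pods that can actually start is limited by the node count:

```python
def schedulable_servers(replicas: int, nodes: int) -> int:
    """Server pods that can start when each node may host at most one."""
    return min(replicas, nodes)


def pending_servers(replicas: int, nodes: int) -> int:
    """Server pods left Pending because no free node remains."""
    return replicas - schedulable_servers(replicas, nodes)


if __name__ == "__main__":
    replicas, nodes = 3, 2  # the example from the comment above
    print(f"running={schedulable_servers(replicas, nodes)} "
          f"pending={pending_servers(replicas, nodes)}")
```

With 3 replicas on 2 nodes, only 2 servers start, so an agent waiting for 3 expected servers never sees the third.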

Batirchik avatar Feb 17 '20 15:02 Batirchik

Error from Consul:

2020-05-29T04:19:22.499Z [INFO] agent: Joining cluster...: cluster=LAN
2020-05-29T04:19:22.499Z [INFO] agent: (LAN) joining: lan_addresses=[consul-server-0.consul-sever.n1.svc, consul-server-1.consul-server.n1.svc, consul-server-2.consul-server.n1.svc[]
2020-05-29T04:19:22.543Z [WARN] agent.server.memberlist.lan: memberlist: Failed to resolve consul-server-1.consul-server.n1.svc: lookup consul-server-1.consul-server.n1.svc on 10.0.0.10:53: no such host
2020-05-29T04:25:01.506Z [ERROR] agent: Coordinate update error: error="No cluster leader"
2020-05-29T04:25:06.768Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2020-05-29T04:25:29.271Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"

But all Consul pods are in Running status, and if we run consul join manually, it works.

NAME              READY   STATUS    RESTARTS   AGE
consul-server-0   1/2     Running   0          13m
consul-server-1   1/2     Running   0          13m
consul-server-2   1/2     Running   0          13m
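
Since the log shows a name that fails to resolve, one thing worth checking from inside a pod is whether each join address resolves at all. A small Python sketch of such a check, using only the standard library (the function name and the injectable resolve parameter are illustrative choices, not anything Consul provides):

```python
import socket
from typing import Callable, Iterable, List


def unresolved_hosts(
    hosts: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> List[str]:
    """Return the host names that fail DNS resolution."""
    failed = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(host)
    return failed


if __name__ == "__main__":
    # Addresses copied from the log above; adjust to your own service names.
    addresses = [
        "consul-server-0.consul-sever.n1.svc",
        "consul-server-1.consul-server.n1.svc",
        "consul-server-2.consul-server.n1.svc",
    ]
    print(unresolved_hosts(addresses))
```

Any name it prints is one the agent cannot join by DNS from that pod.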

gkannan66235 avatar May 29 '20 04:05 gkannan66235

This bug is still not resolved in the current version, 1.9.1.

gupf0719 avatar Feb 02 '21 02:02 gupf0719

The retry_join fix from @e100 resolved my issue for a cluster deployed into the consul namespace. I updated the server JSON in the ConfigMap manifest to include the following:

  "retry_join": [
    "provider=k8s namespace=consul label_selector=\"app=consul,component=server\""
  ]

deeco avatar May 26 '21 15:05 deeco

I've seen this issue occurring for multiple people several times.

On k8s, besides setting -bootstrap-expect to the number of servers you're running (e.g. 3-5 pods), the only solution that worked for me was deleting all PVCs and volumes after uninstalling Consul completely.

No matter what I changed or how many times I uninstalled and reinstalled (Helm-based), Consul was unable to bootstrap properly and elect a leader until not only all components but also the PVCs and volumes were removed from the (k8s) cluster.

This note should be in the k8s section btw.

cc @gupf0719

Carmezim avatar Oct 28 '21 17:10 Carmezim