Cluster peering commands have a 1 in N consul servers chance of succeeding

Open quinndiggitypolymath opened this issue 2 years ago • 3 comments

It seems that consul peering commands only succeed when they are run against the Consul server node that is the active Raft leader; follower nodes simply fail (with "Error listing peerings", for example).
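One way to check this per server is to ask each one who it thinks the leader is and whether it can list peerings. The following is only a rough sketch: it assumes the standard HTTP API endpoints /v1/status/leader and /v1/peerings are reachable on each server without an ACL token, and the function names and server URLs are hypothetical placeholders, not anything taken from this cluster.

```python
import json
import urllib.error
import urllib.request


def http_get(url, timeout=5.0):
    """Return (status_code, body_text). HTTP error responses are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")


def probe(base_url, get=http_get):
    """Ask one server for the current leader and try to list peerings through it."""
    # /v1/status/leader returns a JSON string such as "10.0.0.1:8300".
    _, leader_body = get(base_url + "/v1/status/leader")
    # /v1/peerings lists peerings; a non-200 status is treated as a failure here.
    status, peer_body = get(base_url + "/v1/peerings")
    ok = status == 200
    return {
        "server": base_url,
        "leader": json.loads(leader_body),
        "peerings_ok": ok,
        "detail": "" if ok else peer_body.strip(),
    }


# Hypothetical usage, with placeholder addresses for the N servers:
# for url in ["http://server-1:8500", "http://server-2:8500", "http://server-3:8500"]:
#     print(probe(url))
```

If the behavior described above holds, only the server whose own address matches the reported leader would be expected to show peerings_ok as True.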

On follower nodes, the errors being logged read:

[core]grpc: addrConn.createTransport failed to connect to {__cluster_name__-__internal_lan_ip__:8300 leader
transport: Error while dialing failed to find Consul server for global address "__cluster_name__-__internal_lan_ip__:8300"

where __cluster_name__ and __internal_lan_ip__ are both valid, and there are no issues with line of sight, CA certificates, etc. From any node logging the error, line of sight can be confirmed with telnet __internal_lan_ip__ 8300, and the connection attempt is logged on the remote end.
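For anyone wanting to repeat the line-of-sight check without telnet, a minimal sketch using only the Python standard library is below. The function name is hypothetical, and the default port 8300 is simply the port from the log lines above. A successful TCP connect only shows the port is reachable; it says nothing about TLS or the RPC layer itself.

```python
import socket


def can_connect(host, port=8300, timeout=3.0):
    """Return True if a plain TCP connection to host:port can be opened."""
    try:
        # Roughly what `telnet host port` does: open a TCP connection, then close it.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

This would be called from each follower with the leader's LAN IP, for example can_connect("<internal_lan_ip>"), where the address is a placeholder.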

Oct 19 '22 01:10 quinndiggitypolymath

Potentially related to: https://github.com/hashicorp/consul/issues/15051

Oct 19 '22 01:10 quinndiggitypolymath

@quinndiggitypolymath, just want to touch base on the status. Have you seen the error with the main branch? Thanks.

Oct 26 '22 17:10 huikang

Just following up on this - the issue still persists as of 1.13.6

https://github.com/hashicorp/consul/issues/15087#issuecomment-1412935822

Feb 02 '23 00:02 quinndiggitypolymath
