consul-k8s

Exposing gossip/RPC ports or changing the server serflan port causes networking issues when restarting a client pod

Open ndhanushkodi opened this issue 3 years ago • 1 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Overview of the Issue

When using server.exposeGossipAndRPCPorts and client.exposeGossipPorts, restarting or deleting any client pod produces "network may be misconfigured" warnings in the client logs. These warnings were not present after the original installation.

Reproduction Steps

  1. Run helm install with the following values.yml:
global:
  domain: consul
  datacenter: dc1

server:
  replicas: 2
  bootstrapExpect: 2
  exposeGossipAndRPCPorts: true
  ports:
    serflan:
      port: 9301

client:
  enabled: true
  grpc: true
  exposeGossipPorts: true

ui:
  enabled: true

connectInject:
  enabled: true

controller:
  enabled: true
  2. Delete a client pod: kubectl delete pod consul-t47t7
  3. Check the logs of every client pod. The newly started client has WARN logs:
2021-12-10T20:19:06.678Z [WARN]  agent.client.memberlist.lan: memberlist: Was able to connect to consul-server-1 but other probes failed, network may be misconfigured
2021-12-10T20:19:07.679Z [WARN]  agent.client.memberlist.lan: memberlist: Was able to connect to consul-server-0 but other probes failed, network may be misconfigured
2021-12-10T20:19:08.679Z [WARN]  agent.client.memberlist.lan: memberlist: Was able to connect to gke-acceptance-default-pool-5854770b-pphr but other probes failed, network may be misconfigured
2021-12-10T20:19:09.681Z [WARN]  agent.client.memberlist.lan: memberlist: Was able to connect to gke-acceptance-default-pool-5854770b-d02k but other probes failed, network may be misconfigured
2021-12-10T20:19:10.681Z [WARN]  agent.client.memberlist.lan: memberlist: Was able to connect to gke-acceptance-default-pool-5854770b-d02k but other probes failed, network may be misconfigured
2021-12-10T20:19:11.683Z [WARN]  agent.client.memberlist.lan: memberlist: Was able to connect to consul-server-0 but o

Every other client shows the same warning, but only for connections to the newly started client.
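To compare pods quickly, the warnings above can be tallied per peer. The following is a minimal sketch in Python, assuming the log text has already been collected from each client pod (for example with kubectl logs). The helper name `count_probe_warnings` is hypothetical and not part of Consul or consul-k8s.

```python
import re
from collections import Counter

# Matches the memberlist warning shown in the logs above and captures the peer name.
WARN_RE = re.compile(
    r"\[WARN\]\s+agent\.client\.memberlist\.lan: memberlist: "
    r"Was able to connect to (\S+) but other probes failed"
)


def count_probe_warnings(log_text: str) -> Counter:
    """Count 'other probes failed' warnings per peer (hypothetical helper)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WARN_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage assumption: pipe a pod's log text into this script on stdin.
    for peer, n in count_probe_warnings(sys.stdin.read()).most_common():
        print(peer, n)
```

On the newly started client the tally should list every server and client it probes, while on the other clients only the restarted node should appear, matching the behavior described above.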

Logs

Expected behavior

In a cluster that neither exposes gossip or RPC ports nor changes the server Serf LAN port, deleting a client results in the following successful logs. The same should happen when exposing gossip/RPC ports or changing the server Serf LAN port.

2021-12-10T20:10:32.078Z [INFO]  agent.client.serf.lan: serf: EventMemberLeave: gke-acceptance-default-pool-5854770b-y6mr 10.124.5.25
2021-12-10T20:10:41.145Z [INFO]  agent.client.memberlist.lan: memberlist: Updating address for left or failed node gke-acceptance-default-pool-5854770b-y6mr from 10.124.5.25:8301 to 10.124.5.30:8301
2021-12-10T20:10:41.145Z [INFO]  agent.client.serf.lan: serf: EventMemberJoin: gke-acceptance-default-pool-5854770b-y6mr 10.124.5.30
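The healthy sequence above is a leave event followed by a join event for the same node name with a new IP. A rough Python sketch of detecting that pattern in collected log text follows. The helper name `rejoined_nodes` is hypothetical, and the regular expression is based only on the log lines quoted in this issue.

```python
import re

# Captures the event type, node name, and IP from serf member events.
EVENT_RE = re.compile(r"serf: (EventMemberLeave|EventMemberJoin): (\S+) (\S+)")


def rejoined_nodes(log_text: str) -> dict:
    """Map node name -> (old_ip, new_ip) for nodes that left and later joined."""
    left = {}
    rejoined = {}
    for line in log_text.splitlines():
        match = EVENT_RE.search(line)
        if not match:
            continue
        event, node, ip = match.groups()
        if event == "EventMemberLeave":
            left[node] = ip
        elif node in left:
            # A join after a recorded leave counts as a clean rejoin.
            rejoined[node] = (left.pop(node), ip)
    return rejoined
```

A client that only logs the memberlist warnings and never appears in this mapping on its peers would be a candidate for the problem described in this issue.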

Environment details

GKE 1.20

Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:41:42Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.10-gke.1600", GitCommit:"ef8e9f64449d73f9824ff5838cea80e21ec6c127", GitTreeState:"clean", BuildDate:"2021-09-06T09:24:20Z", GoVersion:"go1.15.15b5", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.22) and server (1.20) exceeds the supported minor version skew of +/-1

Additional Context

Relevant firewall rules allowing traffic into the external agent and into the GKE nodes.

[Screenshot: Screen Shot 2021-12-13 at 4 47 11 PM]

[Screenshot: Screen Shot 2021-12-13 at 4 46 04 PM]

ndhanushkodi, Dec 10 '21 20:12

same here

fbuinosquy1985, Aug 30 '22 15:08

Closing as Consul K8s no longer utilizes clients for mesh.

david-yu, Nov 17 '22 01:11