Consul on Kubernetes Deployment: Was able to connect to Consul_Server_1 over TCP but UDP probes failed, network may be misconfigured
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
- Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
- If you are interested in working on this issue or have submitted a pull request, please leave a comment.
Overview of the Issue
Unable to connect agents running on K8s to external Consul servers that run directly on VMs. We are not using the official Helm charts as of now.
Reproduction Steps
Install Consul in server mode on VMs (3 nodes).
Server config:
{
  "addresses": {
    "dns": "127.0.0.1",
    "grpc": "127.0.0.1",
    "http": "127.0.0.1",
    "https": "127.0.0.1"
  },
  "advertise_addr": "{{ GetInterfaceIP \"ens192\" }}",
  "advertise_addr_wan": "{{ GetInterfaceIP \"ens192\" }}",
  "bind_addr": "{{ GetInterfaceIP \"ens192\" }}",
  "bootstrap": false,
  "bootstrap_expect": 3,
  "client_addr": "0.0.0.0",
  "data_dir": "/var/lib/consul",
  "datacenter": "dc1",
  "disable_update_check": true,
  "domain": "consul",
  "enable_local_script_checks": true,
  "enable_script_checks": true,
  "enable_syslog": true,
  "encrypt": "Some string",
  "encrypt_verify_incoming": true,
  "encrypt_verify_outgoing": true,
  "log_level": "INFO",
  "performance": {
    "leave_drain_time": "5s",
    "raft_multiplier": 1,
    "rpc_hold_timeout": "7s"
  },
  "ports": {
    "dns": 8600,
    "grpc": 8502,
    "http": 8500,
    "https": -1,
    "serf_lan": 8301,
    "serf_wan": 8302,
    "server": 8300
  },
  "raft_protocol": 3,
  "retry_interval": "30s",
  "retry_interval_wan": "30s",
  "retry_join": [
    "Consul_Server_1",
    "Consul_Server_2",
    "Consul_Server_3"
  ],
  "retry_max": 0,
  "retry_max_wan": 0,
  "server": true,
  "syslog_facility": "local0",
  "translate_wan_addrs": false,
  "ui_config": {
    "enabled": false
  }
}
Client config:
{
  "addresses": {
    "dns": "127.0.0.1",
    "grpc": "127.0.0.1",
    "http": "127.0.0.1",
    "https": "127.0.0.1"
  },
  "advertise_addr": "{{ GetInterfaceIP \"eth0\" }}",
  "advertise_addr_wan": "{{ GetInterfaceIP \"eth0\" }}",
  "bind_addr": "{{ GetInterfaceIP \"eth0\" }}",
  "client_addr": "127.0.0.1",
  "data_dir": "/var/lib/consul",
  "datacenter": "dc1",
  "disable_update_check": true,
  "domain": "consul",
  "enable_local_script_checks": true,
  "enable_script_checks": true,
  "enable_syslog": false,
  "encrypt": "some string",
  "encrypt_verify_incoming": true,
  "encrypt_verify_outgoing": true,
  "log_level": "INFO",
  "performance": {
    "leave_drain_time": "5s",
    "raft_multiplier": 1,
    "rpc_hold_timeout": "7s"
  },
  "ports": {
    "dns": 8600,
    "grpc": 8502,
    "http": 8500,
    "https": -1,
    "serf_lan": 8301,
    "serf_wan": 8302,
    "server": 8300
  },
  "raft_protocol": 3,
  "retry_interval": "30s",
  "retry_join": [
    "Consul_Server_1",
    "Consul_Server_2",
    "Consul_Server_3"
  ],
  "retry_max": 0,
  "server": false,
  "syslog_facility": "local0",
  "translate_wan_addrs": false,
  "ui_config": {
    "enabled": false
  }
}
Client Docker Image:
FROM consul:latest
EXPOSE 80 8080 443 5432 6432 8000-8350 8500-8700 53
COPY config.json /etc/consul.d/client/config.json
ENTRYPOINT consul agent -config-dir /etc/consul.d/client
Client K8s Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: consul-deployment
  labels:
    app: consul
spec:
  selector:
    matchLabels:
      app: consul
  replicas: 1
  template:
    metadata:
      labels:
        app: consul
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: DC
                operator: In
                values:
                - derby
      containers:
      - name: consul-container
        image: consul_test:0.2
        ports:
        - containerPort: 8500
          name: ui-port
        - containerPort: 8400
          name: alt-port
        - containerPort: 53
          name: udp-port
        - containerPort: 8443
          name: https-port
        - containerPort: 8080
          name: http-port
        - containerPort: 8301
          protocol: UDP
          name: serflan
        - containerPort: 8302
          name: serfwan
        - containerPort: 8600
          name: consuldns
        - containerPort: 8300
          name: server
        - containerPort: 8502
          name: grpc
        volumeMounts:
        - name: consul-data
          mountPath: /data
      volumes:
      - name: consul-data
        emptyDir:
          sizeLimit: 5Gi
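Note that Serf LAN uses port 8301 over both TCP and UDP, while the manifest above declares 8301 with protocol: UDP only (containerPort entries default to TCP when protocol is omitted). The entries are largely informational unless paired with hostPort, but declaring both protocols explicitly is a sketch like the following (port names here are illustrative):

```yaml
# Sketch: declare Serf LAN 8301 on both protocols (names illustrative).
ports:
- containerPort: 8301
  protocol: TCP
  name: serflan-tcp
- containerPort: 8301
  protocol: UDP
  name: serflan-udp
```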
Logs
2023-03-10T14:09:29.695Z [WARN] agent.client.memberlist.lan: memberlist: Was able to connect to Consul_Server_3 over TCP but UDP probes failed, network may be misconfigured
2023-03-10T14:09:30.196Z [DEBUG] agent.client.memberlist.lan: memberlist: Failed UDP ping: Consul_Server_1 (timeout reached)
2023-03-10T14:09:30.696Z [WARN] agent.client.memberlist.lan: memberlist: Was able to connect to Consul_Server_1 over TCP but UDP probes failed, network may be misconfigured
2023-03-10T14:09:30.798Z [DEBUG] agent.client.memberlist.lan: memberlist: Initiating push/pull sync with: Consul_Server_1 IP:8301
2023-03-10T14:09:30.799Z [WARN] agent.client.memberlist.lan: memberlist: Refuting a suspect message (from: consul-deployment-59bf886df7-w88cx)
2023-03-10T14:09:31.197Z [DEBUG] agent.client.memberlist.lan: memberlist: Failed UDP ping: Consul_Server_2 (timeout reached)
2023-03-10T14:09:31.696Z [WARN] agent.client.memberlist.lan: memberlist: Was able to connect to Consul_Server_2 over TCP but UDP probes failed, network may be misconfigured
2023-03-10T14:09:32.197Z [DEBUG] agent.client.memberlist.lan: memberlist: Failed UDP ping: Consul_Server_1 (timeout reached)
2023-03-10T14:09:32.697Z [WARN] agent.client.memberlist.lan: memberlist: Was able to connect to Consul_Server_1 over TCP but UDP probes failed, network may be misconfigured
2023-03-10T14:09:33.198Z [DEBUG] agent.client.memberlist.lan: memberlist: Failed UDP ping: Consul_Server_3 (timeout reached)
2023-03-10T14:09:33.698Z [WARN] agent.client.memberlist.lan: memberlist: Was able to connect to Consul_Server_3 over TCP but UDP probes failed, network may be misconfigured
2023-03-10T14:09:34.199Z [DEBUG] agent.client.memberlist.lan: memberlist: Failed UDP ping: Consul_Server_2 (timeout reached)
2023-03-10T14:09:34.699Z [WARN] agent.client.memberlist.lan: memberlist: Was able to connect to Consul_Server_2 over TCP but UDP probes failed, network may be misconfigured
2023-03-10T14:09:35.200Z [DEBUG] agent.client.memberlist.lan: memberlist: Failed UDP ping: Consul_Server_3 (timeout reached)
Expected behavior
The Consul client should join the Consul servers without errors.
Environment details
$ consul info (client)
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 0
build:
	prerelease =
	revision = 53f65dc3
	version = 1.15.0
	version_metadata =
consul:
	acl = disabled
	known_servers = 3
	server = false
runtime:
	arch = amd64
	cpu_count = 8
	goroutines = 54
	max_procs = 8
	os = linux
	version = go1.20.1
serf_lan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 8
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 21696
	members = 4
	query_queue = 0
	query_time = 4
$ consul info (server)
agent:
	check_monitors = 2
	check_ttls = 0
	checks = 4
	services = 2
build:
	prerelease =
	revision = 53f65dc3
	version = 1.15.0
	version_metadata =
consul:
	acl = disabled
	bootstrap = false
	known_datacenters = 1
	leader = true
	leader_addr = Consul_Server_1:8300
	server = true
raft:
	applied_index = 698354
	commit_index = 698354
	fsm_pending = 0
	last_contact = 0
	last_log_index = 698354
	last_log_term = 169
	last_snapshot_index = 688389
	last_snapshot_term = 168
	latest_configuration = [{Suffrage:Voter ID: } {Suffrage:Voter ID: } {Suffrage:Voter ID: }]
	latest_configuration_index = 0
	num_peers = 2
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Leader
	term = 169
runtime:
	arch = amd64
	cpu_count = 8
	goroutines = 186
	max_procs = 8
	os = linux
	version = go1.20.1
serf_lan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 8
	failed = 1
	health_score = 0
	intent_queue = 0
	left = 1
	member_time = 21698
	members = 5
	query_queue = 0
	query_time = 4
serf_wan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1964
	members = 3
	query_queue = 0
	query_time = 4
As per https://developer.hashicorp.com/consul/docs/architecture#lan-gossip-pool, if UDP is not available the agent will fall back to TCP. Does this cause the Consul client's status to swing frequently between alive and failed? Because that is what is happening for us.
I'm not running on k8s, just inside a Docker container, but I have the same issue.
@soupdiver if using hostNetwork is fine for your requirements, it will work; otherwise it's going to be a problem. You can also try advertising the node IP instead of the pod IP, which means only one Consul container per node.
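Advertising the node IP instead of the pod IP, as suggested above, can be wired up with the Kubernetes downward API. This is only a sketch (the HOST_IP variable name and the port names are illustrative); the hostPort mappings are what make 8301 on the node actually reach the pod:

```yaml
# Sketch: advertise the node IP and map Serf LAN onto it via hostPort.
containers:
- name: consul-container
  image: consul_test:0.2
  env:
  - name: HOST_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  command: ["consul", "agent", "-config-dir=/etc/consul.d/client", "-advertise=$(HOST_IP)"]
  ports:
  - containerPort: 8301
    hostPort: 8301
    protocol: TCP
    name: serflan-tcp
  - containerPort: 8301
    hostPort: 8301
    protocol: UDP
    name: serflan-udp
```

Since hostPort pins the Serf port on the node, this indeed limits you to one Consul client pod per node.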
Yeah, using host networking works, but what is the underlying issue? Even if I expose the Serf LAN port over both TCP and UDP, the error still shows up.
As per my deep dive, Docker has limitations in how it handles UDP.
Same issue here with 3 native servers, no Docker or VM in between. I've tested the connections with nc and they're all good, but the problem with Consul persists.
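One caveat on testing with nc: a successful TCP connect (nc -z) says nothing about UDP, and a UDP nc probe cannot distinguish "delivered" from "silently dropped" because UDP has no handshake. A small round-trip probe along these lines shows whether datagrams actually flow both ways. Everything here is illustrative: run the echo side on the remote host on a spare port, since Consul itself will not echo arbitrary datagrams; the local demo below just probes a throwaway echo server in-process.

```python
import socket
import threading
import time

def udp_echo_server(host="127.0.0.1", port=9301):
    """Tiny UDP echo standing in for the far end of the Serf LAN path."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    data, addr = sock.recvfrom(1024)  # handle a single probe, then exit
    sock.sendto(data, addr)
    sock.close()

def udp_round_trip_ok(host, port, timeout=2.0):
    """True only if a datagram goes out AND a reply makes it back."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(b"ping", (host, port))
        sock.recvfrom(1024)
        return True
    except OSError:  # timeout, ICMP port-unreachable, etc.
        return False
    finally:
        sock.close()

# Local demo: probe the throwaway echo server instead of a real Consul node.
threading.Thread(target=udp_echo_server, daemon=True).start()
time.sleep(0.2)  # give the echo server time to bind
print(udp_round_trip_ok("127.0.0.1", 9301))  # True when UDP round trips
```

A False result against a host that nc -z reports as reachable is exactly the "TCP works, UDP dropped" situation in the logs above.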
I have the same problem with Consul servers running on k8s and Consul clients outside of k8s in Docker. The problem was related to the Docker limitation with UDP; the only workaround I found was running the clients with the hostNetwork: true option.
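For reference, the hostNetwork workaround applied to the Deployment above would look roughly like this (a sketch; the dnsPolicy line is only needed if the pod still has to resolve cluster DNS):

```yaml
# Sketch: run the Consul client pod in the node's network namespace.
spec:
  template:
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: consul-container
        image: consul_test:0.2
```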