ResourceExhausted desc = this server has too many xDS streams open, please try another
Nomad version
Nomad v1.4.3 (f464aca721d222ae9c1f3df643b3c3aaa20e2da7)
Operating system and Environment details
Debian Bullseye on 4 machines:

- Machine A: Consul Server/Client, Nomad Server/Client
- Machine B: Consul Server/Client, Nomad Server/Client
- Machine C: Consul Client, Nomad Server/Client
- Machine D: Consul Client, Nomad Client
Issue
I get the following Consul error:
2022-12-10T12:43:14.877+0100 [ERROR] agent.envoy: Error handling ADS delta stream: xdsVersion=v3 error="rpc error: code = ResourceExhausted desc = this server has too many xDS streams open, please try another"
If I understand correctly, Consul 1.14 introduced a limit that distributes xDS streams among Consul servers.
In my Nomad+Consul setup I have certain job constraints that lead to one node running more Consul Connect-enabled jobs than the others, which triggers the error above. How can I resolve this? Should Nomad distribute the requests among Consul servers?
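For concreteness, here's a minimal sketch of the kind of constraint I mean; the job name, node name, image, and port are placeholders, not my real jobs:

job "web" {
  datacenters = ["dc1"]

  group "web" {
    # Placeholder constraint: pins every allocation of this group to a
    # single node, so all of its Envoy sidecars open their xDS streams
    # against that node's local Consul agent.
    constraint {
      attribute = "${node.unique.name}"
      value     = "machine-c"
    }

    network {
      mode = "bridge"
    }

    service {
      name = "web"
      port = "8080"

      # Each Connect sidecar is an Envoy instance with its own xDS stream.
      connect {
        sidecar_service {}
      }
    }

    task "web" {
      driver = "docker"

      config {
        image = "nginxdemos/hello" # placeholder image
      }
    }
  }
}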
Nomad Config
data_dir = "/opt/nomad/data"

advertise {
  http = "10.9.0.2"
  rpc  = "10.9.0.2"
  serf = "10.9.0.2"
}

server {
  # license_path is required as of Nomad v1.1.1+
  #license_path = "/etc/nomad.d/nomad.hcl"
  enabled          = true
  bootstrap_expect = 3
}

client {
  enabled           = true
  network_interface = "wg0"
  min_dynamic_port  = 26000
  max_dynamic_port  = 32000
}

consul {
  address          = "127.0.0.1:8501"
  grpc_address     = "127.0.0.1:8502"
  token            = "...."
  ssl              = true
  ca_file          = "/etc/nomad.d/consul-agent-ca.pem"
  cert_file        = "/etc/nomad.d/dc1-server-consul.pem"
  key_file         = "/etc/nomad.d/dc1-server-consul-key.pem"
  auto_advertise   = true
  server_auto_join = true
  client_auto_join = true
}

acl {
  enabled = true
}

telemetry {
  prometheus_metrics         = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}

vault {
  enabled               = true
  address               = "http://vault.service.consul:8200"
  task_token_ttl        = "1h"
  create_from_role      = "nomad-cluster"
  token                 = "...."
  allow_unauthenticated = false
}
Consul Config
server           = true
bootstrap_expect = 2

ui_config {
  enabled = true
}

client_addr    = "127.0.0.1 10.9.0.2"
bind_addr      = "10.9.0.2"
advertise_addr = "10.9.0.2"
datacenter     = "dc1"
data_dir       = "/opt/consul"
encrypt        = "....."

tls {
  defaults {
    ca_file         = "/etc/consul.d/consul-agent-ca.pem"
    cert_file       = "/etc/consul.d/dc1-server-consul.pem"
    key_file        = "/etc/consul.d/dc1-server-consul-key.pem"
    verify_incoming = true
    verify_outgoing = true
  }

  internal_rpc {
    verify_server_hostname = true
  }
}

auto_encrypt {
  allow_tls = true
}

acl = {
  enabled                  = true
  default_policy           = "deny"
  enable_token_persistence = true

  tokens {
    agent   = "...."
    default = "...."
  }
}

performance {
  raft_multiplier = 1
}

ports {
  https    = 8501
  grpc     = 8502
  grpc_tls = 8503
}

connect {
  enabled = true
}

retry_join = ["10.9.0.1", "10.9.0.2"]
See hashicorp/consul#15753
Should Nomad distribute the requests among Consul servers?
Your particular topology makes this hard to see because your Nomad clients, Nomad servers, and Consul agents share the same hosts, but Nomad doesn't open xDS streams to Consul servers at all. Envoy opens its xDS streams to the local Consul agent (which in your topology is sometimes also a server). As noted in https://github.com/hashicorp/consul/issues/15753:
While it's possible to treat Consul servers as agents and register proxy services directly to them, it's uncommon outside of very small deployments because the overhead of health-checking could interfere with other server work (e.g. Raft) and cause cluster instability.
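As a rough sketch of what that more common shape looks like (addresses illustrative, port matching your configs above), you'd run a dedicated Consul client agent on each node that hosts workloads, so Envoy's xDS streams terminate at the local client rather than at a server:

# Consul *client* agent on a workload node (illustrative addresses).
server     = false
datacenter = "dc1"
data_dir   = "/opt/consul"
bind_addr  = "10.9.0.4"
retry_join = ["10.9.0.1", "10.9.0.2"]

ports {
  # Local xDS endpoint that Envoy (via Nomad's consul.grpc_address)
  # connects to; the client agent handles these streams, not the servers.
  grpc = 8502
}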
It looks like Consul has a fix for this uncommon topology in https://github.com/hashicorp/consul/pull/15789, which shipped in Consul 1.14.4. I'm going to close this issue out as complete.