
ResourceExhausted desc = this server has too many xDS streams open, please try another

Gabscap opened this issue 2 years ago · 1 comment

Nomad version

Nomad v1.4.3 (f464aca721d222ae9c1f3df643b3c3aaa20e2da7)

Operating system and Environment details

Debian Bullseye on 4 machines:

Machine A: Consul Server/Client, Nomad Server/Client
Machine B: Consul Server/Client, Nomad Server/Client
Machine C: Consul Client, Nomad Server/Client
Machine D: Consul Client, Nomad Client

Issue

I get the following Consul error:

2022-12-10T12:43:14.877+0100 [ERROR] agent.envoy: Error handling ADS delta stream: xdsVersion=v3 error="rpc error: code = ResourceExhausted desc = this server has too many xDS streams open, please try another"

If I understand correctly, Consul 1.14 introduced a limit to distribute xDS streams among the Consul servers.

In my Nomad + Consul setup I have certain job constraints which lead to one node running more Consul Connect-enabled jobs than the others. This causes the error above. How can I resolve it? Should Nomad distribute the requests among the Consul servers?
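For illustration, here is a minimal sketch of the kind of constrained Connect job I mean (the job, node class, and image names are made up, not taken from my real cluster):

job "api" {
  datacenters = ["dc1"]

  # This constraint pins every allocation, and therefore every Envoy
  # sidecar, to one class of nodes, so all of their xDS streams end up
  # on the same local Consul agent.
  constraint {
    attribute = "${node.class}"
    value     = "backend"
  }

  group "api" {
    network {
      mode = "bridge"
    }

    service {
      name = "api"
      port = "8080"

      connect {
        sidecar_service {}
      }
    }

    task "api" {
      driver = "docker"

      config {
        image = "example/api:latest"
      }
    }
  }
}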

Nomad Config

data_dir = "/opt/nomad/data"

advertise {
  http = "10.9.0.2"
  rpc  = "10.9.0.2"
  serf = "10.9.0.2"
}

server {
  # license_path is required as of Nomad v1.1.1+
  #license_path = "/etc/nomad.d/nomad.hcl"
  enabled = true
  bootstrap_expect = 3
}

client {
  enabled = true
  network_interface = "wg0"

  min_dynamic_port = 26000
  max_dynamic_port = 32000
}

consul {
  address   = "127.0.0.1:8501"
  grpc_address = "127.0.0.1:8502"
  token     = "...."
  ssl       = true
  ca_file   = "/etc/nomad.d/consul-agent-ca.pem"
  cert_file = "/etc/nomad.d/dc1-server-consul.pem"
  key_file  = "/etc/nomad.d/dc1-server-consul-key.pem"
  auto_advertise      = true
  server_auto_join    = true
  client_auto_join    = true
}

acl {
  enabled = true
}

telemetry {
  prometheus_metrics         = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}

vault {
  enabled = true
  address = "http://vault.service.consul:8200"
  task_token_ttl = "1h"
  create_from_role = "nomad-cluster"
  token = "...."
  allow_unauthenticated = false
}

Consul Config

server = true
bootstrap_expect = 2
ui_config {
  enabled = true
}

client_addr = "127.0.0.1 10.9.0.2"
bind_addr = "10.9.0.2"
advertise_addr = "10.9.0.2"

datacenter = "dc1"
data_dir = "/opt/consul"
encrypt = "....."
tls {
  defaults {
    ca_file = "/etc/consul.d/consul-agent-ca.pem"
    cert_file = "/etc/consul.d/dc1-server-consul.pem"
    key_file = "/etc/consul.d/dc1-server-consul-key.pem"
    verify_incoming = true
    verify_outgoing = true
  }
  internal_rpc {
    verify_server_hostname = true
  }
}
auto_encrypt {
  allow_tls = true
}

acl = {
  enabled = true
  default_policy = "deny"
  enable_token_persistence = true
  tokens {
    agent = "...."
    default = "...."
  }
}

performance {
  raft_multiplier = 1
}

ports {
  https = 8501
  grpc = 8502
  grpc_tls = 8503
}

connect {
  enabled = true
}

retry_join = ["10.9.0.1", "10.9.0.2"]

Gabscap · Dec 10 '22 14:12

See hashicorp/consul#15753

Gabscap · Dec 14 '22 15:12

Should Nomad distribute the requests among the Consul servers?

Your particular topology makes this a little unclear because Consul servers, Nomad servers, and Nomad clients share the same hosts, but Nomad doesn't open xDS streams to Consul servers at all. Envoy opens its xDS stream connections to the local Consul agent (which in your topology is sometimes also a server). As noted in https://github.com/hashicorp/consul/issues/15753:

While it's possible to treat Consul servers as agents and register proxy services directly to them, it's uncommon outside of very small deployments because the overhead of health-checking could interfere with other server work (e.g. Raft) and cause cluster instability.
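To tie that back to the configuration you posted: the address Envoy bootstraps its xDS connection against comes from the consul block of your Nomad agent config, and it points at the agent on the same host (fragment copied from above, comment mine):

consul {
  # Envoy sidecars on this node open their xDS streams against this local
  # agent gRPC port, not against a remote Consul server.
  grpc_address = "127.0.0.1:8502"
}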

It looks like Consul has a fix for this uncommon topology in https://github.com/hashicorp/consul/pull/15789, which shipped in Consul 1.14.4. I'm going to close this issue out as complete.

tgross · Feb 13 '23 15:02