Strange DNS behavior when using External Services with CNAME records
Note: I posted my first findings about this on the forum but received no response, so I have now put together a new reproduction.
Overview of the Issue
I have noticed a discrepancy in how DNS resolution (with recursion) works for external services.
Reproduction Steps
Prerequisites:
- Consul running in K8s (installed via the official Helm chart)
- CoreDNS settings changed according to the documentation to enable Consul DNS for *.consul names in K8s (a sketch of the stanza follows this list)
- ACLs off
- Recursor 8.8.8.8 added
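For reference, the CoreDNS change is roughly the stanza below, added to the coredns ConfigMap following the Consul Kubernetes DNS documentation (a minimal sketch; 10.0.0.10 is a placeholder for the ClusterIP of the consul-dns service in my cluster):

  consul:53 {
      errors
      cache 30
      forward . 10.0.0.10
      reload
  }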
- Add the following service with the minimal needed config (I guess the NodeMeta stuff is just for consul-esm) from the documentation, registering it via the catalog API as shown below this list:

  {
    "Node": "google",
    "Address": "www.google.com",
    "NodeMeta": { "external-node": "true", "external-probe": "true" },
    "Service": { "Service": "search", "Port": 80 }
  }

- Attach to a running pod and run ping search.service.consul. It succeeds.
- Now add the following service:

  {
    "Node": "nytimes",
    "Address": "www.nytimes.com",
    "NodeMeta": { "external-node": "true", "external-probe": "true" },
    "Service": { "Service": "nyt", "Port": 80 }
  }

- Attach to a running pod and run ping nyt.service.consul. It fails with 'bad address'.
- Running ping nytimes.node.consul succeeds.
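For completeness, I register these definitions against the catalog API roughly like this (a sketch; it assumes the JSON above is saved as nyt.json and that the Consul HTTP API is reachable at localhost:8500, e.g. via kubectl port-forward):

  curl --request PUT --data @nyt.json http://localhost:8500/v1/catalog/register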
The difference I can see here is that www.nytimes.com points to a CNAME (which in turn points to another CNAME, since it sits behind a CDN), whereas www.google.com returns an A record directly.
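The shape of the answers is easy to confirm against the recursor itself (the exact CNAME targets depend on the CDN and will vary):

  # answer starts with one or more CNAME records before any A record
  dig @8.8.8.8 www.nytimes.com A +noall +answer

  # answer contains A records only
  dig @8.8.8.8 www.google.com A +noall +answer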
Pinging the node addresses <node>.node.consul always works, but that is not a real option for me: my users would then have to keep track of which services are external (and use node addresses) and which are internal (and use service addresses) in each environment.
This feels like there is some kind of recursion limit when using service addresses that is not in effect when using node addresses.
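The same difference shows up when querying Consul DNS directly with dig (a sketch; <consul-dns-ip> stands for the ClusterIP of the consul-dns service, which serves on port 53; add -p 8600 if querying an agent directly):

  # A-record-backed service address: A records come back
  dig @<consul-dns-ip> search.service.consul A

  # CNAME-backed service address: no usable A record comes back (matches ping's 'bad address')
  dig @<consul-dns-ip> nyt.service.consul A

  # node address for the same external node: resolves successfully
  dig @<consul-dns-ip> nytimes.node.consul A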
Consul info for both Client and Server
Client info
agent:
check_monitors = 0
check_ttls = 0
checks = 0
services = 0
build:
prerelease =
revision = 27de64da
version = 1.10.0
consul:
acl = disabled
known_servers = 3
server = false
runtime:
arch = amd64
cpu_count = 8
goroutines = 60
max_procs = 8
os = linux
version = go1.16.5
serf_lan:
coordinate_resets = 0
encrypted = false
event_queue = 0
event_time = 40
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 924
members = 6
query_queue = 0
query_time = 1
Server info
agent:
check_monitors = 0
check_ttls = 0
checks = 0
services = 0
build:
prerelease =
revision = 27de64da
version = 1.10.0
consul:
acl = disabled
bootstrap = false
known_datacenters = 1
leader = true
leader_addr = 10.3.42.196:8300
server = true
raft:
applied_index = 28529296
commit_index = 28529296
fsm_pending = 0
last_contact = 0
last_log_index = 28529297
last_log_term = 310
last_snapshot_index = 28516540
last_snapshot_term = 310
latest_configuration = [{Suffrage:Voter ID:3a891384-8162-4a94-a9b6-58b06e340d7a Address:10.3.41.87:8300} {Suffrage:Voter ID:1025407a-2ac4-43c8-900d-76b1de854648 Address:10.3.42.196:8300} {Suffrage:Voter ID:43cbcd81-817b-0b82-29f1-b859e572587b Address:10.3.41.223:8300}]
latest_configuration_index = 0
num_peers = 2
protocol_version = 3
protocol_version_max = 3
protocol_version_min = 0
snapshot_version_max = 1
snapshot_version_min = 0
state = Leader
term = 310
runtime:
arch = amd64
cpu_count = 8
goroutines = 176
max_procs = 8
os = linux
version = go1.16.5
serf_lan:
coordinate_resets = 0
encrypted = false
event_queue = 0
event_time = 40
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 924
members = 6
query_queue = 0
query_time = 1
serf_wan:
coordinate_resets = 0
encrypted = false
event_queue = 0
event_time = 1
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 392
members = 3
query_queue = 0
query_time = 1
Operating system and Environment details
Running on AKS (Azure)