connect: Proxy defaults adds non connect-enabled services to ingress gateway
Overview of the Issue
We're beginning to migrate all of our Consul services (1000+) into the service mesh. We use Nomad for our deployment tooling and have deployed an ingress gateway service in order to route to connect-enabled services from outside of the mesh (local users hitting service endpoints).
The ingress gateway listener is configured for all (*) services. As I understand it, with the wildcard listener that we have set, an ingress gateway configured with the http protocol will automatically add all http services as upstreams to the gateway. Below is an excerpt from our deployed Nomad job definition where we configure the gateway and listener:
"Gateway": {
  "Proxy": {
    "ConnectTimeout": 5000000000,
    "EnvoyGatewayBindTaggedAddresses": false,
    "EnvoyGatewayBindAddresses": {
      "*": {
        "Address": "0.0.0.0",
        "Port": 8443
      }
    },
    "EnvoyGatewayNoDefaultBind": true,
    "EnvoyDNSDiscoveryType": "",
    "Config": {
      "envoy_stats_bind_addr": "0.0.0.0:8404"
    }
  },
  "Ingress": {
    "TLS": {
      "Enabled": true
    },
    "Listeners": [
      {
        "Port": 8443,
        "Protocol": "http",
        "Services": [
          {
            "Name": "*",
            "Hosts": null
          }
        ]
      }
    ]
  },
  "Terminating": null,
  "Mesh": null
}
I know that by default Consul services are registered as tcp, and as such they didn't get picked up as upstreams for the ingress gateway. To get around this, we applied the following proxy-defaults configuration in our cluster so we didn't have to add a service-defaults configuration for each service:
Kind = "proxy-defaults"
Name = "global"
Namespace = "default" # Can only be set to "default".
Config {
  protocol = "http"
  envoy_stats_bind_addr = "0.0.0.0:8404"
}
After applying the proxy-defaults configuration, every service in our Consul cluster was registered as an upstream, including services that aren't connect-enabled. According to the Consul docs on proxy defaults, the configuration only applies to services that are in the mesh, so I'd expect only connect-enabled services to be created as upstreams on the ingress gateway. Since services that aren't connect-enabled cannot be routed to via the ingress gateway, it's a bug that upstreams are created for them.
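For what it's worth, one way to see which services the gateway has been associated with is the catalog gateway-services endpoint; the gateway service name "ingress-gateway" below is just a placeholder for ours, and a cluster with ACLs enabled would also need a token:
# List every service Consul has associated with the ingress gateway.
curl --silent http://127.0.0.1:8500/v1/catalog/gateway-services/ingress-gateway | jq -r '.[].Service.Name'
# After the proxy-defaults above is applied, this list includes services
# that have no connect sidecar registered at all.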
Is this the intended behavior or have I misconfigured something here?
Reproduction Steps
Steps to reproduce this issue (a rough command sketch follows the list):
- Start with a cluster that has all services registered with the tcp protocol
- Deploy an ingress gateway with Nomad configured with a wildcard http listener
- Apply the above proxy-defaults config to the Consul cluster
- Ingress gateway automatically adds every service as an upstream
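Boiled down to commands, the steps above look roughly like the following; the service name, port, and file name are placeholders:
# Register a plain service with no connect sidecar (defaults to the tcp protocol).
consul services register -name legacy-app -port 9090
# Apply the global proxy-defaults config entry shown above.
consul config write proxy-defaults.hcl
# The wildcard http listener on the ingress gateway now picks up legacy-app
# as an upstream even though it is not connect-enabled.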
Consul info for both Client and Server
Client info
agent:
check_monitors = 0
check_ttls = 0
checks = 11
services = 8
build:
prerelease =
revision = 81489d77
version = 1.11.6+ent
consul:
acl = enabled
known_servers = 5
server = false
license:
customer = <redacted>
expiration_time = <redacted>
features = <redacted>
id = <redacted>
install_id = *
issue_time = <redacted>
modules = <redacted>
product = consul
start_time = <redacted>
runtime:
arch = amd64
cpu_count = 16
goroutines = 239
max_procs = 16
os = linux
version = go1.17.9
serf_lan:
coordinate_resets = 0
encrypted = false
event_queue = 0
event_time = 723
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 982752
members = 151
query_queue = 0
query_time = 6
Server info
agent:
check_monitors = 0
check_ttls = 1
checks = 7
services = 7
build:
prerelease =
revision = 81489d77
version = 1.11.6+ent
consul:
acl = enabled
bootstrap = false
known_datacenters = 2
leader = false
leader_addr = <redacted>:8300
server = true
license:
customer = <redacted>
expiration_time = <redacted>
features = <redacted>
id = <redacted>
install_id = *
issue_time = <redacted>
modules = <redacted>
product = consul
start_time = <redacted>
raft:
applied_index = 1242928241
commit_index = 1242928241
fsm_pending = 0
last_contact = 33.428078ms
last_log_index = 1242928241
last_log_term = 283
last_snapshot_index = 1242913999
last_snapshot_term = 283
latest_configuration = [{Suffrage:Voter ID:9fb64507-ca1f-d625-329b-a0ce3770128e Address:<redacted>:8300} {Suffrage:Voter ID:53e6f5da-0b67-dd6a-6d4f-09c5c9ef9871 Address:<redacted>:8300} {Suffrage:Voter ID:db191dc2-6156-802f-3082-bfcdf5294d0d Address:<redacted>:8300} {Suffrage:Voter ID:9f038b67-b4f2-4e33-63f3-feb7aee3864e Address:<redacted>:8300} {Suffrage:Voter ID:ab8ba38e-a078-f15a-1d13-9c0f8b302154 Address:<redacted>:8300}]
latest_configuration_index = 0
num_peers = 4
protocol_version = 3
protocol_version_max = 3
protocol_version_min = 0
snapshot_version_max = 1
snapshot_version_min = 0
state = Follower
term = 283
runtime:
arch = amd64
cpu_count = 8
goroutines = 11989
max_procs = 8
os = linux
version = go1.17.9
serf_lan:
coordinate_resets = 0
encrypted = false
event_queue = 0
event_time = 723
failed = 1
health_score = 0
intent_queue = 0
left = 30
member_time = 982758
members = 181
query_queue = 0
query_time = 6
serf_wan:
coordinate_resets = 0
encrypted = false
event_queue = 0
event_time = 1
failed = 0
health_score = 0
intent_queue = 0
left = 7
member_time = 433418
members = 17
query_queue = 0
query_time = 2
Operating system and Environment details
OS: NAME="Ubuntu" VERSION="20.04.4 LTS (Focal Fossa)"
Hey @jmicceri
Your hunch was correct; on face value this shouldn't add all the services to the ingress gateway. This seems like a bug, so we'll investigate it and let you know what we find.
I'll caution that we're very close to the 1.13 release, so we may not be able to get to this until after the release.
I can reproduce the bug with the latest main. A workaround is to create a service-defaults entry for those non-mesh services with protocol = "tcp" to prevent the protocol from being written by proxy-defaults. Will submit a patch to fix. Thanks.
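For anyone else hitting this before the fix lands, the workaround entry would look something like the sketch below, written once per non-mesh service ("legacy-app" is a placeholder name) and applied with consul config write:
Kind     = "service-defaults"
Name     = "legacy-app"
Protocol = "tcp"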
Hi @jmicceri, thanks for reporting the issue.
This bug was fixed in Consul 1.11.8 and 1.12.4 by PR #13958.
I'm going to close this issue now. Feel free to re-open it, or file a new issue, if you're still experiencing problems with ingress routing to non-connect services.
I left a comment on the MR that was supposed to fix the issues with the terminating gateway, but I'll post here as well for posterity.
I think this is not fixed for us, because when you create services from Nomad they are already registered as both a regular service and a connect service. The logic in the MR really only works for external services (hasDestination) but does not help at all for services that are created from Nomad.
Example:
service {
  name = "api-admin"
  port = "80"

  connect {
    sidecar_service {}
  }

  check {
    expose         = true
    initial_status = "critical"
    name           = "http"
    type           = "http"
    interval       = "10s"
    timeout        = "2s"
    path           = "/_status"
  }
}
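For contrast, my understanding is that hasDestination matches service-defaults entries that declare a destination block for an external service, roughly like the hypothetical sketch below (newer-Consul syntax, made-up name and address), which is something a Nomad-registered service like the one above never has:
Kind     = "service-defaults"
Name     = "external-api"
Protocol = "tcp"
Destination {
  Addresses = ["external-api.example.com"]
  Port      = 443
}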
Hey @blake, given the above, would it be possible to re-open this issue?