connect: Proxy defaults adds non connect-enabled services to ingress gateway

Open jmicceri opened this issue 2 years ago • 2 comments

Overview of the Issue

We're beginning to migrate all of our Consul services (1000+) into the service mesh. We use Nomad for our deployment tooling and have deployed an ingress gateway service to route to Connect-enabled services from outside the mesh (local users hitting service endpoints).

The ingress gateway listener is configured for all (*) services. As I understand it, an ingress gateway with a wildcard listener configured for the http protocol automatically adds all http services as upstreams. Below is an excerpt from our deployed Nomad job definition where we configure the gateway and listener:

"Gateway": {
              "Proxy": {
                "ConnectTimeout": 5000000000,
                "EnvoyGatewayBindTaggedAddresses": false,
                "EnvoyGatewayBindAddresses": {
                  "*": {
                    "Address": "0.0.0.0",
                    "Port": 8443
                  }
                },
                "EnvoyGatewayNoDefaultBind": true,
                "EnvoyDNSDiscoveryType": "",
                "Config": {
                  "envoy_stats_bind_addr": "0.0.0.0:8404"
                }
              },
              "Ingress": {
                "TLS": {
                  "Enabled": true
                },
                "Listeners": [
                  {
                    "Port": 8443,
                    "Protocol": "http",
                    "Services": [
                      {
                        "Name": "*",
                        "Hosts": null
                      }
                    ]
                  }
                ]
              },
              "Terminating": null,
              "Mesh": null
            }

I know that Consul services are registered as tcp by default, and as such they weren't picked up as upstreams for the ingress gateway. To get around this, we applied the following proxy-defaults configuration in our cluster so that we didn't have to add a service-defaults configuration for each service:

Kind      = "proxy-defaults"
Name      = "global"
Namespace = "default" # Can only be set to "default".
Config {
  protocol = "http"
  envoy_stats_bind_addr = "0.0.0.0:8404"
}
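
For comparison, the per-service alternative that this global setting avoids might look roughly like the sketch below. The service name "web" is hypothetical; one such entry would be needed for every service that should be treated as http.

```hcl
# Hypothetical per-service entry: marks a single service as http
# instead of setting the protocol globally through proxy-defaults.
Kind     = "service-defaults"
Name     = "web" # assumed service name, for illustration only
Protocol = "http"
```

With 1000+ services, this means writing and maintaining one entry per service, which is why the global proxy-defaults approach was attractive.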

After applying the proxy-defaults configuration, every service in our Consul cluster was registered as an upstream, including services that aren't Connect-enabled. According to the Consul docs on Proxy Defaults, proxy defaults only apply to services that are in the mesh, so I'd expect only Connect-enabled services to be created as upstreams on the ingress gateway. Since services that aren't Connect-enabled cannot be routed to via the ingress gateway, creating upstreams for them seems like a bug.

Is this the intended behavior or have I misconfigured something here?

Reproduction Steps

Steps to reproduce this issue:

  1. Start with a cluster that has all services registered with the tcp protocol
  2. Deploy an ingress gateway with Nomad configured with a wildcard http listener
  3. Apply the above proxy-defaults config to the Consul cluster
  4. Ingress gateway automatically adds every service as an upstream
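
For anyone reproducing this without Nomad, step 2 can probably be approximated with a standalone ingress-gateway config entry along the following lines. This is a sketch that mirrors the listener settings in the job excerpt above; the gateway name "ingress-gateway" is an assumption, since the Nomad service name is not shown in the excerpt.

```hcl
# Sketch of an ingress-gateway config entry equivalent to the
# "Ingress" block in the Nomad job excerpt.
Kind = "ingress-gateway"
Name = "ingress-gateway" # assumed name; not shown in the original job

TLS {
  Enabled = true
}

Listeners = [
  {
    Port     = 8443
    Protocol = "http"
    Services = [
      {
        # Wildcard: every service matching the listener protocol
        Name = "*"
      }
    ]
  }
]
```

An entry like this would typically be applied with consul config write, followed by the proxy-defaults entry from step 3.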

Consul info for both Client and Server

Client info
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 11
	services = 8
build:
	prerelease = 
	revision = 81489d77
	version = 1.11.6+ent
consul:
	acl = enabled
	known_servers = 5
	server = false
license:
	customer = <redacted>
	expiration_time = <redacted>
	features = <redacted>
	id = <redacted>
	install_id = *
	issue_time = <redacted>
	modules = <redacted>
	product = consul
	start_time = <redacted>
runtime:
	arch = amd64
	cpu_count = 16
	goroutines = 239
	max_procs = 16
	os = linux
	version = go1.17.9
serf_lan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 723
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 982752
	members = 151
	query_queue = 0
	query_time = 6
Server info
agent:
	check_monitors = 0
	check_ttls = 1
	checks = 7
	services = 7
build:
	prerelease = 
	revision = 81489d77
	version = 1.11.6+ent
consul:
	acl = enabled
	bootstrap = false
	known_datacenters = 2
	leader = false
	leader_addr = <redacted>:8300
	server = true
license:
	customer = <redacted>
	expiration_time = <redacted>
	features = <redacted>
	id = <redacted>
	install_id = *
	issue_time = <redacted>
	modules = <redacted>
	product = consul
	start_time = <redacted>
raft:
	applied_index = 1242928241
	commit_index = 1242928241
	fsm_pending = 0
	last_contact = 33.428078ms
	last_log_index = 1242928241
	last_log_term = 283
	last_snapshot_index = 1242913999
	last_snapshot_term = 283
	latest_configuration = [{Suffrage:Voter ID:9fb64507-ca1f-d625-329b-a0ce3770128e Address:<redacted>:8300} {Suffrage:Voter ID:53e6f5da-0b67-dd6a-6d4f-09c5c9ef9871 Address:<redacted>:8300} {Suffrage:Voter ID:db191dc2-6156-802f-3082-bfcdf5294d0d Address:<redacted>:8300} {Suffrage:Voter ID:9f038b67-b4f2-4e33-63f3-feb7aee3864e Address:<redacted>:8300} {Suffrage:Voter ID:ab8ba38e-a078-f15a-1d13-9c0f8b302154 Address:<redacted>:8300}]
	latest_configuration_index = 0
	num_peers = 4
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Follower
	term = 283
runtime:
	arch = amd64
	cpu_count = 8
	goroutines = 11989
	max_procs = 8
	os = linux
	version = go1.17.9
serf_lan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 723
	failed = 1
	health_score = 0
	intent_queue = 0
	left = 30
	member_time = 982758
	members = 181
	query_queue = 0
	query_time = 6
serf_wan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 7
	member_time = 433418
	members = 17
	query_queue = 0
	query_time = 2

Operating system and Environment details

OS: NAME="Ubuntu" VERSION="20.04.4 LTS (Focal Fossa)"

jmicceri, Jul 14 '22 15:07

Hey @jmicceri

Your hunch was correct: at face value, this shouldn't add all the services to the ingress gateway. This seems like a bug, so we'll investigate and let you know what we find.

I'll caution that we're very close to the 1.13 release, so we may not be able to get to this until after the release.

Amier3, Jul 19 '22 17:07

I can reproduce the bug with the latest main. A workaround is to create a service-defaults entry with protocol = "tcp" for those non-mesh services, which prevents their protocol from being overridden by proxy-defaults. I will submit a patch to fix this. Thanks.
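
A minimal sketch of that workaround, assuming a non-mesh service named "legacy-db" (a hypothetical name used only for illustration):

```hcl
# Pins one non-mesh service to tcp so that the global
# proxy-defaults protocol ("http") does not apply to it.
Kind     = "service-defaults"
Name     = "legacy-db" # assumed non-mesh service name
Protocol = "tcp"
```

One such entry would be needed per non-mesh service, applied with consul config write.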

huikang, Jul 25 '22 04:07

Hi @jmicceri, thanks for reporting the issue.

This bug was fixed in Consul 1.11.8 and 1.12.4 by PR #13958.

I'm going to close this issue now. Feel free to re-open it, or file a new issue, if you're still experiencing problems with ingress routing to non-connect services.

blake, Sep 13 '22 17:09

I left a comment on the MR that was supposed to fix the issues with the terminating gateway, but I'll post here as well for posterity.

I think this is not fixed for us because services created from Nomad are already registered as both a service and a Connect service. The logic in the MR really only works for external services (hasDestination) and does not help at all for services created from Nomad.

Example:

  service {
    name = "api-admin"
    port = "80"

    connect {
      sidecar_service {}
    }

    check {
      expose         = true
      initial_status = "critical"
      name           = "http"
      type           = "http"
      interval       = "10s"
      timeout        = "2s"
      path           = "/_status"
    }
  }

komapa, Sep 26 '22 20:09

Hey @blake, given the above, would it be possible to re-open this issue?

jmicceri, Sep 26 '22 20:09