
Matching Mapping to Host using just label selectors doesn't work

Open • ppanyukov opened this issue 1 year ago • 13 comments

Describe the bug

We are trying to do host-based routing like this:

https://service1.example.com => service1
https://service2.example.com => service2

We want to use label selectors to match Mapping to Host. The reasons are:

  • We may have multiple hosts for the same mapping, and we don't want to create a mapping for each.
  • We may not know the actual hosts names when we deploy the mapping, so we cannot specify hostname there.
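To make the intended pattern concrete, here is a minimal sketch of what we are aiming for: two Hosts select the same Mapping through one shared label, so the Mapping never names a hostname. The second hostname (service1-alt.example.com) is hypothetical, TLS settings are omitted for brevity, and this illustrates the goal rather than a configuration confirmed to work.

```yaml
# Sketch (hypothetical second hostname): two Hosts share one label selector,
# so a single Mapping can serve both without naming either hostname.
apiVersion: getambassador.io/v3alpha1
kind: Host
metadata:
  name: service1.example.com
  namespace: emissary-ingress
spec:
  hostname: service1.example.com
  mappingSelector:
    matchLabels:
      hostKind: host-service1
---
apiVersion: getambassador.io/v3alpha1
kind: Host
metadata:
  name: service1-alt.example.com   # hypothetical second hostname
  namespace: emissary-ingress
spec:
  hostname: service1-alt.example.com
  mappingSelector:
    matchLabels:
      hostKind: host-service1
---
apiVersion: getambassador.io/v3alpha1
kind: Mapping
metadata:
  name: service1
  namespace: platform
  labels:
    hostKind: host-service1   # intended to be matched by both Hosts above
spec:
  prefix: /
  service: service1
  # No host/hostname field: association is meant to happen by label only.
```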

What I discovered is that this functionality is broken for me: the routing just doesn't work.

Maybe I made some dumb typo somewhere? Maybe some other mistake?

To Reproduce

I started with a completely empty Emissary configuration: no existing Host or Mapping objects.

Note that the YAML here uses getambassador.io/v2, but the behaviour is the same with getambassador.io/v3alpha1.

Step 1: Just service1 Mapping and Host

Routing works when only this is deployed.

# service1 host
apiVersion: getambassador.io/v2
kind: Host
metadata:
  name: service1.example.com
  namespace: emissary-ingress
spec:
  hostname: service1.example.com
  selector:
    matchLabels:
      hostKind: host-service1
  tls:
    alpn_protocols: h2,http/1.1
    min_tls_version: v1.2
  tlsSecret:
    name: tls-cert
---
# service1 mapping
apiVersion: getambassador.io/v2
kind: Mapping
metadata:
  labels:
    hostKind: host-service1
  name: service1
  namespace: platform
spec:
  prefix: /
  service: service1

The Envoy routes from the debug logs look correct: the chain is tied to host service1.example.com and routes to cluster_service1_platform_platform.

INFO: V3Listener <V3Listener HTTP emissary-ingress-https-listener on 0.0.0.0:8443 [XFP]>: generated ===========================
DEBUG:   chain CHAIN: tls=True hostglobs=['service1.example.com']
DEBUG:     host service1.example.com
DEBUG:       route <V3Route {'service1.example.com'}: PFX /ambassador/v0/check_ready XFP https -> ROUTE cluster_127_0_0_1_8877_emissary_ingress>
DEBUG:       route <V3Route {'service1.example.com'}: PFX /ambassador/v0/check_ready ALWAYS -> REDIRECT>
DEBUG:       route <V3Route {'service1.example.com'}: PFX /ambassador/v0/check_alive XFP https -> ROUTE cluster_127_0_0_1_8877_emissary_ingress>
DEBUG:       route <V3Route {'service1.example.com'}: PFX /ambassador/v0/check_alive ALWAYS -> REDIRECT>
DEBUG:       route <V3Route {'service1.example.com'}: PFX / XFP https -> ROUTE cluster_service1_platform_platform>
DEBUG:       route <V3Route {'service1.example.com'}: PFX / ALWAYS -> REDIRECT>

Step 2: Add Mapping for service2

Note that at this stage I add only a Mapping, without a corresponding Host.

Doing so breaks existing routing for service1 completely.

# service2 Mapping
apiVersion: getambassador.io/v2
kind: Mapping
metadata:
  labels:
    hostKind: host-service2
  name: service2
  namespace: platform
spec:
  prefix: /
  service: service2

Envoy chains after this step are wrong. Note that the chain for service1.example.com is still present, but the route to cluster_service1_platform_platform disappeared, so it doesn't work.

INFO: V3Listener <V3Listener HTTP emissary-ingress-https-listener on 0.0.0.0:8443 [XFP]>: generated ===========================
DEBUG:   chain CHAIN: tls=True hostglobs=['service1.example.com']
DEBUG:     host service1.example.com
DEBUG:       route <V3Route {'service1.example.com'}: PFX /ambassador/v0/check_ready XFP https -> ROUTE cluster_127_0_0_1_8877_emissary_ingress>
DEBUG:       route <V3Route {'service1.example.com'}: PFX /ambassador/v0/check_ready ALWAYS -> REDIRECT>
DEBUG:       route <V3Route {'service1.example.com'}: PFX /ambassador/v0/check_alive XFP https -> ROUTE cluster_127_0_0_1_8877_emissary_ingress>
DEBUG:       route <V3Route {'service1.example.com'}: PFX /ambassador/v0/check_alive ALWAYS -> REDIRECT>

Step 3: Add Host for service2

After this step, everything is even more broken:

  • https://service1.example.com still returns a 404 (or a 502; I don't remember which).
  • https://service2.example.com randomly routes to either service1 or service2, round-robin style.
apiVersion: getambassador.io/v2
kind: Host
metadata:
  name: service2.example.com
  namespace: emissary-ingress
spec:
  hostname: service2.example.com
  selector:
    matchLabels:
      hostKind: host-service2
  tls:
    alpn_protocols: h2,http/1.1
    min_tls_version: v1.2
  tlsSecret:
    name: tls-cert

The Envoy routes produced are clearly broken. Note that:

  • There is still a chain for the service1.example.com host, but again it doesn't have a route to cluster_service1_platform_platform.
  • There isn't a chain for service2.example.com, for some reason.
  • Instead, there is a chain for the * host, with routes to both cluster_service1_platform_platform and cluster_service2_platform_platform, which explains the round-robin routing.
INFO: V3Listener <V3Listener HTTP emissary-ingress-https-listener on 0.0.0.0:8443 [XFP]>: generated ===========================
DEBUG:   chain CHAIN: tls=True hostglobs=['*']
DEBUG:     host *
DEBUG:       route <V3Route {'*'}: PFX /ambassador/v0/check_ready XFP https -> ROUTE cluster_127_0_0_1_8877_emissary_ingress>
DEBUG:       route <V3Route {'*'}: PFX /ambassador/v0/check_ready ALWAYS -> REDIRECT>
DEBUG:       route <V3Route {'*'}: PFX /ambassador/v0/check_alive XFP https -> ROUTE cluster_127_0_0_1_8877_emissary_ingress>
DEBUG:       route <V3Route {'*'}: PFX /ambassador/v0/check_alive ALWAYS -> REDIRECT>
DEBUG:       route <V3Route {'*'}: PFX / XFP https -> ROUTE cluster_service1_platform_platform>
DEBUG:       route <V3Route {'*'}: PFX / ALWAYS -> REDIRECT>
DEBUG:       route <V3Route {'*'}: PFX / XFP https -> ROUTE cluster_service2_platform_platform>
DEBUG:       route <V3Route {'*'}: PFX / ALWAYS -> REDIRECT>
DEBUG:   chain CHAIN: tls=True hostglobs=['service1.example.com']
DEBUG:     host service1.example.com
DEBUG:       route <V3Route {'service1.example.com'}: PFX /ambassador/v0/check_ready XFP https -> ROUTE cluster_127_0_0_1_8877_emissary_ingress>
DEBUG:       route <V3Route {'service1.example.com'}: PFX /ambassador/v0/check_ready ALWAYS -> REDIRECT>
DEBUG:       route <V3Route {'service1.example.com'}: PFX /ambassador/v0/check_alive XFP https -> ROUTE cluster_127_0_0_1_8877_emissary_ingress>
DEBUG:       route <V3Route {'service1.example.com'}: PFX /ambassador/v0/check_alive ALWAYS -> REDIRECT>

Step 4: Adding host to the Mappings fixes it

When I explicitly specify host in both Mappings, everything works as expected.

spec:
  prefix: /
  service: service1
  host: service1.example.com

Envoy routes look correct here:

INFO: V3Listener <V3Listener HTTP emissary-ingress-https-listener on 0.0.0.0:8443 [XFP]>: generated ===========================
DEBUG:   chain CHAIN: tls=True hostglobs=['service1.example.com']
DEBUG:     host service1.example.com
DEBUG:       route <V3Route {'service1.example.com'}: PFX /ambassador/v0/check_ready XFP https -> ROUTE cluster_127_0_0_1_8877_emissary_ingress>
DEBUG:       route <V3Route {'service1.example.com'}: PFX /ambassador/v0/check_ready ALWAYS -> REDIRECT>
DEBUG:       route <V3Route {'service1.example.com'}: PFX /ambassador/v0/check_alive XFP https -> ROUTE cluster_127_0_0_1_8877_emissary_ingress>
DEBUG:       route <V3Route {'service1.example.com'}: PFX /ambassador/v0/check_alive ALWAYS -> REDIRECT>
DEBUG:       route <V3Route {'service1.example.com'}: PFX / XFP https -> ROUTE cluster_service1_platform_platform>
DEBUG:       route <V3Route {'service1.example.com'}: PFX / ALWAYS -> REDIRECT>
DEBUG:   chain CHAIN: tls=True hostglobs=['service2.example.com']
DEBUG:     host service2.example.com
DEBUG:       route <V3Route {'service2.example.com'}: PFX /ambassador/v0/check_ready XFP https -> ROUTE cluster_127_0_0_1_8877_emissary_ingress>
DEBUG:       route <V3Route {'service2.example.com'}: PFX /ambassador/v0/check_ready ALWAYS -> REDIRECT>
DEBUG:       route <V3Route {'service2.example.com'}: PFX /ambassador/v0/check_alive XFP https -> ROUTE cluster_127_0_0_1_8877_emissary_ingress>
DEBUG:       route <V3Route {'service2.example.com'}: PFX /ambassador/v0/check_alive ALWAYS -> REDIRECT>
DEBUG:       route <V3Route {'service2.example.com'}: PFX / XFP https -> ROUTE cluster_service2_platform_platform>
DEBUG:       route <V3Route {'service2.example.com'}: PFX / ALWAYS -> REDIRECT>
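For comparison, a sketch of the same workaround written against getambassador.io/v3alpha1. To my understanding, the v3alpha1 Mapping field is `hostname` (the v2 `host` field is deprecated there); please verify this against the Mapping reference for your Emissary version. Service and namespace names are the ones used above.

```yaml
# Workaround sketch (v3alpha1): bind each Mapping to its Host by hostname
# rather than by label. Assumes `hostname` is the v3alpha1 equivalent of v2 `host`.
apiVersion: getambassador.io/v3alpha1
kind: Mapping
metadata:
  name: service1
  namespace: platform
spec:
  hostname: service1.example.com
  prefix: /
  service: service1
---
apiVersion: getambassador.io/v3alpha1
kind: Mapping
metadata:
  name: service2
  namespace: platform
spec:
  hostname: service2.example.com
  prefix: /
  service: service2
```

The trade-off is the one described in the report: each Mapping is now tied to a single, known hostname, which is exactly what the label-based approach was meant to avoid.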

Expected behavior

The routing with label selectors without specifying host should work the same way as when the host is explicitly specified.

Versions (please complete the following information):

  • Ambassador: 3.6.0 (deployed with Helm version 8.6.0)
  • Kubernetes environment: AKS 1.25.5

EDIT: I fixed some minor typos in the text

ppanyukov (Jun 14 '23)

Hi @ppanyukov, there are notes on associating a Mapping with a Host using the mappingSelector here. Could you check your config against this example?

cindymullins-dw (Jun 15 '23)

I'm pretty certain I tried everything, all with the same results, but sure, I will give it another try.

From the CRD docs:

mappingSelector:
    description: Selector for Mappings we'll associate with this Host.
      At the moment, Selector and MappingSelector are synonyms, but that
      will change soon.

Also, when I do kubectl get <host>, I get everything back with apiVersion: getambassador.io/v2 and selector instead of mappingSelector, regardless of how the original YAML was specified. Could this be an issue with apiext?

ppanyukov (Jun 16 '23)

Obviously this isn't ideal but is it an option to associate your Mappings and Hosts by the hostname attribute in the meantime, until we figure out why labels aren't working?

cindymullins-dw (Jul 01 '23)

@cindymullins-dw I have yet to try your suggestion; it takes time to set up environments and test, and I'm currently too busy with other things.

Yes, it might be possible to use the hostname attribute in some of the circumstances we need. The thing is, we still need to test that it doesn't break our existing setup, and this also takes time!

It's on my TODO list; I should be able to test it in a couple of weeks' time.

ppanyukov (Jul 03 '23)

As to this question: "Also, when I do kubectl get <host>, I get everything back with apiVersion: getambassador.io/v2 and selector instead of mappingSelector, regardless of how the original YAML was specified. Could this be an issue with apiext?"

The docs say: "Note: The mappingSelector field is only configurable on v3alpha1 CRDs. In the v2 CRDs the equivalent field is selector. Either selector or mappingSelector may be configured in the v3alpha1 CRDs, but selector has been deprecated in favor of mappingSelector."
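Per that note, the same Host label selector looks like this in each API version (a sketch reusing the service1 Host from this thread, TLS omitted). The two documents below are alternatives for the same object, so apply one or the other, not both.

```yaml
# getambassador.io/v2: the Host field is `selector`.
apiVersion: getambassador.io/v2
kind: Host
metadata:
  name: service1.example.com
  namespace: emissary-ingress
spec:
  hostname: service1.example.com
  selector:
    matchLabels:
      hostKind: host-service1
---
# getambassador.io/v3alpha1: use `mappingSelector` (`selector` is deprecated here).
apiVersion: getambassador.io/v3alpha1
kind: Host
metadata:
  name: service1.example.com
  namespace: emissary-ingress
spec:
  hostname: service1.example.com
  mappingSelector:
    matchLabels:
      hostKind: host-service1
```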

There was a bug fix in 3.2 regarding mappingSelector. Setting DISABLE_STRICT_LABEL_SELECTORS: true disables this fix and reverts to the old behavior. The fix was intended to make matching via labels more deterministic and intentional.
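A minimal sketch of setting this variable directly on the Emissary container. The Deployment and container names below are assumptions (check the names in your own install, e.g. your Helm release), and this is an excerpt to merge into the existing manifest, not a complete Deployment.

```yaml
# Excerpt only: Deployment/container names are assumptions; merge into your
# existing Emissary Deployment rather than applying this on its own.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: emissary-ingress
  namespace: emissary-ingress
spec:
  template:
    spec:
      containers:
      - name: emissary-ingress
        env:
        # Env var values must be strings, hence the quoted "true".
        - name: DISABLE_STRICT_LABEL_SELECTORS
          value: "true"
```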

cindymullins-dw (Jul 10 '23)

About getting everything back with apiVersion: getambassador.io/v2: I believe that's because the storage version is v2, and Kubernetes allows only one storage version per CRD. If you check that output, under ambassador_id you'll see - --apiVersion-v3alpha1-only--default if you applied the resource as v3alpha1. This is displayed when a v3alpha1 resource is retrieved in v2 format, as you mentioned.

I've had another look through this and I agree it should work based on the labels alone. It's also troubling that subsequent config breaks the original working example with labels. Would you be able to join a help session with us some Thursday at 2:30pm ET? It would be great to see this in action.

cindymullins-dw (Aug 05 '23)

This week (10 Aug) I probably won't be able to make it, as I already have plans for the evening, unless I reschedule. I will DM you on Slack to arrange an alternative day.

I also plan to do another repro with mappingSelector very soon. Life's very busy here, so I need to carve out time for this; I might be able to do it early next week.

ppanyukov (Aug 05 '23)

For me, too, Mappings were not getting associated after upgrading from v2.3.2 to v3.7.1. I then added DISABLE_STRICT_LABEL_SELECTORS: true to the deployment and it worked!

sourabhgupta385 (Aug 11 '23)

I have done another reproduction of the problem and can confirm things are broken in 3.5.0 and 3.7.2:

  • When using selector (originally reported issue)
  • When using mappingSelector (as suggested by @cindymullins-dw )
  • Setting the env var DISABLE_STRICT_LABEL_SELECTORS="true" doesn't seem to help either (@sourabhgupta385)

The current workaround is:

  • Using explicit hostname in the Mapping. This does resolve the problem but obviously defeats the host to mapping matching using labels.

I also see there is an upcoming 3.8.0 release with an announced bug fix that seems to address exactly this issue (although it references a different and rather old issue). It would be good to have confirmation that this is indeed the case, and also an ETA for the release, please :)

https://github.com/emissary-ingress/emissary/blob/master/CHANGELOG.md#380-tbd

Bugfix: As of v2.2.2, if two mappings were associated with different Hosts through host mappingSelector labels but share the same prefix, the labels were not taken into account which would cause one Mapping to be correctly routed but the other not. This change fixes this issue so that Mappings sharing the same prefix but associated with different Hosts will be correctly routed. (https://github.com/emissary-ingress/emissary/issues/4170)

For completeness, here are the exact yaml files used to repro the issue using mappingSelector. I just changed the domain name to example.com there.

service1.example.com
# service1 host
apiVersion: getambassador.io/v3alpha1
kind: Host
metadata:
  name: service1.example.com
  namespace: emissary-ingress
spec:
  hostname: service1.example.com
  mappingSelector:
    matchLabels:
      hostKind: host-service1
  tls:
    alpn_protocols: h2,http/1.1
    min_tls_version: v1.2
  tlsSecret:
    name: tls-wild--example--com--cf-backend
---
# service1 mapping
apiVersion: getambassador.io/v3alpha1
kind: Mapping
metadata:
  labels:
    hostKind: host-service1
  name: service1
  namespace: platform
spec:
  prefix: /
  service: service1
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/instance: service1
    app.kubernetes.io/name: sample-api
  name: service1
  namespace: platform
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app.kubernetes.io/instance: service1
    app.kubernetes.io/name: sample-api
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/instance: service1
    app.kubernetes.io/name: sample-api
  name: service1
  namespace: platform
spec:
  replicas: 1
  revisionHistoryLimit: 1
  selector:
    matchLabels:
      app.kubernetes.io/instance: service1
      app.kubernetes.io/name: sample-api
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: service1
        app.kubernetes.io/name: sample-api
    spec:
      containers:
      - image: nginx:latest
        imagePullPolicy: IfNotPresent
        name: sample-api
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
      tolerations:
      - effect: NoSchedule
        key: kubernetes.azure.com/scalesetpriority
        operator: Equal
        value: spot
        
service2.example.com
# service2 host
apiVersion: getambassador.io/v3alpha1
kind: Host
metadata:
  name: service2.example.com
  namespace: emissary-ingress
spec:
  hostname: service2.example.com
  mappingSelector:
    matchLabels:
      hostKind: host-service2
  tls:
    alpn_protocols: h2,http/1.1
    min_tls_version: v1.2
  tlsSecret:
    name: tls-wild--example--com--cf-backend
---
# service2 mapping
apiVersion: getambassador.io/v3alpha1
kind: Mapping
metadata:
  labels:
    hostKind: host-service2
  name: service2
  namespace: platform
spec:
  prefix: /
  service: service2
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/instance: service2
    app.kubernetes.io/name: sample-api
  name: service2
  namespace: platform
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app.kubernetes.io/instance: service2
    app.kubernetes.io/name: sample-api
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/instance: service2
    app.kubernetes.io/name: sample-api
  name: service2
  namespace: platform
spec:
  replicas: 1
  revisionHistoryLimit: 1
  selector:
    matchLabels:
      app.kubernetes.io/instance: service2
      app.kubernetes.io/name: sample-api
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: service2
        app.kubernetes.io/name: sample-api
    spec:
      containers:
      - image: mcr.microsoft.com/dotnet/samples:aspnetapp
        imagePullPolicy: IfNotPresent
        name: sample-api
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
      tolerations:
      - effect: NoSchedule
        key: kubernetes.azure.com/scalesetpriority
        operator: Equal
        value: spot

ppanyukov (Aug 17 '23)

@ppanyukov, as you noted, there is a fix for this in the new 3.8 release. Could you please try it out and let us know if there's an improvement?

cindymullins-dw (Aug 31 '23)

@ppanyukov the selector field is deprecated in favor of mappingSelector. Testing with mappingSelector on getambassador.io/v3alpha1 with v3.8.0, I was able to get it to route correctly:

2023-09-05 22:46:03 diagd 3.8.0 [P15TAEW] DEBUG:       route <V3Route {'service1.example.com'}: PFX /ambassador/v0/check_ready XFP https -> ROUTE cluster_127_0_0_1_8877_emissary>
2023-09-05 22:46:03 diagd 3.8.0 [P15TAEW] DEBUG:       route <V3Route {'service1.example.com'}: PFX /ambassador/v0/check_ready ALWAYS -> ROUTE cluster_127_0_0_1_8877_emissary>
2023-09-05 22:46:03 diagd 3.8.0 [P15TAEW] DEBUG:       route <V3Route {'service1.example.com'}: PFX /ambassador/v0/check_alive XFP https -> ROUTE cluster_127_0_0_1_8877_emissary>
2023-09-05 22:46:03 diagd 3.8.0 [P15TAEW] DEBUG:       route <V3Route {'service1.example.com'}: PFX /ambassador/v0/check_alive ALWAYS -> ROUTE cluster_127_0_0_1_8877_emissary>
2023-09-05 22:46:03 diagd 3.8.0 [P15TAEW] DEBUG:       route <V3Route {'service1.example.com'}: PFX / XFP https -> ROUTE cluster_quote_default_default>
2023-09-05 22:46:03 diagd 3.8.0 [P15TAEW] DEBUG:       route <V3Route {'service1.example.com'}: PFX / ALWAYS -> ROUTE cluster_quote_default_default>
2023-09-05 22:46:03 diagd 3.8.0 [P15TAEW] DEBUG:   chain CHAIN: tls=False hostglobs=['service2.example.com']
2023-09-05 22:46:03 diagd 3.8.0 [P15TAEW] DEBUG:     host service2.example.com
2023-09-05 22:46:03 diagd 3.8.0 [P15TAEW] DEBUG:       route <V3Route {'service2.example.com'}: PFX /ambassador/v0/check_ready XFP https -> ROUTE cluster_127_0_0_1_8877_emissary>
2023-09-05 22:46:03 diagd 3.8.0 [P15TAEW] DEBUG:       route <V3Route {'service2.example.com'}: PFX /ambassador/v0/check_ready ALWAYS -> ROUTE cluster_127_0_0_1_8877_emissary>
2023-09-05 22:46:03 diagd 3.8.0 [P15TAEW] DEBUG:       route <V3Route {'service2.example.com'}: PFX /ambassador/v0/check_alive XFP https -> ROUTE cluster_127_0_0_1_8877_emissary>
2023-09-05 22:46:03 diagd 3.8.0 [P15TAEW] DEBUG:       route <V3Route {'service2.example.com'}: PFX /ambassador/v0/check_alive ALWAYS -> ROUTE cluster_127_0_0_1_8877_emissary>
2023-09-05 22:46:03 diagd 3.8.0 [P15TAEW] DEBUG:       route <V3Route {'service2.example.com'}: PFX / XFP https -> ROUTE cluster_quote_default_default>
2023-09-05 22:46:03 diagd 3.8.0 [P15TAEW] DEBUG:       route <V3Route {'service2.example.com'}: PFX / ALWAYS -> ROUTE cluster_quote_default_default>
Hosts,Mappings
# service1 host
apiVersion: getambassador.io/v3alpha1
kind: Host
metadata:
  name: service1.example.com
spec:
  hostname: service1.example.com
  mappingSelector:
    matchLabels:
      hostKind: host-service1
  requestPolicy:
    insecure:
      action: Route
  # tls:
  #   alpn_protocols: h2,http/1.1
  #   min_tls_version: v1.2
  # tlsSecret:
  #   name: tls-cert
---
# service1 mapping
apiVersion: getambassador.io/v3alpha1
kind: Mapping
metadata:
  labels:
    hostKind: host-service1
  name: quote1
spec:
  prefix: /
  service: quote
---
# service2 Mapping
apiVersion: getambassador.io/v3alpha1
kind: Mapping
metadata:
  labels:
    hostKind: host-service2
  name: quote2
spec:
  prefix: /
  service: quote
---
apiVersion: getambassador.io/v3alpha1
kind: Host
metadata:
  name: service2.example.com
spec:
  hostname: service2.example.com
  mappingSelector:
    matchLabels:
      hostKind: host-service2
  requestPolicy:
    insecure:
      action: Route
  # tls:
  #   alpn_protocols: h2,http/1.1
  #   min_tls_version: v1.2
  # tlsSecret:
  #   name: tls-cert

Using getambassador.io/v2 still seems to result in round-robin routing between the two Mappings, so I will have to look into that further.

haq204 (Sep 05 '23)

I believe that on getambassador.io/v2 you still need to use selector rather than mappingSelector (which only works on getambassador.io/v3alpha1). What happens if you use selector on your v2 resources?

cindymullins-dw (Oct 26 '23)

I was having the same issue on v3.9.1, and I was able to narrow it down to this:

If the Mapping resource previously had a hostname defined and is then updated to use a label selector, the routing does not work.

I ended up recreating all Mappings, and it started working. I will open a separate issue for this.

miguelvr (Jan 04 '24)