
Ingress assigned to wrong loadbalancer when using multiple ambassador installations

Open micahlmartin opened this issue 4 years ago • 9 comments

Describe the bug
I have two Ambassador installations running: one configured to use an internet-facing loadbalancer and one to use an internal loadbalancer. When I create an Ingress resource I set the annotations properly, but the ambassador-id doesn't appear to be honored.

To Reproduce
I've deployed both using the public Helm chart with the following values:

helm install ambassador-public datawire/ambassador -n ambassador

env:
  AMBASSADOR_ID: public
statsd:
  enabled: true
autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 5

helm install ambassador-private datawire/ambassador -n ambassador

env:
  AMBASSADOR_ID: private
statsd:
  enabled: true
autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 5
service:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"

Create private ingress resource:

--- 
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: ambassador-private-ingress
  namespace: ambassador
  annotations:
    kubernetes.io/ingress.class: ambassador
    getambassador.io/ambassador-id: private
    cert-manager.io/issuer: "letsencrypt-prod"
spec:
  tls:
  - hosts:
    - private.domain.com
    secretName: ambassador-private-ingress-tls
  rules:  
  - host: private.domain.com
    http:
      paths:
      - path: /
        backend:
          serviceName: ambassador-private
          servicePort: 80

Create public ingress resource:

--- 
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: ambassador-public-ingress
  namespace: ambassador
  annotations:
    kubernetes.io/ingress.class: ambassador
    getambassador.io/ambassador-id: public
    cert-manager.io/issuer: "letsencrypt-prod"
spec:
  tls:
  - hosts:
    - public.domain.com
    secretName: ambassador-public-ingress-tls
  rules:  
  - host: public.domain.com
    http:
      paths:
      - path: /
        backend:
          serviceName: ambassador-public
          servicePort: 80

Looking at the services that are running and the associated loadbalancers:

NAME                       TYPE           CLUSTER-IP       EXTERNAL-IP                                                                       PORT(S)                      AGE
ambassador-private         LoadBalancer   172.20.223.158   internal-a9127d5a6970f4cd2xxxx.us-east-1.elb.amazonaws.com   80:32201/TCP,443:30509/TCP   22m
ambassador-public          LoadBalancer   172.20.65.166    af21946eee58cxxx-685179520.us-east-1.elb.amazonaws.com            80:30038/TCP,443:32604/TCP   22m

Expected behavior
The private ingress resource should be assigned the internal-a9127d5a6970f4cd2xxxx.us-east-1.elb.amazonaws.com loadbalancer and the public ingress should be assigned the af21946eee58cxxx-685179520.us-east-1.elb.amazonaws.com loadbalancer.

What actually happens is they both get the internal loadbalancer address:

NAMESPACE    NAME                         HOSTS                           ADDRESS                                                                           PORTS     AGE
ambassador   ambassador-private-ingress   private.domain.com   internal-a9127d5a6970f4cd2xxxx.us-east-1.elb.amazonaws.com   80, 443   28m
ambassador   ambassador-public-ingress    public.domain.com    internal-a9127d5a6970f4cd2xxxx.us-east-1.elb.amazonaws.com   80, 443   28m

Versions (please complete the following information):

  • Ambassador: 1.3.2, 1.4.1
  • Kubernetes environment: EKS
  • Kubernetes version: 1.15

micahlmartin · Apr 22 '20

This appears to be a bug in how Ambassador handles Ingresses internally. The actual routing functionality does not seem to be affected: even when replicating this issue, when I ran curl -H 'Host: public.domain.com' https://<public ip address> I still consistently got the correct service output, which means Ambassador's internal Mappings are correctly respecting the Ambassador ID. As such, I believe your setup will work; it's just that the Ingress status will not correctly reflect what Ambassador is actually doing.
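For example, routing through each loadbalancer can be compared directly with something like the following (a sketch using the ELB hostnames from the service listing above; -k is needed because the certificates won't match the raw ELB hostnames, and the internal ELB is only reachable from inside the VPC):

# List both Ambassador services and their loadbalancer addresses
kubectl get svc -n ambassador ambassador-public ambassador-private

# Send a request through the public loadbalancer with the public Host header
curl -k -H 'Host: public.domain.com' https://af21946eee58cxxx-685179520.us-east-1.elb.amazonaws.com/

# Send a request through the internal loadbalancer with the private Host header
# (run this from a pod or instance inside the VPC)
curl -k -H 'Host: private.domain.com' https://internal-a9127d5a6970f4cd2xxxx.us-east-1.elb.amazonaws.com/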

What I believe is happening under the hood is that Ambassador converts the Ingress resource into a Mapping resource, but this conversion happens before the Ambassador ID is examined. If that is the case, each Ambassador looks at both Ingresses, sees the one not currently assigned to it as a new update, and generates a new Mapping for it, which updates the Ingress status; only afterwards are the Mappings checked for the Ambassador ID and processed accordingly. Because the conversion step does not seem to be ID-aware at the moment, the two installations keep overwriting each other's Ingress status in a continuous cycle. This behavior is easier to observe in the pod logs and with kubectl get ingress -w, which shows near-constant updates to the Ingresses.
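To watch the cycling, something like the following is enough (a sketch; the deployment names assume the Helm release names used in this report):

# Watch the Ingress ADDRESS column flip between the two loadbalancers
kubectl get ingress -n ambassador -w

# Tail each installation's logs while the Ingress statuses update
kubectl logs -n ambassador deploy/ambassador-public -f
kubectl logs -n ambassador deploy/ambassador-private -f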

Until the potential bug is nailed down and fixed, the only 'true' workaround is to try to convert your Ingresses into Ambassador Mappings, since all of my tests with Mappings worked exactly as expected. I'm not sure at the moment what that would look like when combined with jetstack's cert-manager, though.
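For illustration, a Mapping equivalent to the private Ingress rule might look roughly like this (a minimal sketch; my-backend is a hypothetical upstream Service, and the TLS/cert-manager side is not covered here):

apiVersion: getambassador.io/v2
kind: Mapping
metadata:
  name: private-root-mapping
  namespace: ambassador
spec:
  # Only the installation started with AMBASSADOR_ID=private should pick this up
  ambassador_id:
  - private
  host: private.domain.com
  prefix: /
  # Hypothetical upstream; the Ingress above pointed at the Ambassador Service
  # itself, whereas a Mapping points directly at the application Service
  service: my-backend.ambassador:80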

cakuros · Apr 23 '20

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] · Jun 22 '20

I have this exact same bug, and the Ingress keeps cycling between the public and the private LB. @cakuros is there a workaround for this currently? We went with Ambassador with the intent to use Ingress resources for as much configuration as possible, so this bug affects us a lot. We're also using external-dns, and I discovered the bug because DNS kept flipping.

0dragosh · Nov 19 '20

I'm going to +1 this for visibility. I'm running into this exact issue with multiple Ambassador ingresses. DNS and routing cycle between them, which results in 5xx and 4xx responses.

Phenyx · Dec 07 '20

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] · Feb 08 '21

This shouldn't be stale. It's pretty critical that this gets fixed.

micahlmartin · Feb 08 '21

I'd also like to +1 this. Are others running into this? What versions of Ambassador is this happening in?

blhagadorn · Jun 01 '21

It looks like this may have been fixed in 1.13.5: https://github.com/datawire/ambassador/pull/3393 https://github.com/datawire/ambassador/blob/master/CHANGELOG.md#1135-may-13-2021
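If that's the case, upgrading the existing releases in place should pick up the fix (a sketch; values-public.yaml and values-private.yaml stand in for whichever files hold the values shown earlier, and the chart version chosen needs to ship Ambassador 1.13.5 or later):

helm repo update
helm upgrade ambassador-public datawire/ambassador -n ambassador -f values-public.yaml
helm upgrade ambassador-private datawire/ambassador -n ambassador -f values-private.yaml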

blhagadorn · Jun 01 '21

So it seems that the "correct" way to have internal and internet-facing traffic patterns is to create separate Ambassador installations, public and private (or similar names/IDs)?

ghostsquad · Sep 13 '22

It seems the issue was addressed in 1.13.5. If the cycling between the public and the private LB persists on 2.x or 3.x, please reopen. Multiple instances of Ambassador do require separate ambassador_ids, which need to be applied to all associated resources.
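As an illustration of "applied to all associated resources", global configuration such as the ambassador Module also has to carry the matching ID (a sketch against the 1.x getambassador.io/v2 API; the same ambassador_id field exists on newer apiVersions, and the empty config block is just a placeholder):

apiVersion: getambassador.io/v2
kind: Module
metadata:
  name: ambassador
  namespace: ambassador
spec:
  # Must match the AMBASSADOR_ID env var of the installation this Module configures
  ambassador_id:
  - private
  config: {}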

cindymullins-dw · Dec 17 '22