linkerd2

Traefik Router unable to communicate with meshed services when linkerd inbound policy is all-authenticated.

Open palashbasik opened this issue 1 year ago • 6 comments

What is the issue?

I installed Linkerd via its Helm chart on an on-prem K3s cluster, in the linkerd namespace. The Traefik ingress controller is deployed in the traefik namespace, and a few microservices are deployed in the default namespace. I configured a Traefik router to expose the microservices outside the cluster, e.g.:

routers:
  example-service:
    entryPoints:
      - websecure
    rule: "Host(`example.app.com`) && PathPrefix(`/`)"
    tls:
      certResolver: leresolver
    service: example-service
services:
  example-service:
    loadBalancer:
      servers:
        - url: http://example-service.default.svc.cluster.local:8000

When Linkerd is deployed with defaultInboundPolicy: "all-unauthenticated", I can access all the microservices from the browser.

proxy:
  defaultInboundPolicy: "all-unauthenticated"

But when it is deployed with defaultInboundPolicy: "all-authenticated", I can't access the microservices from the browser.

I am new to the Linkerd service mesh and unsure what causes this.

How can it be reproduced?

  1. Provision a K3s cluster.
  2. Install Traefik in the traefik namespace with the annotation below.
deployment:
  podAnnotations:
    linkerd.io/inject: ingress
  3. Install Linkerd in the linkerd namespace with the values below.
proxy:
  defaultInboundPolicy: "all-authenticated"
  4. Deploy an application in the default namespace.
  5. Annotate the default namespace with linkerd.io/inject=enabled.
kubectl annotate namespace default linkerd.io/inject=enabled
  6. Restart the pods in the default namespace so that the Linkerd sidecar is injected.
  7. Create a router in Traefik's values.yaml.
routers:
  example-service:
    entryPoints:
      - websecure
    rule: "Host(`example.app.com`) && PathPrefix(`/`)"
    tls:
      certResolver: leresolver
    service: example-service
services:
  example-service:
    loadBalancer:
      servers:
        - url: http://example-service.default.svc.cluster.local:8000
  8. In a browser, try to access example.app.com. The application is not reachable.

Logs, error output, etc

Logs from the linkerd-proxy container of the Traefik pod:

[  3083.814154s]  INFO ThreadId(01) inbound:server{port=8443}: linkerd_app_inbound::policy::tcp: Connection denied server.group= server.kind=default server.name=all-authenticated tls=Some(Passthru { sni: ServerId(Name("example.app.com")) }) client=10.42.0.1:54877
[  3083.814178s]  INFO ThreadId(01) inbound: linkerd_app_core::serve: Connection closed error=unauthorized connection on default/all-authenticated client.addr=10.42.0.1:54877 server.addr=10.42.0.58:8443
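
Reading the two log lines above: the denial is reported for inbound port 8443 of the Traefik pod itself (server.addr=10.42.0.58:8443), for a TLS connection from 10.42.0.1 with SNI example.app.com. That looks like the browser's connection into Traefik, which carries no Linkerd mesh identity and so would be rejected under all-authenticated, rather than the hop from Traefik to example-service. Below is a minimal sketch of one possible workaround, assuming the goal is to keep all-authenticated as the cluster-wide default and relax it only for Traefik's own inbound side. The config.linkerd.io/default-inbound-policy pod annotation is Linkerd's per-workload override; its placement under deployment.podAnnotations mirrors the Traefik Helm values used in this issue. It has not been verified against this cluster.

```yaml
deployment:
  podAnnotations:
    linkerd.io/inject: ingress
    # Assumption: override the cluster-wide default for the Traefik pods only,
    # so that unmeshed clients (browsers) can still reach Traefik's entrypoints.
    config.linkerd.io/default-inbound-policy: all-unauthenticated
```

With this in place, the application pods in the default namespace would still fall under the cluster-wide all-authenticated default, so only meshed clients such as Traefik's proxy should be able to reach them.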

Output of linkerd check -o short:

linkerd-identity
----------------
‼ issuer cert is valid for at least 60 days
    issuer certificate will expire on 2024-06-21T09:18:37Z
    see https://linkerd.io/2/checks/#l5d-identity-issuer-cert-not-expiring-soon for hints

control-plane-version
---------------------
‼ control plane is up-to-date
    unsupported version channel: stable-2.14.10
    see https://linkerd.io/2/checks/#l5d-version-control for hints
‼ control plane and cli versions match
    control plane running stable-2.14.10 but cli running edge-24.6.2
    see https://linkerd.io/2/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
        * linkerd-identity-6dbb555cf7-q9c8g (stable-2.14.10)
        * metrics-api-b85485b99-2cbkt (stable-2.14.10)
        * web-58979b9448-72znj (stable-2.14.10)
        * tap-injector-c48598d4c-chc25 (stable-2.14.10)
        * tap-7999d688ff-kgzqh (stable-2.14.10)
        * linkerd-proxy-injector-7f6964c9b9-fx8vx (stable-2.14.10)
        * linkerd-destination-5dc7694bc5-t4glt (stable-2.14.10)
    see https://linkerd.io/2/checks/#l5d-cp-proxy-version for hints
‼ control plane proxies and cli versions match
    linkerd-identity-6dbb555cf7-q9c8g running stable-2.14.10 but cli running edge-24.6.2
    see https://linkerd.io/2/checks/#l5d-cp-proxy-cli-version for hints

linkerd-viz
-----------
‼ viz extension proxies are up-to-date
    some proxies are not running the current version:
        * linkerd-identity-6dbb555cf7-q9c8g (stable-2.14.10)
        * metrics-api-b85485b99-2cbkt (stable-2.14.10)
        * web-58979b9448-72znj (stable-2.14.10)
        * tap-injector-c48598d4c-chc25 (stable-2.14.10)
        * tap-7999d688ff-kgzqh (stable-2.14.10)
        * linkerd-proxy-injector-7f6964c9b9-fx8vx (stable-2.14.10)
        * linkerd-destination-5dc7694bc5-t4glt (stable-2.14.10)
    see https://linkerd.io/2/checks/#l5d-viz-proxy-cp-version for hints
‼ viz extension proxies and cli versions match
    linkerd-identity-6dbb555cf7-q9c8g running stable-2.14.10 but cli running edge-24.6.2
    see https://linkerd.io/2/checks/#l5d-viz-proxy-cli-version for hints

Status check results are √

Environment

$ linkerd version
Client version: edge-24.6.2
Server version: stable-2.14.10

$ helm version
version.BuildInfo{Version:"v3.11.3", GitCommit:"323249351482b3bbfc9f5004f65d400aa70f9ae7", GitTreeState:"clean", GoVersion:"go1.20.3"}

$ kubectl version --short
Client Version: v1.27.1
Kustomize Version: v5.0.1
Server Version: v1.25.6+k3s1

Cluster type: Single node on-prem K3S

Ingress Controller: Traefik v2.9.8

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

None

palashbasik, Jun 19 '24 17:06

You'll need some extra config to get Traefik to play nice with Linkerd. Please check the detailed instructions in the docs.

alpeb, Jun 20 '24 13:06

I tried adding the middleware, but the problem persists.

Below is a snippet of the Traefik ConfigMap.

http:
  middlewares:
    l5d-header:
      headers:
        customRequestHeaders:
          l5d-dst-override: "example-service.default.svc.cluster.local:8000"
  routers:
    example-service:
      entryPoints:
        - websecure
      rule: "Host(`example.app.com`) && PathPrefix(`/`)"
      # Traefik's router option is the plural "middlewares"
      middlewares:
        - l5d-header
      tls:
        certResolver: leresolver
      service: example-service
  services:
    example-service:
      loadBalancer:
        servers:
          - url: http://example-service.default.svc.cluster.local:8000

Note: I am using the Let's Encrypt certResolver in Traefik for TLS.

palashbasik, Jun 21 '24 06:06

@palashbasik Have you meshed Traefik using linkerd.io/inject: ingress? It's not hard to miss that bit in our docs for Traefik v2... 😐

kflynn, Jun 27 '24 15:06

@palashbasik I see you listed above that you're using ingress mode, but it's worth double-checking. 🙂 Also: instead of the Traefik ConfigMap, can we see the YAML you're configuring Traefik with?

kflynn, Jun 27 '24 16:06

Below is the override-values.yaml file for Traefik.

deployment:
  replicas: 3
  podAnnotations:
    linkerd.io/inject: ingress
# Pod disruption budget
podDisruptionBudget:
  enabled: true
  # maxUnavailable: 1
  # maxUnavailable: 33%
  minAvailable: 1
  # minAvailable: 25%

# Enable experimental features
experimental:
  v3:
    enabled: true
  plugins:
    enabled: true

# Create an IngressRoute for the dashboard
ingressRoute:
  dashboard:
    enabled: true

## Logs
## https://docs.traefik.io/observability/logs/
logs:
  ## Traefik logs concern everything that happens to Traefik itself (startup, configuration, events, shutdown, and so on).
  general:
    # By default, the logs use a text format (common), but you can also ask for the json format in the format option
    # format: json
    # By default, the level is set to ERROR.
    # Alternative logging levels are DEBUG, PANIC, FATAL, ERROR, WARN, and INFO.
    level: INFO
  access:
    # To enable access logs
    enabled: true
    ## By default, logs are written using the Common Log Format (CLF) on stdout.
    ## To write logs in JSON, use json in the format option.
    format: json
    # filePath: "/var/log/traefik/access.log"
    ## To write the logs in an asynchronous fashion, specify a bufferingSize option.
    ## This option represents the number of log lines Traefik will keep in memory before writing
    ## them to the selected output. In some cases, this option can greatly help performances.
    # bufferingSize: 100
    ## Filtering https://docs.traefik.io/observability/access-logs/#filtering
    filters: {}
      # statuscodes: "200,300-302"
      # retryattempts: true
      # minduration: 10ms
    ## Fields
    ## https://docs.traefik.io/observability/access-logs/#limiting-the-fieldsincluding-headers
    fields:
      general:
        defaultmode: keep
        names:
          StartUTC: drop
          StartLocal: drop
          RouterName: drop
          ServiceAddr: drop
          ClientPort: drop
          ClientUsername: drop
          RequestHost: drop
          RequestPort: drop
          RequestMethod: drop
          RequestPath: drop
          RequestProtocol: drop
          RequestScheme: drop
          RequestContentSize: drop
          OriginDuration: drop
          OriginContentSize: drop
          OriginStatus: drop
          OriginStatusLine: drop
          DownstreamStatusLine: drop
          RequestCount: drop
          GzipRatio: drop
          Overhead: drop
          TLSVersion: drop
          TLSCipher: drop

metrics:
  ## Prometheus is enabled by default.
  ## It can be disabled by setting "prometheus: null"
  prometheus:
    ## Entry point used to expose metrics.
    entryPoint: metrics
    addEntryPointsLabels: true
    addRoutersLabels: true
    addServicesLabels: true
    ## Buckets for latency metrics. Default="0.1,0.3,1.2,5.0"
    # buckets: "0.5,1.0,2.5"
    ## When manualRouting is true, it disables the default internal router in
    ## order to allow creating a custom router for prometheus@internal service.
    # manualRouting: true
  
tracing:
  jaeger:
    collector:
      endpoint: http://jaeger-collector.monitoring.svc.cluster.local:14268/api/traces

secret:
  enabled: true 

# Environment variables to be passed to Traefik's binary
env: 
  - name: CLOUDFLARE_EMAIL
    value: <your-email-id>
  - name: CLOUDFLARE_API_KEY
    valueFrom:
      secretKeyRef:
        name: traefik-secret
        key: CLOUDFLARE_API_KEY


# Configure ports
ports:
  web:
    expose: false           
  websecure:
    # Enable this entrypoint as a default entrypoint. When a service doesn't explicitly set an entrypoint it will only use this entrypoint.
    # asDefault: true
    tls:
      enabled: true
      # this is the name of a TLSOption definition
      # options: ""
      certResolver: "leresolver"
      # domains: []    

## Create HorizontalPodAutoscaler object.
##
autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60

# Enable persistence using Persistent Volume Claims
# ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
# It can be used to store TLS certificates, see `storage` in certResolvers
persistence:
  enabled: true

certResolvers: 
  leresolver:
    # for challenge options cf. https://doc.traefik.io/traefik/https/acme/
    email: <your-email-id>
    dnsChallenge:
      # also add the provider's required configuration under env
      # or expand them from secrets/configmaps with envfrom
      # cf. https://doc.traefik.io/traefik/https/acme/#providers
      provider: cloudflare
      # add further options for the dns challenge as needed
      # cf. https://doc.traefik.io/traefik/https/acme/#dnschallenge
      delayBeforeCheck: 30
      resolvers:
        - 1.1.1.1
        - 8.8.8.8
    tlsChallenge: false
    # httpChallenge:
    #   entryPoint: "web"
    # It has to match the path with a persistent volume
    storage: /data/acme.json

additionalArguments:
  - "--providers.file.filename=/config/config.yaml"
volumes:
  - name: '{{ printf "%s-configs" .Release.Name }}'
    mountPath: '/config'
    type: configMap

resources:
  requests:
    cpu: "100m"
    memory: "1Gi"
  limits:
    cpu: "500m"
    memory: "2Gi"

config: |-
  http:
    middlewares:
      corsHeader:
        headers:
          accessControlAllowCredentials: true
          accessControlAllowHeaders: 
          - Accept
          - Access-Control-Request-Headers 
          - Access-Control-Request-Method 
          - Authorization 
          - Content-Type 
          - Last-Modified 
          - Origin 
          - X-Requested-With
          - Sec-WebSocket-Key
          accessControlAllowMethods: "*"
          accessControlAllowOriginList: 
          - http://localhost:3000              
          accessControlMaxAge: 100
          addVaryHeader: true
      basic-admin-auth:
        basicAuth:
          users:
            # password - password - hashed with bcrypt
            - "admin:$2a$12$fpgiRwj7e2XBv/U4LWDvr.Jr7sRPECklDxitBdXDkBzLS6r4TU5Pm"
      strip-service-prefix:
        # Modifies "/team/hello" to "/hello"
        replacePathRegex:
          regex: '^/$1/$1/(.*)'
          #regex: '^/.*?/(.*)'
          replacement: '/$1'             
    routers:
      example-service:
        entryPoints:
          - websecure
        # Should prevent any route containing the word "internal" to be blocked
        rule: "Host(`example.app.com`) && PathPrefix(`/`)"
        middlewares:
          - strip-service-prefix
        tls:
          certResolver: leresolver          
        service: example-service
    services:
      # Define how to reach an existing service on our infrastructure
      example-service:
        loadBalancer:
          servers:
            - url: http://example-service.default.svc.cluster.local:8000

With the Traefik configuration above and Linkerd deployed with defaultInboundPolicy set to "all-authenticated", I can't access https://example.app.com from the browser.

Note: The host example.app.com mentioned above is solely for illustrative purposes.
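
A finer-grained alternative to changing the default policy, sketched here under stated assumptions rather than as a verified fix, is to describe Traefik's websecure port with a Linkerd Server resource and authorize unmeshed clients on that port only. Port 8443 and the traefik namespace come from this issue. All resource names and the pod label selector are hypothetical, and the API versions are the ones believed to be served by stable-2.14; check them with kubectl api-resources before applying.

```yaml
# Hypothetical names throughout; adjust the selector to the labels on your Traefik pods.
apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  namespace: traefik
  name: traefik-websecure
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: traefik   # assumed label
  port: 8443                            # inbound port shown in the proxy log
  proxyProtocol: TLS                    # Traefik terminates TLS itself
---
apiVersion: policy.linkerd.io/v1alpha1
kind: NetworkAuthentication
metadata:
  namespace: traefik
  name: any-network
spec:
  networks:
    - cidr: 0.0.0.0/0
    - cidr: ::/0
---
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  namespace: traefik
  name: traefik-websecure-allow-any
spec:
  targetRef:
    group: policy.linkerd.io
    kind: Server
    name: traefik-websecure
  requiredAuthenticationRefs:
    - group: policy.linkerd.io
      kind: NetworkAuthentication
      name: any-network
```

The design choice here is scope: only the one entrypoint port is opened to unauthenticated clients, while every other port on the Traefik pods and all application pods would keep the all-authenticated default.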

palashbasik, Jun 28 '24 11:06

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot], Sep 29 '24 14:09

> @palashbasik Have you meshed Traefik using linkerd.io/inject: ingress? It's not hard to miss that bit in our docs for Traefik v2... 😐

https://linkerd.io/2.16/tasks/using-ingress/#traefik-normal-mode says not to use that annotation with Traefik v2+.
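
If the linked docs section applies as this comment reads it, Traefik v2+ would be meshed in normal mode instead of ingress mode. A sketch of the corresponding change to the Traefik Helm values from this thread, assuming nothing else in the setup depends on ingress mode:

```yaml
deployment:
  podAnnotations:
    # Normal mode instead of "ingress", following the reading of the docs above.
    # Verify against the docs for the Linkerd version actually running (stable-2.14.10 here).
    linkerd.io/inject: enabled
```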

brandonros, Nov 12 '24 03:11