Traefik helm release upgrade not working with Kubernetes QoS configuration

Open alex321d2 opened this issue 4 years ago • 3 comments

Welcome!

  • [x] Yes, I've searched similar issues on GitHub and didn't find any.
  • [X] Yes, I've searched similar issues on the Traefik community forum and didn't find any.

What version of Traefik's Helm Chart are you using?

10.7.1

What version of Traefik are you using?

2.5.4

What did you do?

Trying to upgrade a Traefik Helm release with a Guaranteed Pod QoS setting (https://github.com/traefik/traefik-helm-chart/blob/master/traefik/).

Command used to upgrade

# Helm version : v3.7.0
helm upgrade --install traefik traefik/traefik -n -f - values.yaml

What did you see instead?

All routing went down after the upgrade: every request returned a 404 error response.

What is your environment & configuration?

## Configure the deployment
deployment:
    enabled: true
    kind: Deployment
    replicas: 3
    annotations: {}
    labels: {}
    podLabels: {}
    # QoS configuration intended to keep Traefik fully available
    qosClass: Guaranteed # KO - Important! During a Helm upgrade, this causes a loss of routing with 404 errors

Additional Information

To be able to upgrade to a new Helm release version, we first need to downgrade the Kubernetes pod Quality of Service to Burstable or BestEffort.
QoS documentation: https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/

alex321d2 avatar Dec 10 '21 09:12 alex321d2

I'm not sure I get it ... are you trying to set a QoS Class into traefik podTemplate? And you've had an outage?

The docs you point to explain that the QoS class is an attribute of a Pod, derived from the resource requests and limits allocated to that Pod.

Nothing suggests this value may be set as an input to the Traefik Helm chart, or in general. The deployment.qosClass set in the configuration you're showing us would have no effect on what is deployed by this chart.
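To illustrate how the class follows from requests and limits rather than from any setting, here is a minimal Python sketch of the rules described in the Kubernetes QoS docs. It is an approximation, not the kubelet's actual code: the function name `qos_class` and the dict layout are assumptions made for this example, quantities are compared as literal strings (so "150m" and "0.15" would not be treated as equal), and only cpu and memory are considered.

```python
from typing import Dict, List

# Hypothetical layout for one container: {"requests": {...}, "limits": {...}}
Resources = Dict[str, Dict[str, str]]


def qos_class(containers: List[Resources]) -> str:
    """Approximate the QoS class Kubernetes would assign to a Pod."""
    any_set = False
    guaranteed = True
    for res in containers:
        limits = res.get("limits") or {}
        # A request that is not specified defaults to the limit of the same resource.
        requests = {**limits, **(res.get("requests") or {})}
        if limits or requests:
            any_set = True
        for name in ("cpu", "memory"):
            # Guaranteed needs a limit and an identical request for cpu and memory.
            if name not in limits or requests.get(name) != limits[name]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# Same requests and limits on the single container, so the Pod is Guaranteed.
traefik = {
    "requests": {"cpu": "150m", "memory": "1500Mi"},
    "limits": {"cpu": "150m", "memory": "1500Mi"},
}
print(qos_class([traefik]))  # prints "Guaranteed"
```

Note that nothing like a qosClass input appears anywhere in the computation: the only way to change the result is to change the requests and limits.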

Now this does not explain why you are getting 404 errors. I don't really get what's going on. What were you trying to do exactly? Was Traefik first deployed with Helm, or are you using Helm for the first time to patch an existing deployment? Any change in the Traefik chart version, the Traefik image version, or the values file used to deploy Traefik, aside from that qosClass param?

Do you have a copy of your previous configuration? For a deployment, there should be a history of configuration changes, kept as ReplicaSets: kubectl get rs -n traefik-namespace. You can dump your current and previous configurations to figure out what changed.
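As a rough sketch of that comparison, assuming the previous and current ReplicaSets have each been dumped to text with kubectl get rs and -o yaml, Python's standard difflib can show what changed. The helper name `manifest_diff` and the two inline manifests are made up for this example; in practice you would read the two dumps from files.

```python
import difflib


def manifest_diff(old: str, new: str) -> str:
    """Return a unified diff between two YAML dumps given as strings."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    )
    return "\n".join(lines)


# Hypothetical excerpts standing in for the two ReplicaSet dumps.
previous = "replicas: 1\nimage: traefik:2.5"
current = "replicas: 3\nimage: traefik:2.5"

# Removed lines are prefixed with "-", added lines with "+".
print(manifest_diff(previous, current))
```

An empty result means the two dumps are identical, which would point away from the deployment spec as the cause.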

faust64 avatar Dec 11 '21 01:12 faust64

Thanks for your response @faust64. Let me give you more details:

1° The first step consists of installing Traefik with Helm:

: kubectl create namespace traefik
: helm upgrade --install traefik-demo traefik/traefik -n traefik -f values.yaml

values.yaml

##  https://github.com/traefik/traefik-helm-chart/blob/master/traefik/values.yaml
# Default values for Traefik

image:
  name: traefik
  tag: '2.5' # https://github.com/traefik/traefik/releases?q=v2&expanded=true
  pullPolicy: IfNotPresent

# For QoS Guaranteed
resources:
 requests:
   cpu: "150m"
   memory: "1500Mi"
 limits:
   cpu: "150m"
   memory: "1500Mi"

#
# Configure the deployment
#
deployment:
  enabled: true
  kind: Deployment
  # QoS configuration intended to keep Traefik fully available
  qosClass: Guaranteed

# Pod disruption budget
podDisruptionBudget:
  enabled: false

# Create an IngressRoute for the dashboard
ingressRoute:
  dashboard:
    enabled: true

rollingUpdate:
  maxUnavailable: 0
  maxSurge: 1

#
# Configure providers
#
providers:
  kubernetesCRD:
    enabled: true
    allowCrossNamespace: false
    namespaces: []
    # - "default"
  kubernetesIngress:
    enabled: true
    publishedService:
      enabled: false


# Logs
# https://docs.traefik.io/observability/logs/
logs:
  # Traefik logs concern everything that happens to Traefik itself (startup, configuration, events, shutdown, and so on).
  general:
    # By default, the logs use a text format (common), but you can
    # also ask for the json format in the format option
    format: json
    # By default, the level is set to ERROR. Alternative logging levels are DEBUG, PANIC, FATAL, ERROR, WARN, and INFO.
    level: DEBUG
  access:
    # To enable access logs
    enabled: true
    level: DEBUG
    # By default, logs are written using the Common Log Format (CLF).
    # To write logs in JSON, use json in the format option.
    # If the given format is unsupported, the default (CLF) is used instead.
    format: json
    fields:
      defaultMode: keep
      names:
        ClientUsername: keep
      headers:
        defaultMode: keep
        names:
          User-Agent: keep
          Authorization: keep
          Content-Type: keep

# Configure ports
ports:
  # The name of this one can't be changed as it is used for the readiness and
  # liveness probes, but you can adjust its config to your liking
  traefik:
    port: 9000
    expose: true
    # The exposed port for this service
    exposedPort: 9000
    # The port protocol (TCP/UDP)
    protocol: TCP
  external:
    port: 8000
    expose: true
    exposedPort: 80
    protocol: TCP
    nodePort: 31180
  internal:
    port: 8443
    expose: true
    exposedPort: 443
    protocol: TCP
    nodePort: 31080
  web: null
  websecure: null
  metrics:
    port: 9100
    # hostPort: 9100
    # Defines whether the port is exposed if service.type is LoadBalancer or NodePort.
    expose: false
    # The exposed port for this service
    exposedPort: 9100
    # The port protocol (TCP/UDP)
    protocol: TCP

# Options for the main traefik service, where the entrypoints traffic comes
# from.
service:
  enabled: true
  type: NodePort


# Whether Role Based Access Control objects like roles and rolebindings should be created
rbac:
  enabled: true
  namespaced: false

If you now check your Traefik pod, you should see the QoS class correctly configured:

: kubectl -n traefik get pod
: kubectl -n traefik describe pod pod-name
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  
  traefik-demo-token-chq72:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  traefik-demo-token-chq72
    Optional:    false
QoS Class:      Guaranteed

2° Update the Helm release by changing anything in the Helm values. For example, change the number of replicas:

#
# Configure the deployment
#
deployment:
  enabled: true
  # Can be either Deployment or DaemonSet
  kind: Deployment
  # Number of pods of the deployment (only applies when kind == Deployment)
  replicas: 3

: helm upgrade --install traefik-demo traefik/traefik -n traefik -f values.yaml

3° After a successful Helm deployment, all routes return a 404 error response.

Nevertheless, I assume now this is maybe something not yet implemented from Traefik side?

Hope that helps

alex321d2 avatar Dec 15 '21 08:12 alex321d2

I'm not sure what could be going on. I would check the Traefik pod logs, make sure client queries are properly routed to Traefik, check the .status field of the Ingresses, compare the Traefik image versions before and after, and look for a diff in ReplicaSets between the previous and current versions.

For sure, the deployment.qosClass you mentioned isn't involved / won't have any effect on the objects generated by that Chart.

I assume now this is maybe something not yet implemented from Traefik side ?

qosClass is a computed field that shows up in your Pod's status (not its spec). It is assigned for you ( https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#qos-classes ), based on the resource limits and requests you may have set on the containers in your Pod ( https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed / https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-burstable / https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-besteffort ). At no point can you set one yourself. This isn't a Traefik limitation.

The Traefik Helm chart does allow you to change the replica count of an existing deployment, or to update the Traefik image/version being used. There is no reason either change would suddenly make it return 404s.

faust64 avatar Dec 19 '21 21:12 faust64

Since it seems there is nothing to do in the Helm chart about this, I am closing this issue. Please re-open it if you find something we can do in the Helm chart to help with your issue.

mloiseleur avatar Oct 13 '22 15:10 mloiseleur