Traefik helm release upgrade not working with Kubernetes QoS configuration
Welcome!
- [x] Yes, I've searched similar issues on GitHub and didn't find any.
- [X] Yes, I've searched similar issues on the Traefik community forum and didn't find any.
What version of the Traefik Helm chart are you using?
10.7.1
What version of Traefik are you using?
2.5.4
What did you do?
Trying to upgrade a Traefik Helm release with a Guaranteed Pod QoS setting (https://github.com/traefik/traefik-helm-chart/blob/master/traefik/).
Command used to upgrade:
# Helm version: v3.7.0
helm upgrade --install traefik traefik/traefik -n-f - values.yaml
What did you see instead?
All routing is down after the upgrade: every request returns a 404 error response.
What is your environment & configuration?
## Configure the deployment
deployment:
  enabled: true
  kind: Deployment
  replicas: 3
  annotations: {}
  labels: {}
  podLabels: {}
  # QoS configuration to keep Traefik services fully available
  qosClass: Guaranteed # KO - Important! During a Helm upgrade, this causes a loss of routing with 404 errors
Additional Information
To upgrade to a new Helm release version, we first need to downgrade the Kubernetes
pod Quality of Service to Burstable or BestEffort.
QoS documentation: https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/
I'm not sure I get it ... are you trying to set a QoS Class into traefik podTemplate? And you've had an outage?
The docs you point to explain that the QoS class is an attribute of a Pod, derived from the resource requests and limits allocated to that Pod.
Nothing suggests this value can be set as an input to the Traefik Helm chart, or in general.
The deployment.qosClass set in the configuration you're showing us has no effect on what this chart deploys.
Now this does not explain why you are getting 404 errors. I don't really get what's going on. What were you trying to do exactly? Was traefik first deployed with Helm, or are you trying to patch an existing deployment while first using helm? Any change in traefik chart version, traefik image version, or to the values file used deploying traefik - aside from that qosclass param?
Do you have a copy of your previous configuration?
For a Deployment, there should be a history of configuration changes, kept as ReplicaSets: kubectl get rs -n traefik-namespace. You can dump your current and previous configurations to figure out what changed.
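As a rough sketch of that comparison: assuming you have saved the current and previous ReplicaSet as text (for example with kubectl get rs <name> -n traefik-namespace -o yaml), a few lines of Python can list only the lines that differ. The function name changed_lines and the sample manifests below are made up for illustration, not taken from your cluster.

```python
import difflib
from typing import List


def changed_lines(previous: str, current: str) -> List[str]:
    """Return only the added ('+') and removed ('-') lines between two manifests."""
    diff = list(
        difflib.unified_diff(
            previous.splitlines(),
            current.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )
    # The first two entries are the '---' / '+++' file headers; skip them,
    # then keep only real additions and removals (drop '@@' hunks and context).
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


if __name__ == "__main__":
    # Hypothetical excerpts of two ReplicaSet dumps.
    previous = "replicas: 1\nimage: traefik:2.5\n"
    current = "replicas: 3\nimage: traefik:2.5\n"
    for line in changed_lines(previous, current):
        print(line)
```

Expect some noise in real dumps: fields such as metadata.resourceVersion and the pod-template-hash label differ between any two ReplicaSets, so focus on the changes under spec.template.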
Thanks for your response, @faust64. Let me provide more details:
1° The first step is installing Traefik with Helm:
kubectl create namespace traefik
helm upgrade --install traefik-demo traefik/traefik -n traefik -f values.yaml
values.yaml
## https://github.com/traefik/traefik-helm-chart/blob/master/traefik/values.yaml
# Default values for Traefik
image:
  name: traefik
  tag: '2.5' # https://github.com/traefik/traefik/releases?q=v2&expanded=true
  pullPolicy: IfNotPresent
# For QoS Guaranteed
resources:
  requests:
    cpu: "150m"
    memory: "1500Mi"
  limits:
    cpu: "150m"
    memory: "1500Mi"
#
# Configure the deployment
#
deployment:
  enabled: true
  kind: Deployment
  # QoS configuration to keep Traefik services fully available
  qosClass: Guaranteed
# Pod disruption budget
podDisruptionBudget:
  enabled: false
# Create an IngressRoute for the dashboard
ingressRoute:
  dashboard:
    enabled: true
rollingUpdate:
  maxUnavailable: 0
  maxSurge: 1
#
# Configure providers
#
providers:
  kubernetesCRD:
    enabled: true
    allowCrossNamespace: false
    namespaces: []
    # - "default"
  kubernetesIngress:
    enabled: true
    publishedService:
      enabled: false
# Logs
# https://docs.traefik.io/observability/logs/
logs:
  # Traefik logs concern everything that happens to Traefik itself (startup, configuration, events, shutdown, and so on).
  general:
    # By default, the logs use a text format (common), but you can
    # also ask for the json format in the format option
    format: json
    # By default, the level is set to ERROR. Alternative logging levels are DEBUG, PANIC, FATAL, ERROR, WARN, and INFO.
    level: DEBUG
  access:
    # To enable access logs
    enabled: true
    level: DEBUG
    # By default, logs are written using the Common Log Format (CLF).
    # To write logs in JSON, use json in the format option.
    # If the given format is unsupported, the default (CLF) is used instead.
    format: json
    fields:
      defaultMode: keep
      names:
        ClientUsername: keep
      headers:
        defaultMode: keep
        names:
          User-Agent: keep
          Authorization: keep
          Content-Type: keep
# Configure ports
ports:
  # The name of this one can't be changed as it is used for the readiness and
  # liveness probes, but you can adjust its config to your liking
  traefik:
    port: 9000
    expose: true
    # The exposed port for this service
    exposedPort: 9000
    # The port protocol (TCP/UDP)
    protocol: TCP
  external:
    port: 8000
    expose: true
    exposedPort: 80
    protocol: TCP
    nodePort: 31180
  internal:
    port: 8443
    expose: true
    exposedPort: 443
    protocol: TCP
    nodePort: 31080
  web: null
  websecure: null
  metrics:
    port: 9100
    # hostPort: 9100
    # Defines whether the port is exposed if service.type is LoadBalancer or NodePort.
    expose: false
    # The exposed port for this service
    exposedPort: 9100
    # The port protocol (TCP/UDP)
    protocol: TCP
# Options for the main traefik service, where the entrypoints traffic comes
# from.
service:
  enabled: true
  type: NodePort
# Whether Role Based Access Control objects like roles and rolebindings should be created
rbac:
  enabled: true
  namespaced: false
If you now check your Traefik pod, you should see the QoS class correctly configured:
kubectl -n traefik get pod
kubectl -n traefik describe pod pod-name
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  traefik-demo-token-chq72:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  traefik-demo-token-chq72
    Optional:    false
QoS Class:       Guaranteed
2° Update the Helm release by changing anything in the Helm values, for example the number of replicas:
#
# Configure the deployment
#
deployment:
  enabled: true
  # Can be either Deployment or DaemonSet
  kind: Deployment
  # Number of pods of the deployment (only applies when kind == Deployment)
  replicas: 3
helm upgrade --install traefik-demo traefik/traefik -n traefik -f values.yaml
3° After a successful Helm deployment, all routes return a 404 error response.
Nevertheless, I now assume this is maybe something not yet implemented on the Traefik side?
Hope that helps
I'm not sure what could be going on. I would check the Traefik pod logs, make sure client queries are properly routed to Traefik, check the .status field of the Ingresses, compare the Traefik image versions before and after, and look for a diff between the previous and current ReplicaSets.
What is certain is that the deployment.qosClass you mentioned isn't involved: it has no effect on the objects generated by this chart.
> I assume now this is maybe something not yet implemented from Traefik side ?
qosClass is a computed field that shows up in your Pod's status (not its spec). It is assigned for you ( https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#qos-classes ), based on the resource limits and requests you may have set on the containers in your Pod ( https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed / https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-burstable / https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-besteffort ). At no point can you set one yourself. This isn't a Traefik limitation.
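To make that rule concrete, here is a simplified sketch of how the class follows from the container resources, as I read the Kubernetes docs linked above. It is not Kubernetes code: the function name qos_class is my own, it only looks at the containers you pass in, and it compares quantities as plain strings (so "1" and "1000m" would wrongly count as different).

```python
from typing import Dict, List

# One entry per container: {"requests": {...}, "limits": {...}}
Resources = Dict[str, Dict[str, str]]


def qos_class(containers: List[Resources]) -> str:
    """Derive a Pod's QoS class from its containers' cpu/memory settings (simplified)."""
    any_set = False      # did any container set a cpu or memory request/limit?
    guaranteed = True    # stays True only if every request equals its limit
    for resources in containers:
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        for name in ("cpu", "memory"):
            limit = limits.get(name)
            # When only a limit is given, Kubernetes uses it as the request too.
            request = requests.get(name, limit)
            if limit is not None or request is not None:
                any_set = True
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    # The resources block from the values.yaml posted in this thread.
    traefik = {
        "requests": {"cpu": "150m", "memory": "1500Mi"},
        "limits": {"cpu": "150m", "memory": "1500Mi"},
    }
    print(qos_class([traefik]))
```

With the resources from the values.yaml above, requests equal limits for both cpu and memory, which is why kubectl describe reports Guaranteed regardless of any deployment.qosClass key.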
The Traefik Helm chart does allow you to change the replica count of an existing deployment, or to update the Traefik image/version being used. There is no reason it would suddenly start returning 404s.
Since it seems there is nothing to do in the Helm chart about this, I'm closing this issue. Please re-open it if you find something we can do in the Helm chart to help with your issue.