
[K8s / Helm] Traefik starts, then shuts down immediately

Open kieranajp opened this issue 4 years ago • 3 comments

Do you want to request a feature or report a bug?

:bug:

What did you do?

I am running Traefik in K8s, using the official Helm chart. I am using chart version 9.19.1, if that helps.

My aim is to run Traefik as a DaemonSet, bound to the host network. Here's the values.yaml I'm using to achieve that:

deployment:
  enabled: true
  kind: DaemonSet

service:
  enabled: true
  type: NodePort

ports:
  traefik:
    port: 9000
    hostPort: 9000
    expose: false
    exposedPort: 9000
  web:
    port: 80
    hostPort: 80

hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet

additionalArguments:
  - --log.level=DEBUG

securityContext:
  capabilities:
    drop: [ALL]
    add: [NET_BIND_SERVICE]
  readOnlyRootFilesystem: true
  runAsGroup: 0
  runAsNonRoot: false
  runAsUser: 0
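
For context: with hostNetwork: true, the entrypoints bind directly on each node, so ports 80 and 9000 must be free on every node the DaemonSet schedules onto. A quick sanity check on a node (assuming ss is available there):

$ ss -ltnp | grep -E ':(80|9000)\b'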

What did you expect to see?

I expect my Traefik pods to boot, one per Node.

What did you see instead?

The pods start, but never become ready before stopping almost immediately. As this happens repeatedly, the pods end up in a CrashLoopBackOff state.
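
Since the pods are killed before ever becoming ready, the kubelet's events should say why (failed liveness/readiness probes, OOM kills, and so on). A quick way to check, using one of the pod names from the logs below:

$ kubectl describe pod traefik-knvc8    # probe failures show up under Events
$ kubectl get events --field-selector involvedObject.name=traefik-knvc8 -w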

What version of Traefik are you using?

Traefik 2.4.8 (https://github.com/traefik/traefik-helm-chart/blob/v9.19.1/traefik/Chart.yaml)
Kubernetes 1.20 on DigitalOcean's managed offering

If applicable, please paste the log output in DEBUG level (--log.level=DEBUG switch)

$ kubectl logs -f traefik-knvc8 

time="2021-05-21T10:49:21Z" level=info msg="Configuration loaded from flags."
time="2021-05-21T10:49:22Z" level=info msg="Traefik version 2.4.8 built on 2021-03-23T15:48:39Z"
time="2021-05-21T10:49:22Z" level=debug msg="Static configuration loaded {\"global\":{\"checkNewVersion\":true,\"sendAnonymousUsage\":true},\"serversTransport\":{\"maxIdleConnsPerHost\":200},\"entryPoints\":{\"traefik\":{\"address\":\":9000/tcp\",\"transport\":{\"lifeCycle\":{\"graceTimeOut\":10000000000},\"respondingTimeouts\":{\"idleTimeout\":180000000000}},\"forwardedHeaders\":{},\"http\":{}},\"web\":{\"address\":\":80/tcp\",\"transport\":{\"lifeCycle\":{\"graceTimeOut\":10000000000},\"respondingTimeouts\":{\"idleTimeout\":180000000000}},\"forwardedHeaders\":{},\"http\":{}},\"websecure\":{\"address\":\":8443/tcp\",\"transport\":{\"lifeCycle\":{\"graceTimeOut\":10000000000},\"respondingTimeouts\":{\"idleTimeout\":180000000000}},\"forwardedHeaders\":{},\"http\":{}}},\"providers\":{\"providersThrottleDuration\":2000000000,\"kubernetesIngress\":{},\"kubernetesCRD\":{\"allowCrossNamespace\":true}},\"api\":{\"dashboard\":true},\"metrics\":{\"prometheus\":{\"buckets\":[0.1,0.3,1.2,5],\"addEntryPointsLabels\":true,\"addServicesLabels\":true,\"entryPoint\":\"traefik\"}},\"ping\":{\"entryPoint\":\"traefik\",\"terminatingStatusCode\":503},\"log\":{\"level\":\"DEBUG\",\"format\":\"common\"},\"pilot\":{\"dashboard\":true}}"
time="2021-05-21T10:49:22Z" level=info msg="Stats collection is enabled."
time="2021-05-21T10:49:22Z" level=info msg="Many thanks for contributing to Traefik's improvement by allowing us to receive anonymous information from your configuration."
time="2021-05-21T10:49:22Z" level=info msg="Help us improve Traefik by leaving this feature on :)"
time="2021-05-21T10:49:22Z" level=info msg="More details on: https://doc.traefik.io/traefik/contributing/data-collection/"
time="2021-05-21T10:49:22Z" level=debug msg="Configured Prometheus metrics" metricsProviderName=prometheus
time="2021-05-21T10:49:22Z" level=info msg="Starting provider aggregator.ProviderAggregator {}"
time="2021-05-21T10:49:22Z" level=debug msg="Start TCP Server" entryPointName=web
time="2021-05-21T10:49:22Z" level=debug msg="Start TCP Server" entryPointName=websecure
time="2021-05-21T10:49:22Z" level=debug msg="Start TCP Server" entryPointName=traefik
time="2021-05-21T10:49:22Z" level=info msg="Starting provider *ingress.Provider {}"
time="2021-05-21T10:49:22Z" level=info msg="ingress label selector is: \"\"" providerName=kubernetes
time="2021-05-21T10:49:22Z" level=info msg="Creating in-cluster Provider client" providerName=kubernetes
time="2021-05-21T10:49:22Z" level=info msg="Starting provider *traefik.Provider {}"
time="2021-05-21T10:49:22Z" level=info msg="Starting provider *acme.ChallengeTLSALPN {\"Timeout\":4000000000}"
time="2021-05-21T10:49:22Z" level=info msg="Starting provider *crd.Provider {\"allowCrossNamespace\":true}"
time="2021-05-21T10:49:22Z" level=info msg="label selector is: \"\"" providerName=kubernetescrd
time="2021-05-21T10:49:22Z" level=info msg="Creating in-cluster Provider client" providerName=kubernetescrd
time="2021-05-21T10:49:22Z" level=debug msg="Configuration received from provider internal: {\"http\":{\"routers\":{\"ping\":{\"entryPoints\":[\"traefik\"],\"service\":\"ping@internal\",\"rule\":\"PathPrefix(`/ping`)\",\"priority\":2147483647},\"prometheus\":{\"entryPoints\":[\"traefik\"],\"service\":\"prometheus@internal\",\"rule\":\"PathPrefix(`/metrics`)\",\"priority\":2147483647}},\"services\":{\"api\":{},\"dashboard\":{},\"noop\":{},\"ping\":{},\"prometheus\":{}},\"serversTransports\":{\"default\":{\"maxIdleConnsPerHost\":200}}},\"tcp\":{},\"tls\":{}}" providerName=internal
time="2021-05-21T10:49:22Z" level=warning msg="Cross-namespace reference between IngressRoutes and resources is enabled, please ensure that this is expected (see AllowCrossNamespace option)" providerName=kubernetescrd
time="2021-05-21T10:49:22Z" level=debug msg="Added outgoing tracing middleware prometheus@internal" routerName=prometheus@internal middlewareName=tracing middlewareType=TracingForwarder entryPointName=traefik
time="2021-05-21T10:49:22Z" level=debug msg="Added outgoing tracing middleware ping@internal" routerName=ping@internal entryPointName=traefik middlewareName=tracing middlewareType=TracingForwarder
time="2021-05-21T10:49:23Z" level=debug msg="Creating middleware" entryPointName=traefik middlewareType=Recovery middlewareName=traefik-internal-recovery
time="2021-05-21T10:49:23Z" level=debug msg="Creating middleware" middlewareType=Metrics entryPointName=traefik middlewareName=metrics-entrypoint
time="2021-05-21T10:49:23Z" level=debug msg="Creating middleware" middlewareName=metrics-entrypoint middlewareType=Metrics entryPointName=web
time="2021-05-21T10:49:23Z" level=debug msg="Creating middleware" middlewareName=metrics-entrypoint middlewareType=Metrics entryPointName=websecure
time="2021-05-21T10:49:23Z" level=debug msg="Creating middleware" middlewareName=metrics-entrypoint middlewareType=Metrics entryPointName=traefik
time="2021-05-21T10:49:23Z" level=debug msg="Creating middleware" entryPointName=web middlewareName=metrics-entrypoint middlewareType=Metrics
time="2021-05-21T10:49:23Z" level=debug msg="Creating middleware" entryPointName=websecure middlewareName=metrics-entrypoint middlewareType=Metrics
time="2021-05-21T10:49:23Z" level=debug msg="No default certificate, generating one"
W0521 10:49:25.699396       1 warnings.go:70] extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
W0521 10:49:31.098996       1 warnings.go:70] extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
time="2021-05-21T10:49:32Z" level=debug msg="Configuration received from provider kubernetescrd: {\"http\":{\"routers\":<sniiiiiip>}}},\"tcp\":{},\"udp\":{},\"tls\":{}}" providerName=kubernetescrd
I0521 10:49:32.899022       1 trace.go:205] Trace[683024728]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167 (21-May-2021 10:49:22.403) (total time: 10495ms):
Trace[683024728]: ---"Objects listed" 10495ms (10:49:00.898)
Trace[683024728]: [10.49544069s] [10.49544069s] END
W0521 10:49:33.498604       1 warnings.go:70] networking.k8s.io/v1beta1 IngressClass is deprecated in v1.19+, unavailable in v1.22+; use networking.k8s.io/v1 IngressClassList
W0521 10:49:33.698683       1 warnings.go:70] networking.k8s.io/v1beta1 IngressClass is deprecated in v1.19+, unavailable in v1.22+; use networking.k8s.io/v1 IngressClassList
time="2021-05-21T10:49:33Z" level=debug msg="Configuration received from provider kubernetes: {\"http\":{},\"tcp\":{}}" providerName=kubernetes
time="2021-05-21T10:49:43Z" level=info msg="I have to go..."
time="2021-05-21T10:49:43Z" level=info msg="Stopping server gracefully"
time="2021-05-21T10:49:43Z" level=debug msg="Waiting 10s seconds before killing connections." entryPointName=websecure
time="2021-05-21T10:49:43Z" level=debug msg="Waiting 10s seconds before killing connections." entryPointName=traefik
time="2021-05-21T10:49:43Z" level=debug msg="Waiting 10s seconds before killing connections." entryPointName=web
time="2021-05-21T10:49:43Z" level=error msg="accept tcp [::]:8443: use of closed network connection" entryPointName=websecure
time="2021-05-21T10:49:43Z" level=error msg="Error while starting server: accept tcp [::]:8443: use of closed network connection" entryPointName=websecure
time="2021-05-21T10:49:43Z" level=error msg="accept tcp [::]:9000: use of closed network connection" entryPointName=traefik
time="2021-05-21T10:49:43Z" level=error msg="Error while starting server: accept tcp [::]:9000: use of closed network connection" entryPointName=traefik
time="2021-05-21T10:49:43Z" level=error msg="accept tcp [::]:80: use of closed network connection" entryPointName=web
time="2021-05-21T10:49:43Z" level=error msg="Error while starting server: accept tcp [::]:80: use of closed network connection" entryPointName=web
time="2021-05-21T10:49:43Z" level=error msg="Error while starting server: http: Server closed" entryPointName=websecure
time="2021-05-21T10:49:43Z" level=error msg="Error while starting server: http: Server closed" entryPointName=traefik
time="2021-05-21T10:49:43Z" level=error msg="Error while starting server: http: Server closed" entryPointName=web
time="2021-05-21T10:49:43Z" level=debug msg="Entry point traefik closed" entryPointName=traefik
time="2021-05-21T10:49:43Z" level=debug msg="Entry point websecure closed" entryPointName=websecure
time="2021-05-21T10:49:43Z" level=debug msg="Entry point web closed" entryPointName=web
time="2021-05-21T10:49:43Z" level=info msg="Server stopped"
time="2021-05-21T10:49:43Z" level=info msg="Shutting down"
time="2021-05-21T10:49:47Z" level=debug msg="No default certificate, generating one"
panic: Timeout while stopping traefik, killing instance ✝

goroutine 372 [running]:
github.com/traefik/traefik/v2/pkg/server.(*Server).Close.func1(0x3922008, 0xc000ca5260)
	/go/src/github.com/traefik/traefik/pkg/server/server.go:92 +0x114
created by github.com/traefik/traefik/v2/pkg/server.(*Server).Close
	/go/src/github.com/traefik/traefik/pkg/server/server.go:87 +0x85
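
As far as I can tell, "I have to go..." is the message Traefik logs when it receives SIGTERM, so the stop is triggered from outside the process (most likely by the kubelet), and the panic is Traefik force-killing itself once graceful shutdown exceeds its internal timeout. The previous container's recorded exit state should confirm who killed it:

$ kubectl get pod traefik-knvc8 -o jsonpath='{.status.containerStatuses[0].lastState}'
$ kubectl describe pod traefik-knvc8 | grep -iE 'killing|unhealthy|liveness'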

kieranajp avatar May 21 '21 10:05 kieranajp

Just to add: I have also been able to reproduce this behaviour with an older version of Traefik, simply by changing image.tag in values.yaml. I tried 2.2.8, since I had that version working in the past, but no luck here either.

I then tried switching to an older version of the chart: 9.12.0, which ships Traefik 2.3.3 and was the first version to support kind: DaemonSet. Bizarrely, with this version the pods restart a few times (with the same log lines), but then seem to settle into a stable state rather than a crash loop:

Every 2.0s: kubectl get pod 

NAME            READY   STATUS    RESTARTS   AGE
traefik-8sd4v   1/1     Running   3          4m59s
traefik-hxjls   1/1     Running   1          6m30s
traefik-zx56m   1/1     Running   0          76s

Note the number of restarts - each restart exhibited the same problem with the same logs.

I tried probably over 10 times with the current latest version of the chart, and no pod ever made it to a running state. So what was previously an intermittent issue seems to have become consistent in later chart versions.
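
For anyone who wants to reproduce the comparison, these are roughly the commands I would expect to use (repo URL as documented for the 9.x charts):

$ helm repo add traefik https://helm.traefik.io/traefik
$ helm repo update
$ helm install traefik traefik/traefik --version 9.12.0 -f values.yaml    # older chart, ships Traefik 2.3.3
$ helm install traefik traefik/traefik --version 9.19.1 --set image.tag=2.2.8 -f values.yaml    # current chart, older image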

kieranajp avatar May 21 '21 11:05 kieranajp

I'm seeing a very similar behavior:

2022-02-06 22:32:52.126 msg="Configuration loaded from flags."
2022-02-06 22:32:52.126 msg="Traefik version 2.6.0 built on 2022-01-24T17:08:39Z"
2022-02-06 22:32:52.127 msg="Stats collection is enabled."
2022-02-06 22:32:52.127 msg="Many thanks for contributing to Traefik's improvement by allowing us to receive anonymous information from your configuration."
2022-02-06 22:32:52.127 msg="Help us improve Traefik by leaving this feature on :)"
2022-02-06 22:32:52.127 msg="More details on: https://doc.traefik.io/traefik/contributing/data-collection/"
2022-02-06 22:32:52.129 msg="Starting provider aggregator.ProviderAggregator"
2022-02-06 22:32:52.129 msg="Starting provider *traefik.Provider"
2022-02-06 22:32:52.129 msg="Starting provider *ingress.Provider"
2022-02-06 22:32:52.129 msg="Starting provider *acme.ChallengeTLSALPN"
2022-02-06 22:32:52.129 msg="ingress label selector is: \"\"" providerName=kubernetes
2022-02-06 22:32:52.129 msg="Creating in-cluster Provider client" providerName=kubernetes
2022-02-06 22:32:52.129 msg="Starting provider *crd.Provider"
2022-02-06 22:32:52.129 msg="label selector is: \"\"" providerName=kubernetescrd
2022-02-06 22:32:52.129 msg="Creating in-cluster Provider client" providerName=kubernetescrd
2022-02-06 22:32:52.129 msg="Starting provider *acme.Provider"
2022-02-06 22:32:52.129 msg="Testing certificate renew..." providerName=letsencrypt.acme ACME CA="https://acme-v02.api.letsencrypt.org/directory"
2022-02-06 22:32:52.129 msg="Starting provider *acme.Provider"
2022-02-06 22:32:52.129 msg="Testing certificate renew..." providerName=zerossl.acme ACME CA="https://acme.zerossl.com/v2/DV90"
2022-02-06 22:32:52.131 msg="Starting provider *acme.Provider"
2022-02-06 22:32:52.131 msg="Testing certificate renew..." ACME CA="https://acme-staging-v02.api.letsencrypt.org/directory" providerName=letsencryptstaging.acme
2022-02-06 22:32:52.809 HST 10.20.1.1 - - [07/Feb/2022:08:32:52 +0000] "GET /ping HTTP/1.1" 404 19 "-" "-" 1 "dashboard@internal" "-" 0ms
2022-02-06 22:33:11.804 HST 10.20.1.1 - - [07/Feb/2022:08:33:11 +0000] "GET /ping HTTP/1.1" 404 19 "-" "-" 2 "dashboard@internal" "-" 0ms
2022-02-06 22:33:11.804 HST 10.20.1.1 - - [07/Feb/2022:08:33:11 +0000] "GET /ping HTTP/1.1" 404 19 "-" "-" 3 "dashboard@internal" "-" 0ms
2022-02-06 22:33:21.804 HST 10.20.1.1 - - [07/Feb/2022:08:33:21 +0000] "GET /ping HTTP/1.1" 404 19 "-" "-" 4 "dashboard@internal" "-" 0ms
2022-02-06 22:33:21.804 HST 10.20.1.1 - - [07/Feb/2022:08:33:21 +0000] "GET /ping HTTP/1.1" 404 19 "-" "-" 5 "dashboard@internal" "-" 0ms
2022-02-06 22:33:31.804 HST 10.20.1.1 - - [07/Feb/2022:08:33:31 +0000] "GET /ping HTTP/1.1" 404 19 "-" "-" 6 "dashboard@internal" "-" 0ms
2022-02-06 22:33:31.804 HST 10.20.1.1 - - [07/Feb/2022:08:33:31 +0000] "GET /ping HTTP/1.1" 404 19 "-" "-" 7 "dashboard@internal" "-" 0ms
2022-02-06 22:33:31.826 msg="I have to go..."
2022-02-06 22:33:31.826 msg="Stopping server gracefully"
2022-02-06 22:33:31.826 HST time="2022-02-07T08:33:31Z" level=error msg="accept tcp [::]:9000: use of closed network connection" entryPointName=traefik
2022-02-06 22:33:31.826 HST time="2022-02-07T08:33:31Z" level=error msg="accept tcp [::]:9100: use of closed network connection" entryPointName=metrics
2022-02-06 22:33:31.826 HST time="2022-02-07T08:33:31Z" level=error msg="accept tcp [::]:8443: use of closed network connection" entryPointName=websecure
2022-02-06 22:33:31.826 HST time="2022-02-07T08:33:31Z" level=error msg="accept tcp [::]:8000: use of closed network connection" entryPointName=web
2022-02-06 22:33:31.827 HST time="2022-02-07T08:33:31Z" level=error msg="close tcp [::]:9100: use of closed network connection" entryPointName=metrics
2022-02-06 22:33:31.827 HST time="2022-02-07T08:33:31Z" level=error msg="close tcp [::]:9000: use of closed network connection" entryPointName=traefik
2022-02-06 22:33:31.827 HST time="2022-02-07T08:33:31Z" level=error msg="close tcp [::]:8000: use of closed network connection" entryPointName=web
2022-02-06 22:33:31.827 HST time="2022-02-07T08:33:31Z" level=error msg="close tcp [::]:8443: use of closed network connection" entryPointName=websecure
2022-02-06 22:33:31.827 msg="Server stopped"
2022-02-06 22:33:31.827 msg="Shutting down"

Any suggestions appreciated!

snickell avatar Feb 07 '22 08:02 snickell

@kieranajp are your pods by any chance failing their HTTP /ping liveness checks? Mine are!
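
The 404s on GET /ping in my log above would explain it: the chart's default probes hit /ping on the traefik entrypoint, and a 404 fails them both. A quick check, assuming the default probe port of 9000 and an image with a shell:

$ kubectl get pod <traefik-pod> -o jsonpath='{.spec.containers[0].livenessProbe}'
$ kubectl exec <traefik-pod> -- wget -qO- http://localhost:9000/ping    # a healthy instance prints OK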

snickell avatar Feb 07 '22 10:02 snickell

Hello @kieranajp ,

Thanks for your report. I don't see anything we can do in this Helm chart to help here, so I'm closing this issue. Feel free to re-open it if you have a clear view of what the chart could do to help with this.

mloiseleur avatar Oct 14 '22 08:10 mloiseleur