Various issues after update (helm-install-traefik, DNS errors)
Environmental Info: K3s Version: k3s version v1.24.4+k3s1 (c3f830e9) go version go1.18.1
Node(s) CPU architecture, OS, and Version: Linux kubernetes 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration: Only one Server
Describe the bug: I upgraded the cluster as described here: https://rancher.com/docs/k3s/latest/en/upgrades/basic/ Afterwards things ran fine (I do not know whether the traefik job worked, but DNS did). After restarting the host, I got some issues with DNS. While investigating further I also found that the helm-install-traefik job was stuck. I am putting this together in one report because I think the two could be connected somehow. If not, I will split it up later on.
Steps To Reproduce: The installation of k3s was straightforward: just curl -sfL https://get.k3s.io | sh - on one machine.
Additional context / logs: CoreDNS logs:
[ERROR] plugin/errors: 2 auth.mydomain.org. A: read udp 10.42.0.174:44628->192.168.XXX.YYY:53: i/o timeout
[ERROR] plugin/errors: 2 auth.mydomain.org. AAAA: read udp 10.42.0.174:33794->192.168.XXX.ZZZ:53: i/o timeout
[ERROR] plugin/errors: 2 auth.mydomain.org. A: read udp 10.42.0.174:48597->192.168.XXX.ZZZ:53: i/o timeout
[ERROR] plugin/errors: 2 auth.mydomain.org. AAAA: read udp 10.42.0.174:36233->192.168.XXX.ZZZ:53: i/o timeout
[ERROR] plugin/errors: 2 auth.mydomain.org. A: read udp 10.42.0.174:47208->192.168.XXX.ZZZ:53: i/o timeout
[ERROR] plugin/errors: 2 auth.mydomain.org. AAAA: read udp 10.42.0.174:55953->192.168.XXX.ZZZ:53: i/o timeout
[ERROR] plugin/errors: 2 auth.mydomain.org. A: read udp 10.42.0.174:58944->192.168.XXX.YYY:53: i/o timeout
[ERROR] plugin/errors: 2 auth.mydomain.org. AAAA: read udp 10.42.0.174:57563->192.168.XXX.YYY:53: i/o timeout
[ERROR] plugin/errors: 2 auth.mydomain.org. A: read udp 10.42.0.174:45613->192.168.XXX.YYY:53: i/o timeout
[ERROR] plugin/errors: 2 auth.mydomain.org. AAAA: read udp 10.42.0.174:46321->192.168.XXX.ZZZ:53: i/o timeout
192.168.XXX.YYY and 192.168.XXX.ZZZ are the two internal domain controllers that handle DNS. The domain is also resolvable from the node itself, so it is not a connectivity issue.
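As a side note, the upstream resolver behind each timeout can be pulled out of such an error line with a quick one-liner. A throwaway sketch using one of the lines above (the log line is copied from this report; the sed expression is just an illustration):

```shell
# one CoreDNS error line from the logs above
log='[ERROR] plugin/errors: 2 auth.mydomain.org. A: read udp 10.42.0.174:44628->192.168.XXX.YYY:53: i/o timeout'
# everything between '->' and ':53' is the upstream resolver CoreDNS forwarded to
upstream=$(echo "$log" | sed -n 's/.*->\([^:]*\):53.*/\1/p')
echo "$upstream"
```

Piping the whole CoreDNS log through the same sed and `sort -u` shows at a glance which upstreams are timing out.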
And here are the helm-install-traefik logs:
if [[ ${KUBERNETES_SERVICE_HOST} =~ .*:.* ]]; then
echo "KUBERNETES_SERVICE_HOST is using IPv6"
CHART="${CHART//%\{KUBERNETES_API\}%/[${KUBERNETES_SERVICE_HOST}]:${KUBERNETES_SERVICE_PORT}}"
else
CHART="${CHART//%\{KUBERNETES_API\}%/${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}}"
fi
set +v -x
+ [[ '' != \t\r\u\e ]]
+ export HELM_HOST=127.0.0.1:44134
+ HELM_HOST=127.0.0.1:44134
+ helm_v2 init --skip-refresh --client-only --stable-repo-url https://charts.helm.sh/stable/
+ tiller --listen=127.0.0.1:44134 --storage=secret
Creating /home/klipper-helm/.helm
Creating /home/klipper-helm/.helm/repository
Creating /home/klipper-helm/.helm/repository/cache
Creating /home/klipper-helm/.helm/repository/local
Creating /home/klipper-helm/.helm/plugins
Creating /home/klipper-helm/.helm/starters
Creating /home/klipper-helm/.helm/cache/archive
Creating /home/klipper-helm/.helm/repository/repositories.yaml
Adding stable repo with URL: https://charts.helm.sh/stable/
Adding local repo with URL: http://127.0.0.1:8879/charts
[main] 2022/08/28 09:16:53 Starting Tiller v2.17.0 (tls=false)
[main] 2022/08/28 09:16:53 GRPC listening on 127.0.0.1:44134
[main] 2022/08/28 09:16:53 Probes listening on :44135
[main] 2022/08/28 09:16:53 Storage driver is Secret
[main] 2022/08/28 09:16:53 Max history per release is 0
$HELM_HOME has been configured at /home/klipper-helm/.helm.
Not installing Tiller due to 'client-only' flag having been set
++ jq -r '.Releases | length'
++ timeout -s KILL 30 helm_v2 ls --all '^traefik$' --output json
[storage] 2022/08/28 09:16:53 listing all releases with filter
+ V2_CHART_EXISTS=1
+ [[ 1 == \1 ]]
+ [[ '' == \t\r\u\e ]]
+ HELM=helm_v2
+ NAME_ARG=--name
+ JQ_CMD='"\(.Releases[0].AppVersion),\(.Releases[0].Status)"'
+ [[ -f /config/ca-file.pem ]]
+ [[ -n '' ]]
+ shopt -s nullglob
+ helm_content_decode
+ set -e
+ ENC_CHART_PATH=/chart/traefik.tgz.base64
+ CHART_PATH=/tmp/traefik.tgz
+ [[ ! -f /chart/traefik.tgz.base64 ]]
+ return
+ [[ install != \d\e\l\e\t\e ]]
+ helm_repo_init
+ grep -q -e 'https\?://'
+ echo 'chart path is a url, skipping repo update'
chart path is a url, skipping repo update
+ helm_v2 repo remove stable
"stable" has been removed from your repositories
+ return
+ helm_update install
+ [[ helm_v2 == \h\e\l\m\_\v\3 ]]
++ tr '[:upper:]' '[:lower:]'
++ jq -r '"\(.Releases[0].AppVersion),\(.Releases[0].Status)"'
++ helm_v2 ls --all '^traefik$' --output json
[storage] 2022/08/28 09:16:53 listing all releases with filter
+ LINE=1.7.19,deployed
+ IFS=,
+ read -r INSTALLED_VERSION STATUS _
+ VALUES=
+ for VALUES_FILE in /config/*.yaml
+ VALUES=' --values /config/values-01_HelmChart.yaml'
+ [[ install = \d\e\l\e\t\e ]]
+ [[ 1.7.19 =~ ^(|null)$ ]]
+ [[ deployed =~ ^(pending-install|pending-upgrade|pending-rollback)$ ]]
+ [[ deployed == \d\e\p\l\o\y\e\d ]]
+ echo 'Already installed traefik'
+ [[ helm_v2 == \h\e\l\m\_\v\3 ]]
Already installed traefik
+ helm_v2 mapkubeapis traefik --v2
Error: unknown command "mapkubeapis" for "helm"
Run 'helm --help' for usage.
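For context, the if/else at the top of the job log is klipper-helm substituting the in-cluster API endpoint into the chart URL, bracketing the host when it is an IPv6 address. A standalone sketch of that logic with made-up values (the fd00:: address and chart URL below are illustrative, not from this cluster):

```shell
#!/usr/bin/env bash
# Made-up values for illustration only
KUBERNETES_SERVICE_HOST="fd00:10:43::1"
KUBERNETES_SERVICE_PORT="443"
CHART="https://%{KUBERNETES_API}%/static/charts/traefik.tgz"

if [[ ${KUBERNETES_SERVICE_HOST} =~ .*:.* ]]; then
    # IPv6 addresses must be wrapped in brackets inside a URL
    CHART="${CHART//%\{KUBERNETES_API\}%/[${KUBERNETES_SERVICE_HOST}]:${KUBERNETES_SERVICE_PORT}}"
else
    CHART="${CHART//%\{KUBERNETES_API\}%/${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}}"
fi
echo "$CHART"
```

That part of the job completes fine here; the job only fails later at the mapkubeapis step.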
Which installed k3s version did you try to upgrade?
I don't know; I think the last update was in April this year.
It's hard to say exactly what's going on, but based on the fact that it appears to be upgrading from a very old Traefik release, it seems like you were quite a few versions back. You might try running the k3s-killall.sh script to terminate the running pods, then reboot the node. See if coredns and other pods come up into a better state after that.
That did not help, and neither did another update to the current version. It's still failing.
You might try starting k3s with --disable=traefik to remove the legacy traefik v1 helm chart. Once you see that the uninstallation job has completed, you can remove the flag and restart, and it should install the newer chart.
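One way to apply that suggestion without editing the service unit is the k3s config file (assuming the default path of a standard install); add the flag, restart k3s, and remove it again once the uninstall job has completed:

```yaml
# /etc/rancher/k3s/config.yaml (assumed default path)
# equivalent to passing --disable=traefik on the k3s server command line
disable:
  - traefik
```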