various issues after update (helm-install-traefik, DNS errors)

Open sambalmueslie opened this issue 3 years ago • 5 comments

Environmental Info:

K3s Version: k3s version v1.24.4+k3s1 (c3f830e9), go version go1.18.1

Node(s) CPU architecture, OS, and Version: Linux kubernetes 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration: Only one Server

Describe the bug: I upgraded the cluster as described here: https://rancher.com/docs/k3s/latest/en/upgrades/basic/ After the upgrade everything ran fine (I do not know whether the traefik job worked, but DNS did). After restarting the host, I started getting DNS issues. While investigating further, I also found that the helm-install-traefik job is stuck. I am putting both into one report because I think they could be connected. If not, I will split it up later.

Steps To Reproduce: The installation of k3s was straightforward: just the simple curl -sfL https://get.k3s.io | sh - on one machine.

Additional context / logs: CoreDNS logs:

[ERROR] plugin/errors: 2 auth.mydomain.org. A: read udp 10.42.0.174:44628->192.168.XXX.YYY:53: i/o timeout
[ERROR] plugin/errors: 2 auth.mydomain.org. AAAA: read udp 10.42.0.174:33794->192.168.XXX.ZZZ:53: i/o timeout
[ERROR] plugin/errors: 2 auth.mydomain.org. A: read udp 10.42.0.174:48597->192.168.XXX.ZZZ:53: i/o timeout
[ERROR] plugin/errors: 2 auth.mydomain.org. AAAA: read udp 10.42.0.174:36233->192.168.XXX.ZZZ:53: i/o timeout
[ERROR] plugin/errors: 2 auth.mydomain.org. A: read udp 10.42.0.174:47208->192.168.XXX.ZZZ:53: i/o timeout
[ERROR] plugin/errors: 2 auth.mydomain.org. AAAA: read udp 10.42.0.174:55953->192.168.XXX.ZZZ:53: i/o timeout
[ERROR] plugin/errors: 2 auth.mydomain.org. A: read udp 10.42.0.174:58944->192.168.XXX.YYY:53: i/o timeout
[ERROR] plugin/errors: 2 auth.mydomain.org. AAAA: read udp 10.42.0.174:57563->192.168.XXX.YYY:53: i/o timeout
[ERROR] plugin/errors: 2 auth.mydomain.org. A: read udp 10.42.0.174:45613->192.168.XXX.YYY:53: i/o timeout
[ERROR] plugin/errors: 2 auth.mydomain.org. AAAA: read udp 10.42.0.174:46321->192.168.XXX.ZZZ:53: i/o timeout

192.168.XXX.YYY and 192.168.XXX.ZZZ are the two internal domain controllers that provide DNS. The domain is also reachable from the node itself, so it is not a connection issue.
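
To see whether the timeouts hit both upstream servers or only one, the CoreDNS errors can be tallied per upstream. The following is a minimal sketch, assuming log lines in the format shown above; the function name `summarize_dns_timeouts` is made up for illustration, and the example invocation assumes the default k3s CoreDNS deployment (`coredns` in `kube-system`).

```shell
#!/usr/bin/env bash
# Count CoreDNS "i/o timeout" errors per upstream DNS server.
# Reads CoreDNS log lines on stdin and prints "<count> <upstream ip:port>".
# summarize_dns_timeouts is a hypothetical helper name, not part of k3s.
summarize_dns_timeouts() {
  grep 'i/o timeout' \
    | sed -n 's/.*->\([^ ]*\): i\/o timeout.*/\1/p' \
    | sort | uniq -c | awk '{print $1, $2}'
}

# Example invocation (assumes the default deployment name and namespace):
#   kubectl -n kube-system logs deployment/coredns | summarize_dns_timeouts
```

If both upstreams time out from inside the pod network while the node itself can resolve the name, the problem is more likely in pod-to-LAN traffic than in the domain controllers.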

And here are the helm-install-traefik logs:

if [[ ${KUBERNETES_SERVICE_HOST} =~ .*:.* ]]; then
 echo "KUBERNETES_SERVICE_HOST is using IPv6"
 CHART="${CHART//%\{KUBERNETES_API\}%/[${KUBERNETES_SERVICE_HOST}]:${KUBERNETES_SERVICE_PORT}}"
else
 CHART="${CHART//%\{KUBERNETES_API\}%/${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}}"
fi
set +v -x
+ [[ '' != \t\r\u\e ]]
+ export HELM_HOST=127.0.0.1:44134
+ HELM_HOST=127.0.0.1:44134
+ helm_v2 init --skip-refresh --client-only --stable-repo-url https://charts.helm.sh/stable/
+ tiller --listen=127.0.0.1:44134 --storage=secret
Creating /home/klipper-helm/.helm 
Creating /home/klipper-helm/.helm/repository 
Creating /home/klipper-helm/.helm/repository/cache 
Creating /home/klipper-helm/.helm/repository/local 
Creating /home/klipper-helm/.helm/plugins 
Creating /home/klipper-helm/.helm/starters 
Creating /home/klipper-helm/.helm/cache/archive 
Creating /home/klipper-helm/.helm/repository/repositories.yaml 
Adding stable repo with URL: https://charts.helm.sh/stable/ 
Adding local repo with URL: http://127.0.0.1:8879/charts 
[main] 2022/08/28 09:16:53 Starting Tiller v2.17.0 (tls=false)
[main] 2022/08/28 09:16:53 GRPC listening on 127.0.0.1:44134
[main] 2022/08/28 09:16:53 Probes listening on :44135
[main] 2022/08/28 09:16:53 Storage driver is Secret
[main] 2022/08/28 09:16:53 Max history per release is 0
$HELM_HOME has been configured at /home/klipper-helm/.helm.
Not installing Tiller due to 'client-only' flag having been set
++ jq -r '.Releases | length'
++ timeout -s KILL 30 helm_v2 ls --all '^traefik$' --output json
[storage] 2022/08/28 09:16:53 listing all releases with filter
+ V2_CHART_EXISTS=1
+ [[ 1 == \1 ]]
+ [[ '' == \t\r\u\e ]]
+ HELM=helm_v2
+ NAME_ARG=--name
+ JQ_CMD='"\(.Releases[0].AppVersion),\(.Releases[0].Status)"'
+ [[ -f /config/ca-file.pem ]]
+ [[ -n '' ]]
+ shopt -s nullglob
+ helm_content_decode
+ set -e
+ ENC_CHART_PATH=/chart/traefik.tgz.base64
+ CHART_PATH=/tmp/traefik.tgz
+ [[ ! -f /chart/traefik.tgz.base64 ]]
+ return
+ [[ install != \d\e\l\e\t\e ]]
+ helm_repo_init
+ grep -q -e 'https\?://'
+ echo 'chart path is a url, skipping repo update'
chart path is a url, skipping repo update
+ helm_v2 repo remove stable
"stable" has been removed from your repositories
+ return
+ helm_update install
+ [[ helm_v2 == \h\e\l\m\_\v\3 ]]
++ tr '[:upper:]' '[:lower:]'
++ jq -r '"\(.Releases[0].AppVersion),\(.Releases[0].Status)"'
++ helm_v2 ls --all '^traefik$' --output json
[storage] 2022/08/28 09:16:53 listing all releases with filter
+ LINE=1.7.19,deployed
+ IFS=,
+ read -r INSTALLED_VERSION STATUS _
+ VALUES=
+ for VALUES_FILE in /config/*.yaml
+ VALUES=' --values /config/values-01_HelmChart.yaml'
+ [[ install = \d\e\l\e\t\e ]]
+ [[ 1.7.19 =~ ^(|null)$ ]]
+ [[ deployed =~ ^(pending-install|pending-upgrade|pending-rollback)$ ]]
+ [[ deployed == \d\e\p\l\o\y\e\d ]]
+ echo 'Already installed traefik'
+ [[ helm_v2 == \h\e\l\m\_\v\3 ]]
Already installed traefik
+ helm_v2 mapkubeapis traefik --v2
Error: unknown command "mapkubeapis" for "helm"
Run 'helm --help' for usage.

sambalmueslie, Aug 28 '22 09:08

Which installed k3s version did you try to upgrade?

knweiss, Aug 29 '22 11:08

I don't know. I think the last update was in April of this year.

sambalmueslie, Aug 29 '22 12:08

It's hard to say exactly what's going on, but based on the fact that it appears to be upgrading from a very old Traefik release, it seems like you were quite a few versions back. You might try running the k3s-killall.sh script to terminate the running pods, then reboot the node. See if coredns and other pods come up into a better state after that.

brandond, Aug 29 '22 23:08
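
For the "see if the pods come up into a better state" step, a small filter over the pod list can help. This is only a sketch: `unhealthy_pods` is a hypothetical helper name, it assumes the usual column layout of `kubectl get pods -A --no-headers`, and the `k3s-killall.sh` path is the one the install script normally uses.

```shell
#!/usr/bin/env bash
# Print pods whose STATUS is neither Running nor Completed.
# Input on stdin: output of `kubectl get pods -A --no-headers`
# (columns: NAMESPACE NAME READY STATUS RESTARTS AGE).
# unhealthy_pods is a hypothetical helper name, not part of k3s.
unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" {print $1 "/" $2, $4}'
}

# Sequence suggested in the comment above (run by hand on the node):
#   sudo /usr/local/bin/k3s-killall.sh   # assumed default install path
#   sudo reboot
#   kubectl get pods -A --no-headers | unhealthy_pods
```

The filter only looks at the STATUS column, so a pod that is Running but not yet ready (for example READY 0/1) will not be listed.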

That did not help, and neither did another update to the current version. It's still failing.

sambalmueslie, Sep 10 '22 17:09

You might try starting k3s with --disable=traefik to remove the legacy traefik v1 helm chart. Once you see that the uninstallation job has completed, you can remove the flag and restart, and it should install the newer chart.

brandond, Sep 10 '22 18:09
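
The disable-then-re-enable cycle described above could look roughly like this on a single-server install managed by systemd. It is a sketch under several assumptions: the service is named `k3s`, the server reads `/etc/rancher/k3s/config.yaml`, and the `disable` config key is used as the equivalent of the `--disable=traefik` flag. The helper names `run`, `disable_traefik`, and `reenable_traefik` are made up for illustration. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the "disable Traefik, wait for the uninstall, re-enable" cycle.
# With DRY_RUN=1 (the default) commands are printed, not executed.
# All function names here are hypothetical helpers, not part of k3s.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

disable_traefik() {
  # Before calling this, add the following to the k3s config file
  # (assumed location: /etc/rancher/k3s/config.yaml):
  #   disable:
  #     - traefik
  run systemctl restart k3s
  # Check the kube-system jobs until the Traefik uninstall job has completed.
  run kubectl -n kube-system get jobs
}

reenable_traefik() {
  # Before calling this, remove the "disable" entry from the config again.
  run systemctl restart k3s
  run kubectl -n kube-system get pods
}
```

Running `disable_traefik` with the default settings prints the two commands it would execute; setting `DRY_RUN=0` (as root) runs them for real.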

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

stale[bot], Mar 10 '23 00:03