kube-apiserver stops working because its liveness probe fails when kube_api_anonymous_auth is false
Environment:
- Cloud provider or hardware configuration: On-premise VMs
- OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"): CentOS Linux release 7.8.2003 (Core)
- Version of Ansible (ansible --version): 2.9.6
- Version of Python (python --version): 2.7.5
- Kubespray version (commit) (git rev-parse --short HEAD): v2.13.1
- Network plugin used: weave and calico (I've tested it with both)
- Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"): https://gist.github.com/cagriersen/f3182477249deee0bf258c5766a6c154
- Command used to invoke ansible: ansible-playbook -u $(whoami) -b -i inventory/$CLUSTER_NAME/inventory.cfg cluster.yml
- Output of ansible run: finished without errors.
- Anything else we need to know:
If I disable anonymous auth via the kube_api_anonymous_auth: false parameter, kube-apiserver stops working after ~5 minutes. When I check the kube-apiserver status in the kubelet logs, I see that kube-apiserver returns HTTP status code 401 for the /healthz endpoint.
Jun 7 23:08:31 node01 kubelet: I0607 23:08:31.135527 2388 prober.go:116] Liveness probe for "kube-apiserver-node01_kube-system(fc13df7a2825f6dbcfc3b7e530e124b8):kube-apiserver" failed (failure): HTTP probe failed with statuscode: 401
Jun 7 23:08:31 node01 kubelet: I0607 23:08:31.136464 2388 kuberuntime_manager.go:630] Container "kube-apiserver" ({"docker" "423fcf47d7ee2a0ec12ddabc5aeae1b2d0974c5b78b67dede5fa90ab0e256cac"}) of pod kube-apiserver-node01_kube-system(fc13df7a2825f6dbcfc3b7e530e124b8): Container kube-apiserver failed liveness probe, will be restarted
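The liveness check that kubeadm renders into /etc/kubernetes/manifests/kube-apiserver.yaml is a plain HTTPS GET with no Authorization header, roughly like the sketch below (exact values vary between versions), which is why it gets a 401 once --anonymous-auth=false while authenticated requests keep working.
# Sketch of the generated probe; it attaches no credentials to the request
livenessProbe:
  failureThreshold: 8
  httpGet:
    host: x.x.x.x        # the node's advertise address
    path: /healthz       # newer releases probe /livez instead
    port: 6443
    scheme: HTTPS
  initialDelaySeconds: 15
  timeoutSeconds: 15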
So it stops working because the liveness probe fails constantly. However, I can interact with the cluster via kubectl until the liveness probe gives up.
Also, while the liveness probe is failing, I can curl /healthz (or any other endpoint) without any issue:
APISERVER=$(kubectl config view | grep server | cut -f 2- -d ":" | tr -d " ")
TOKEN=$(kubectl describe secret $(kubectl get secrets | grep default | cut -f1 -d ' ') | grep -E '^token' | cut -f2 -d':' | tr -d '\t')
curl -v $APISERVER/healthz \
--header "Authorization: Bearer $TOKEN" \
--insecure \
--cacert "/etc/kubernetes/ssl/ca.crt" \
--cert "/etc/kubernetes/ssl/apiserver-kubelet-client.crt" \
--key "/etc/kubernetes/ssl/apiserver-kubelet-client.key"
### OUTPUT ###
* About to connect() to x.x.x.x port 6443 (#0)
* Trying x.x.x.x...
* Connected to x.x.x.x (x.x.x.x) port 6443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* NSS: client certificate from file
* subject: CN=kube-apiserver-kubelet-client,O=system:masters
* start date: Jun 07 16:02:04 2020 GMT
* expire date: Jun 07 16:02:05 2021 GMT
* common name: kube-apiserver-kubelet-client
* issuer: CN=kubernetes
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* subject: CN=kube-apiserver
* start date: Jun 07 16:02:04 2020 GMT
* expire date: Jun 07 16:02:04 2021 GMT
* common name: kube-apiserver
* issuer: CN=kubernetes
> GET /healthz HTTP/1.1
> User-Agent: curl/7.29.0
> Host: x.x.x.x:6443
> Accept: */*
> Authorization: Bearer
>
< HTTP/1.1 200 OK
< Cache-Control: no-cache, private
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Sun, 07 Jun 2020 20:17:56 GMT
< Content-Length: 2
<
* Connection #0 to host x.x.x.x left intact
Insecure port (8080) also works:
curl -vv 127.0.0.1:8080/healthz --header "Authorization: Bearer $TOKEN"
* About to connect() to 127.0.0.1 port 8080 (#0)
* Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> GET /healthz HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 127.0.0.1:8080
> Accept: */*
> Authorization: Bearer
>
< HTTP/1.1 200 OK
< Cache-Control: no-cache, private
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Sun, 07 Jun 2020 20:03:41 GMT
< Content-Length: 2
<
* Connection #0 to host 127.0.0.1 left intact
If I enable anonymous auth in /etc/kubernetes/manifests/kube-apiserver.yaml and restart the kubelet, everything works as expected.
As a security best practice, I always intend to disable anonymous auth. However, since the documentation at https://kubernetes.io/docs/reference/access-authn-authz/authentication/#anonymous-requests says "In 1.6+, anonymous access is enabled by default if an authorization mode other than AlwaysAllow is used" (and kubespray configures --authorization-mode=Node,RBAC), kubespray-deployed clusters become dysfunctional with kube_api_anonymous_auth: false.
Am I missing something about disabling anonymous auth?
As far as I know, anonymous access isn't possible because of this value: kube_apiserver_insecure_port: 0 # (disabled)
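To check which flags the rendered manifest actually passes to the apiserver, the static pod manifest can be grepped directly (path per the standard kubeadm layout that kubespray uses):
grep -E 'anonymous-auth|authorization-mode|insecure-port' /etc/kubernetes/manifests/kube-apiserver.yaml
# With kube_api_anonymous_auth: false the output should contain something like:
#   - --anonymous-auth=false
#   - --authorization-mode=Node,RBAC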
I'll try to see what the problem with kube_api_anonymous_auth is, though, so let's keep this open and I'll see if I can reproduce it.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close
@fejta-bot: Closing this issue.
Hi!
My problem is also reproducible. Reproduction steps:
- Create a cluster with the option kube_api_anonymous_auth: true
- Then change the setting to kube_api_anonymous_auth: false
- Try updating the masters: ansible-playbook -i inventory/rnd-k8s/hosts.yml upgrade-cluster.yml --tags master
The playbook fails at:
included: /roles/kubernetes/master/tasks/kubeadm-upgrade.yml for k8s-m1
Monday 21 December 2020 23:29:08 +0300 (0:00:00.052) 0:01:45.554 *******
FAILED - RETRYING: kubeadm | Check api is up (60 retries left).
FAILED - RETRYING: kubeadm | Check api is up (59 retries left).
In the kubelet log:
kube-apiserver: failed (failure): HTTP probe failed with statuscode: 401
I would like an exact answer to this in the documentation. That is, you can disable anonymous access, but it will break access to the cluster. That's a rather dangerous possibility, and it isn't described in the documentation.
I have the same problem in Kubernetes v1.20.4 on ubuntu v20.04.2. I edited /etc/kubernetes/manifests/kube-apiserver.yaml and changed --anonymous-auth to true. Then kube-apiserver became healthy.
- command:
  - kube-apiserver
  - --advertise-address=192.168.100.10
  - --allow-privileged=true
  - --anonymous-auth=true
/reopen @floryut /remove-lifecycle rotten
I can confirm that behavior. For me it seems to be stable again after just removing kube-api-anonymous-auth: false (i.e. not setting the value at all).
@kaktus42: You can't reopen an issue/PR unless you authored it or you are a collaborator.
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-contributor-experience at kubernetes/community. /close
@fejta-bot: Closing this issue.
Facing the same issue here: when setting --anonymous-auth=false, kube-apiserver stays READY 0/1 until the flag is set back to true. Is there any way to add 401 as an expected status code as a workaround?
Same for me. Can't upgrade cluster without anonymous auth = true.
@nico0olas: You can't reopen an issue/PR unless you authored it or you are a collaborator.
In response to this:
/reopen
Same issue with anonymous auth
Same
This issue is caused by the startupProbe, readinessProbe, and livenessProbe failing to authenticate with the API server when anonymous-auth is disabled.
It can be resolved by creating a readiness-probe token and adding it to all the probes.
- Create the ClusterRole, ClusterRoleBinding, ServiceAccount, and Secret:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: readiness-probe-role
rules:
- apiGroups: [""]
  resources: ["pods", "nodes"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: readiness-probe-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: readiness-probe-role
subjects:
- kind: ServiceAccount
  name: readiness-probe
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: readiness-probe
  namespace: kube-system
---
apiVersion: v1
kind: Secret
metadata:
  name: readiness-probe
  namespace: kube-system
  annotations:
    kubernetes.io/service-account.name: readiness-probe
type: kubernetes.io/service-account-token
- Retrieve the token
kubectl get secret readiness-probe -n kube-system -o jsonpath='{.data.token}' | base64 --decode
- On the control-plane node, update the /etc/kubernetes/manifests/kube-apiserver.yaml file by adding the httpHeaders to each probe's httpGet block. It should look like this:
httpGet:
  host: XXXX
  path: /livez
  port: 6443
  scheme: HTTPS
  httpHeaders:
  - name: Authorization
    value: Bearer <TOKEN>
- Restart the kubelet
systemctl restart kubelet
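As a quick sanity check that the token the probes will send is actually accepted (this assumes the manifests above have been applied and that the built-in system:public-info-viewer binding still grants the health endpoints to authenticated users):
# readiness-probe-rbac.yaml is a placeholder name for a file holding the manifests above
kubectl apply -f readiness-probe-rbac.yaml
TOKEN=$(kubectl get secret readiness-probe -n kube-system -o jsonpath='{.data.token}' | base64 --decode)
# Run on the control-plane node; adjust the address if the apiserver is not reachable on localhost
curl -sk -H "Authorization: Bearer $TOKEN" https://127.0.0.1:6443/livez
# Expected output: ok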
I will raise a PR to fix it
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Could someone please stop the bot from closing the issue?
/reopen
@guoard: You can't reopen an issue/PR unless you authored it or you are a collaborator.
Could someone please stop the bot from closing the issue?
@yankay @VannTen @tico88612 PTAL.
AFAIK everyone can use the lifecycle command for prow: /remove-lifecycle stale /lifecycle frozen
AFAIK everyone can use the lifecycle command for prow:
Yes, but you can't reopen an issue/PR unless you authored it or you are a collaborator.
Ah yes, missed that.
Related issues in kubernetes: https://github.com/kubernetes/kubernetes/issues/43784 https://github.com/kubernetes/kubernetes/issues/100581