
kube-apiserver stops working because its liveness probe fails, when kube_api_anonymous_auth is false

Open cagriersen opened this issue 5 years ago • 18 comments

Environment:

  • Cloud provider or hardware configuration: On-premise VMs

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"): CentOS Linux release 7.8.2003 (Core)

  • Version of Ansible (ansible --version): 2.9.6

  • Version of Python (python --version): 2.7.5

Kubespray version (commit) (git rev-parse --short HEAD): v2.13.1

Network plugin used: weave and calico (I've tested it with both)

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"): https://gist.github.com/cagriersen/f3182477249deee0bf258c5766a6c154

Command used to invoke ansible: ansible-playbook -u $(whoami) -b -i inventory/$CLUSTER_NAME/inventory.cfg cluster.yml

Output of ansible run: Finished without errors.

Anything else we need to know: If I disable anonymous auth via the kube_api_anonymous_auth: false parameter, kube-apiserver stops working after ~5 minutes. When I check the kube-apiserver status through the kubelet logs, I see that kube-apiserver returns HTTP status code 401 for the /healthz endpoint.

Jun  7 23:08:31 node01 kubelet: I0607 23:08:31.135527    2388 prober.go:116] Liveness probe for "kube-apiserver-node01_kube-system(fc13df7a2825f6dbcfc3b7e530e124b8):kube-apiserver" failed (failure): HTTP probe failed with statuscode: 401



Jun  7 23:08:31 node01 kubelet: I0607 23:08:31.136464    2388 kuberuntime_manager.go:630] Container "kube-apiserver" ({"docker" "423fcf47d7ee2a0ec12ddabc5aeae1b2d0974c5b78b67dede5fa90ab0e256cac"}) of pod kube-apiserver-node01_kube-system(fc13df7a2825f6dbcfc3b7e530e124b8): Container kube-apiserver failed liveness probe, will be restarted

So it stops working because the liveness probe fails constantly. However, I can still interact with Kubernetes via kubectl until the liveness probe gives up.

Also, while the liveness probe is failing, I can curl /healthz (or any other endpoint) without any issue:

APISERVER=$(kubectl config view | grep server | cut -f 2- -d ":" | tr -d " ")
TOKEN=$(kubectl describe secret $(kubectl get secrets | grep default | cut -f1 -d ' ') | grep -E '^token' | cut -f2 -d':' | tr -d '\t')
curl -v $APISERVER/healthz \
  --header "Authorization: Bearer $TOKEN" \
  --insecure \
  --cacert "/etc/kubernetes/ssl/ca.crt" \
  --cert "/etc/kubernetes/ssl/apiserver-kubelet-client.crt" \
  --key "/etc/kubernetes/ssl/apiserver-kubelet-client.key"

### OUTPUT ###
* About to connect() to x.x.x.x port 6443 (#0)
*   Trying x.x.x.x...
* Connected to x.x.x.x (x.x.x.x) port 6443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* NSS: client certificate from file
* 	subject: CN=kube-apiserver-kubelet-client,O=system:masters
* 	start date: Jun 07 16:02:04 2020 GMT
* 	expire date: Jun 07 16:02:05 2021 GMT
* 	common name: kube-apiserver-kubelet-client
* 	issuer: CN=kubernetes
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* 	subject: CN=kube-apiserver
* 	start date: Jun 07 16:02:04 2020 GMT
* 	expire date: Jun 07 16:02:04 2021 GMT
* 	common name: kube-apiserver
* 	issuer: CN=kubernetes
> GET /healthz HTTP/1.1
> User-Agent: curl/7.29.0
> Host: x.x.x.x:6443
> Accept: */*
> Authorization: Bearer
>
< HTTP/1.1 200 OK
< Cache-Control: no-cache, private
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Sun, 07 Jun 2020 20:17:56 GMT
< Content-Length: 2
<
* Connection #0 to host x.x.x.x left intact
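For comparison, the probe that keeps failing is the httpGet stanza that kubeadm writes into the static pod manifest. It carries no client certificate and no Authorization header, so it reaches the apiserver as an anonymous request. A rough sketch of that stanza (host and thresholds illustrative, and may vary by version):

```yaml
# Sketch of the liveness probe in /etc/kubernetes/manifests/kube-apiserver.yaml
# (values illustrative). The kubelet sends this GET without any credentials,
# so with --anonymous-auth=false the apiserver replies 401 and the probe fails.
livenessProbe:
  failureThreshold: 8
  httpGet:
    host: x.x.x.x
    path: /healthz
    port: 6443
    scheme: HTTPS
  initialDelaySeconds: 15
  timeoutSeconds: 15
```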

Insecure port (8080) also works:

curl -vv 127.0.0.1:8080/healthz --header "Authorization: Bearer $TOKEN"
* About to connect() to 127.0.0.1 port 8080 (#0)
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> GET /healthz HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 127.0.0.1:8080
> Accept: */*
> Authorization: Bearer
>
< HTTP/1.1 200 OK
< Cache-Control: no-cache, private
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Sun, 07 Jun 2020 20:03:41 GMT
< Content-Length: 2
<
* Connection #0 to host 127.0.0.1 left intact

If I enable anonymous auth in /etc/kubernetes/manifests/kube-apiserver.yaml and restart the kubelet, everything works as expected.

As a security best practice, I always intend to disable anonymous auth. However, the documentation at https://kubernetes.io/docs/reference/access-authn-authz/authentication/#anonymous-requests says: "In 1.6+, anonymous access is enabled by default if an authorization mode other than AlwaysAllow is used". Since Kubespray configures --authorization-mode=Node,RBAC, clusters it deploys become dysfunctional with kube_api_anonymous_auth: false.
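If I understand the defaults correctly, anonymous requests are authenticated as user system:anonymous in group system:unauthenticated, and it is the default system:public-info-viewer RBAC role that lets the kubelet's credential-less probes reach the health endpoints. An abridged sketch of that role (from a typical cluster; details may vary by version):

```yaml
# Abridged default ClusterRole (typical cluster; may vary by version).
# Its ClusterRoleBinding includes the groups system:authenticated and
# system:unauthenticated, so anonymous GETs to the health endpoints are
# authorized, but only while --anonymous-auth=true lets them authenticate.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:public-info-viewer
rules:
  - nonResourceURLs: ["/healthz", "/livez", "/readyz", "/version", "/version/"]
    verbs: ["get"]
```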

Is there something I'm missing about disabling anonymous auth?

cagriersen avatar Jun 07 '20 21:06 cagriersen

As far as I know, anonymous access isn't possible because of this value: kube_apiserver_insecure_port: 0 # (disabled). I'll try to see what the problem with kube_api_anonymous_auth is, though, so let's keep this open and I'll see if I can reproduce it.

floryut avatar Jul 07 '20 14:07 floryut

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Oct 05 '20 14:10 fejta-bot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot avatar Nov 04 '20 15:11 fejta-bot

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

fejta-bot avatar Dec 04 '20 16:12 fejta-bot

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Dec 04 '20 16:12 k8s-ci-robot

Hi!

The problem is reproducible for me as well. Reproduction steps:

  1. Create cluster with option: kube_api_anonymous_auth: true

  2. Then change the setting to kube_api_anonymous_auth: false

  3. and try to upgrade the masters: ansible-playbook -i inventory/rnd-k8s/hosts.yml upgrade-cluster.yml --tags master

The playbook fails at:

included: /roles/kubernetes/master/tasks/kubeadm-upgrade.yml for k8s-m1
Monday 21 December 2020 23:29:08 +0300 (0:00:00.052) 0:01:45.554 *******
FAILED - RETRYING: kubeadm | Check api is up (60 retries left).
FAILED - RETRYING: kubeadm | Check api is up (59 retries left).

In the kubelet log:

kube-apiserver: "failed (failure): HTTP probe failed with statuscode: 401"

I'd like an exact answer to this in the documentation: you can disable anonymous access, but doing so will break access to the cluster. That is a rather dangerous possibility, and it isn't described in the documentation.

homiakos avatar Dec 21 '20 20:12 homiakos

I have the same problem with Kubernetes v1.20.4 on Ubuntu 20.04.2. I edited /etc/kubernetes/manifests/kube-apiserver.yaml and changed --anonymous-auth to true. Then kube-apiserver became healthy.

- command:
    - kube-apiserver
    - --advertise-address=192.168.100.10
    - --allow-privileged=true
    - --anonymous-auth=true

ma-sattari avatar Mar 06 '21 19:03 ma-sattari

/reopen @floryut /remove-lifecycle rotten

I can confirm that behavior. For me it seems to be stable again when just removing kube-api-anonymous-auth: false (no setting of the value).

kaktus42 avatar Mar 12 '21 15:03 kaktus42

@kaktus42: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

I can confirm that behavior. For me it seems to be stable again when just removing kube-api-anonymous-auth: false (no setting of the value).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 12 '21 15:03 k8s-ci-robot

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community. /close

fejta-bot avatar Apr 11 '21 16:04 fejta-bot

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community. /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Apr 11 '21 16:04 k8s-ci-robot

Facing the same issue here: when setting --anonymous-auth=false, kube-apiserver stays READY 0/1 until the flag is set back to true. Is there any way to add 401 as an expected status code as a workaround?

mshiekh avatar Mar 22 '22 17:03 mshiekh

Same for me. Can't upgrade cluster without anonymous auth = true.

strojkee332 avatar Aug 22 '23 14:08 strojkee332

@nico0olas: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jan 29 '24 23:01 k8s-ci-robot

Same issue with anonymous auth

ksyblast avatar Aug 14 '24 12:08 ksyblast

Same

gdagil avatar Sep 12 '24 10:09 gdagil

This issue is caused by the startupProbe, readinessProbe, and livenessProbe failing to authenticate with the API server when anonymous-auth is disabled.

It can be resolved by creating a readiness-probe token and adding it to all the probes.

  1. Create the ClusterRole, ClusterRoleBinding, ServiceAccount, and Secret:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: readiness-probe-role
rules:
  - apiGroups: [""]
    resources: ["pods", "nodes"]
    verbs: ["get"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: readiness-probe-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: readiness-probe-role
subjects:
  - kind: ServiceAccount
    name: readiness-probe
    namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: readiness-probe
  namespace: kube-system

---

apiVersion: v1
kind: Secret
metadata:
  name: readiness-probe
  namespace: kube-system
  annotations:
    kubernetes.io/service-account.name: readiness-probe
type: kubernetes.io/service-account-token
  2. Retrieve the token:
kubectl get secret readiness-probe -n kube-system -o jsonpath='{.data.token}' | base64 --decode
  3. On the control-plane node, update the /etc/kubernetes/manifests/kube-apiserver.yaml file by adding the httpHeaders. It should look like this:
      httpGet:
        host: XXXX
        path: /livez
        port: 6443
        scheme: HTTPS
        httpHeaders:
          - name: Authorization
            value: Bearer <TOKEN>
  4. Restart the kubelet:
systemctl restart kubelet
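Before restarting the kubelet, it's worth sanity-checking that the token is accepted, so a typo in the manifest doesn't leave the apiserver failing its probes. A rough check (127.0.0.1:6443 is illustrative; use whatever address appears in the manifest's httpGet):

```shell
# Decode the probe token and try it against a health endpoint directly.
TOKEN=$(kubectl get secret readiness-probe -n kube-system \
  -o jsonpath='{.data.token}' | base64 --decode)
# --fail makes curl exit non-zero on a 401, so this doubles as a scripted check.
curl -sk --fail -H "Authorization: Bearer ${TOKEN}" https://127.0.0.1:6443/livez
```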

xu001186 avatar Sep 30 '24 05:09 xu001186

I will raise a PR to fix it

xu001186 avatar Sep 30 '24 05:09 xu001186

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Oct 30 '24 06:10 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Oct 30 '24 06:10 k8s-ci-robot

Could someone please stop the bot from closing the issue?

guoard avatar Feb 17 '25 06:02 guoard

/reopen

guoard avatar Sep 10 '25 11:09 guoard

@guoard: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Sep 10 '25 11:09 k8s-ci-robot

Could someone please stop the bot from closing the issue?

@yankay @VannTen @tico88612 PTAL.

guoard avatar Sep 26 '25 12:09 guoard

AFAIK everyone can use the lifecycle commands for prow:

/remove-lifecycle stale
/lifecycle frozen

VannTen avatar Oct 06 '25 08:10 VannTen

AFAIK everyone can use the lifecycle command for prow :

Yes, but You can't reopen an issue/PR unless you authored it or you are a collaborator.

guoard avatar Oct 06 '25 12:10 guoard

Ah yes, missed that.

VannTen avatar Oct 06 '25 12:10 VannTen

Related issues in kubernetes: https://github.com/kubernetes/kubernetes/issues/43784 https://github.com/kubernetes/kubernetes/issues/100581

guoard avatar Oct 08 '25 07:10 guoard