
kube-apiserver stops working because its liveness probe fails, when kube_api_anonymous_auth is false

Open cagriersen opened this issue 5 years ago • 18 comments

Environment:

  • Cloud provider or hardware configuration: On-premise VMs

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"): CentOS Linux release 7.8.2003 (Core)

  • Version of Ansible (ansible --version): 2.9.6

  • Version of Python (python --version): 2.7.5

Kubespray version (commit) (git rev-parse --short HEAD): v2.13.1

Network plugin used: weave and calico (I've tested it with both)

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"): https://gist.github.com/cagriersen/f3182477249deee0bf258c5766a6c154

Command used to invoke ansible: ansible-playbook -u $(whoami) -b -i inventory/$CLUSTER_NAME/inventory.cfg cluster.yml

Output of ansible run: Finished without errors.

Anything else we need to know: If I disable anonymous auth via the kube_api_anonymous_auth: false parameter, kube-apiserver stops working after ~5 minutes. When I check the kube-apiserver status through the kubelet logs, I see that kube-apiserver returns HTTP status code 401 for the /healthz endpoint.

Jun  7 23:08:31 node01 kubelet: I0607 23:08:31.135527    2388 prober.go:116] Liveness probe for "kube-apiserver-node01_kube-system(fc13df7a2825f6dbcfc3b7e530e124b8):kube-apiserver" failed (failure): HTTP probe failed with statuscode: 401



Jun  7 23:08:31 node01 kubelet: I0607 23:08:31.136464    2388 kuberuntime_manager.go:630] Container "kube-apiserver" ({"docker" "423fcf47d7ee2a0ec12ddabc5aeae1b2d0974c5b78b67dede5fa90ab0e256cac"}) of pod kube-apiserver-node01_kube-system(fc13df7a2825f6dbcfc3b7e530e124b8): Container kube-apiserver failed liveness probe, will be restarted

So it stops working because the liveness probe fails constantly. However, I can still interact with Kubernetes via kubectl until the liveness probe gives up.

Also, while the liveness probe is failing, I can curl /healthz (or any other endpoint) without any issue:

APISERVER=$(kubectl config view | grep server | cut -f 2- -d ":" | tr -d " ")
TOKEN=$(kubectl describe secret $(kubectl get secrets | grep default | cut -f1 -d ' ') | grep -E '^token' | cut -f2 -d':' | tr -d '\t')
curl -v $APISERVER/healthz \
  --header "Authorization: Bearer $TOKEN" \
  --insecure \
  --cacert "/etc/kubernetes/ssl/ca.crt" \
  --cert "/etc/kubernetes/ssl/apiserver-kubelet-client.crt" \
  --key "/etc/kubernetes/ssl/apiserver-kubelet-client.key"

### OUTPUT ###
* About to connect() to x.x.x.x port 6443 (#0)
*   Trying x.x.x.x...
* Connected to x.x.x.x (x.x.x.x) port 6443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* NSS: client certificate from file
* 	subject: CN=kube-apiserver-kubelet-client,O=system:masters
* 	start date: Jun 07 16:02:04 2020 GMT
* 	expire date: Jun 07 16:02:05 2021 GMT
* 	common name: kube-apiserver-kubelet-client
* 	issuer: CN=kubernetes
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* 	subject: CN=kube-apiserver
* 	start date: Jun 07 16:02:04 2020 GMT
* 	expire date: Jun 07 16:02:04 2021 GMT
* 	common name: kube-apiserver
* 	issuer: CN=kubernetes
> GET /healthz HTTP/1.1
> User-Agent: curl/7.29.0
> Host: x.x.x.x:6443
> Accept: */*
> Authorization: Bearer
>
< HTTP/1.1 200 OK
< Cache-Control: no-cache, private
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Sun, 07 Jun 2020 20:17:56 GMT
< Content-Length: 2
<
* Connection #0 to host x.x.x.x left intact
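For comparison, the probe that keeps failing is the httpGet stanza that kubeadm writes into the static pod manifest. It carries no client certificate and no Authorization header, so it reaches the apiserver as an anonymous request. A rough sketch of that stanza (host and thresholds illustrative, and may vary by version):

```yaml
# Sketch of the liveness probe in /etc/kubernetes/manifests/kube-apiserver.yaml
# (values illustrative). The kubelet sends this GET without any credentials,
# so with --anonymous-auth=false the apiserver replies 401 and the probe fails.
livenessProbe:
  failureThreshold: 8
  httpGet:
    host: x.x.x.x
    path: /healthz
    port: 6443
    scheme: HTTPS
  initialDelaySeconds: 15
  timeoutSeconds: 15
```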

Insecure port (8080) also works:

curl -vv 127.0.0.1:8080/healthz --header "Authorization: Bearer $TOKEN"
* About to connect() to 127.0.0.1 port 8080 (#0)
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> GET /healthz HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 127.0.0.1:8080
> Accept: */*
> Authorization: Bearer
>
< HTTP/1.1 200 OK
< Cache-Control: no-cache, private
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Sun, 07 Jun 2020 20:03:41 GMT
< Content-Length: 2
<
* Connection #0 to host 127.0.0.1 left intact

If I enable anonymous auth in /etc/kubernetes/manifests/kube-apiserver.yaml and restart the kubelet, everything works as expected.

As a security best practice, I always intend to disable anonymous auth. However, the documentation at https://kubernetes.io/docs/reference/access-authn-authz/authentication/#anonymous-requests says: "In 1.6+, anonymous access is enabled by default if an authorization mode other than AlwaysAllow is used". Since Kubespray configures --authorization-mode=Node,RBAC, clusters it deploys become dysfunctional with kube_api_anonymous_auth: false.
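If I understand the defaults correctly, anonymous requests are authenticated as user system:anonymous in group system:unauthenticated, and it is the default system:public-info-viewer RBAC role that lets the kubelet's credential-less probes reach the health endpoints. An abridged sketch of that role (from a typical cluster; details may vary by version):

```yaml
# Abridged default ClusterRole (typical cluster; may vary by version).
# Its ClusterRoleBinding includes the groups system:authenticated and
# system:unauthenticated, so anonymous GETs to the health endpoints are
# authorized, but only while --anonymous-auth=true lets them authenticate.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:public-info-viewer
rules:
  - nonResourceURLs: ["/healthz", "/livez", "/readyz", "/version", "/version/"]
    verbs: ["get"]
```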

Is there something I'm missing about disabling anonymous auth?

cagriersen avatar Jun 07 '20 21:06 cagriersen

As far as I know, anonymous access isn't possible because of this value: kube_apiserver_insecure_port: 0 # (disabled). I'll try to see what the problem with kube_api_anonymous_auth is, though, so let's keep this open and I'll see if I can reproduce it.

floryut avatar Jul 07 '20 14:07 floryut

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Oct 05 '20 14:10 fejta-bot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot avatar Nov 04 '20 15:11 fejta-bot

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

fejta-bot avatar Dec 04 '20 16:12 fejta-bot

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Dec 04 '20 16:12 k8s-ci-robot

Hi!

The problem is reproducible for me as well. Reproduction steps:

  1. Create cluster with option: kube_api_anonymous_auth: true

  2. Then change the setting to kube_api_anonymous_auth: false

  3. and try to upgrade the masters: ansible-playbook -i inventory/rnd-k8s/hosts.yml upgrade-cluster.yml --tags master

The playbook fails at:

included: /roles/kubernetes/master/tasks/kubeadm-upgrade.yml for k8s-m1
Monday 21 December 2020 23:29:08 +0300 (0:00:00.052) 0:01:45.554 *******
FAILED - RETRYING: kubeadm | Check api is up (60 retries left).
FAILED - RETRYING: kubeadm | Check api is up (59 retries left).

In the kubelet log:

kube-apiserver: "failed (failure): HTTP probe failed with statuscode: 401"

I'd like an exact answer to this in the documentation: you can disable anonymous access, but doing so will break access to the cluster. That is a rather dangerous possibility, and it isn't described in the documentation.

homiakos avatar Dec 21 '20 20:12 homiakos

I have the same problem with Kubernetes v1.20.4 on Ubuntu 20.04.2. I edited /etc/kubernetes/manifests/kube-apiserver.yaml and changed --anonymous-auth to true. Then kube-apiserver became healthy.

- command:
    - kube-apiserver
    - --advertise-address=192.168.100.10
    - --allow-privileged=true
    - --anonymous-auth=true

ma-sattari avatar Mar 06 '21 19:03 ma-sattari

/reopen @floryut /remove-lifecycle rotten

I can confirm that behavior. For me it seems to be stable again when just removing kube-api-anonymous-auth: false (no setting of the value).

kaktus42 avatar Mar 12 '21 15:03 kaktus42

@kaktus42: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

I can confirm that behavior. For me it seems to be stable again when just removing kube-api-anonymous-auth: false (no setting of the value).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 12 '21 15:03 k8s-ci-robot

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community. /close

fejta-bot avatar Apr 11 '21 16:04 fejta-bot

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community. /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Apr 11 '21 16:04 k8s-ci-robot

Facing the same issue here: when setting --anonymous-auth=false, kube-apiserver stays READY 0/1 until the flag is set back to true. Is there any way to add 401 as an expected status code as a workaround?

mshiekh avatar Mar 22 '22 17:03 mshiekh

Same for me. Can't upgrade cluster without anonymous auth = true.

strojkee332 avatar Aug 22 '23 14:08 strojkee332

@nico0olas: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jan 29 '24 23:01 k8s-ci-robot

Same issue with anonymous auth

ksyblast avatar Aug 14 '24 12:08 ksyblast

Same

gdagil avatar Sep 12 '24 10:09 gdagil

This issue is caused by the startupProbe, readinessProbe, and livenessProbe failing to authenticate with the API server when anonymous-auth is disabled.

It can be resolved by creating a readiness-probe token and adding it to all the probes.

  1. Create the ClusterRole, ClusterRoleBinding, ServiceAccount, and Secret:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: readiness-probe-role
rules:
  - apiGroups: [""]
    resources: ["pods", "nodes"]
    verbs: ["get"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: readiness-probe-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: readiness-probe-role
subjects:
  - kind: ServiceAccount
    name: readiness-probe
    namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: readiness-probe
  namespace: kube-system

---

apiVersion: v1
kind: Secret
metadata:
  name: readiness-probe
  namespace: kube-system
  annotations:
    kubernetes.io/service-account.name: readiness-probe
type: kubernetes.io/service-account-token
  2. Retrieve the token:
kubectl get secret readiness-probe -n kube-system -o jsonpath='{.data.token}' | base64 --decode
  3. On the control-plane node, update the /etc/kubernetes/manifests/kube-apiserver.yaml file by adding the httpHeaders. It should look like this:
      httpGet:
        host: XXXX
        path: /livez
        port: 6443
        scheme: HTTPS
        httpHeaders:
          - name: Authorization
            value: Bearer <TOKEN>
  4. Restart the kubelet:
systemctl restart kubelet
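Before restarting the kubelet, it's worth sanity-checking that the token is accepted, so a typo in the manifest doesn't leave the apiserver failing its probes. A rough check (127.0.0.1:6443 is illustrative; use whatever address appears in the manifest's httpGet):

```shell
# Decode the probe token and try it against a health endpoint directly.
TOKEN=$(kubectl get secret readiness-probe -n kube-system \
  -o jsonpath='{.data.token}' | base64 --decode)
# --fail makes curl exit non-zero on a 401, so this doubles as a scripted check.
curl -sk --fail -H "Authorization: Bearer ${TOKEN}" https://127.0.0.1:6443/livez
```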

xu001186 avatar Sep 30 '24 05:09 xu001186

I will raise a PR to fix it

xu001186 avatar Sep 30 '24 05:09 xu001186

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Oct 30 '24 06:10 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Oct 30 '24 06:10 k8s-ci-robot

Could someone please stop the bot from closing the issue?

guoard avatar Feb 17 '25 06:02 guoard

/reopen

guoard avatar Sep 10 '25 11:09 guoard

@guoard: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Sep 10 '25 11:09 k8s-ci-robot

Could someone please stop the bot from closing the issue?

@yankay @VannTen @tico88612 PTAL.

guoard avatar Sep 26 '25 12:09 guoard

AFAIK everyone can use the lifecycle commands for prow:

/remove-lifecycle stale
/lifecycle frozen

VannTen avatar Oct 06 '25 08:10 VannTen

AFAIK everyone can use the lifecycle command for prow :

Yes, but You can't reopen an issue/PR unless you authored it or you are a collaborator.

guoard avatar Oct 06 '25 12:10 guoard

Ah yes, missed that.

VannTen avatar Oct 06 '25 12:10 VannTen

Related issues in kubernetes: https://github.com/kubernetes/kubernetes/issues/43784 https://github.com/kubernetes/kubernetes/issues/100581

guoard avatar Oct 08 '25 07:10 guoard