cluster liveness check fails
Current Behavior
ahy in the slack channel reports that when they connect tilt to their remote cluster, it fails with:
Cluster status error: cluster did not pass liveness check
If they try to run the liveness check manually, they get
$ kubectl get --raw='/livez?verbose'
Error from server (NotFound): the server could not find the requested resource
It appears that livez was added in Kubernetes 1.16 and is not supported on their Rancher distro.
They confirm the /healthz check works, though.
Possible Solutions
Maybe we should only use /healthz? Not sure what the additional benefit of using /livez is.
Alternatively, if we get a 404 from /livez, we could ignore it.
@milas any chance you remember what the reasoning was behind the different health checks?
alternatively, maybe we just skip the health checks on older versions of kubernetes... https://kubernetes.io/docs/reference/using-api/health-checks/
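For illustration, a minimal sketch of that version-gating idea, assuming a client-go kubernetes.Interface; the helper name shouldCheckLivez is hypothetical, not anything in tilt today:

```go
package k8shealth

import (
	utilversion "k8s.io/apimachinery/pkg/util/version"
	"k8s.io/client-go/kubernetes"
)

// /livez was only added to the API server in Kubernetes v1.16.
var livezMinVersion = utilversion.MustParseGeneric("1.16.0")

// shouldCheckLivez (hypothetical) asks the API server for its version
// and reports whether the /livez endpoint should exist there.
func shouldCheckLivez(client kubernetes.Interface) (bool, error) {
	info, err := client.Discovery().ServerVersion()
	if err != nil {
		return false, err
	}
	v, err := utilversion.ParseGeneric(info.GitVersion)
	if err != nil {
		return false, err
	}
	return v.AtLeast(livezMinVersion), nil
}
```

That said, later comments in this thread report /livez returning 404 even on k8s 1.20 under Rancher, so a version gate alone would not cover every cluster; handling the 404 directly is more robust.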
Used /livez because of this note from the health checks doc:
The healthz endpoint is deprecated (since Kubernetes v1.16), and you should use the more specific livez and readyz endpoints instead.
Alternatively, if we get a 404 from /livez, we could ignore it.
This seems reasonable - could also try to fall back to /readyz in this case.
Running into this issue as well from our Rancher environment on:
❯ tilt version
v0.31.2, built 2023-02-10
We downgraded to the following version to continue to use tilt.
v0.28.1, built 2022-05-01
Browsing through the codebase, I believe an enhancement to check for a 404 can be implemented here. Additionally, can we fall back to /healthz as well?
https://github.com/tilt-dev/tilt/blob/95a35874112c38057685a3342c4924c83e9d1b7b/internal/k8s/client.go#L765-L786
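For the record, a hedged sketch of what that fallback might look like - not tilt's actual implementation. It assumes a client-go kubernetes.Interface and a hypothetical checkClusterHealth helper that probes /livez first and only falls through to /readyz and then /healthz when the server does not serve the endpoint at all:

```go
package k8shealth

import (
	"context"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/client-go/kubernetes"
)

// checkClusterHealth (hypothetical) probes /livez and, when the server
// does not serve it (404, as on pre-1.16 or stripped-down distros),
// falls back to /readyz and then the deprecated /healthz.
func checkClusterHealth(ctx context.Context, client kubernetes.Interface) error {
	paths := []string{"/livez", "/readyz", "/healthz"}
	var lastErr error
	for _, path := range paths {
		_, err := client.Discovery().RESTClient().
			Get().AbsPath(path).DoRaw(ctx)
		if err == nil {
			return nil // the endpoint answered 200
		}
		if apierrors.IsNotFound(err) {
			lastErr = err
			continue // endpoint not served here; try the next one
		}
		// Any non-404 failure is a real health problem: report it.
		return fmt.Errorf("cluster did not pass liveness check on %s: %w", path, err)
	}
	return fmt.Errorf("no health endpoint is served by this cluster: %w", lastErr)
}
```

Note that client-go maps a raw 404 from DoRaw to an error that apierrors.IsNotFound recognizes, which matches the "Error from server (NotFound)" output reported at the top of this issue.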
On the other hand, Rancher can be updated to include /livez or /readyz, because the Kubernetes documentation mentions here:
Machines that check the healthz/livez/readyz of the API server should rely on the HTTP status code.
I believe this is where Rancher generates the listener for /healthz:
https://github.com/rancher/rancher/blob/e2410e02494a5b4bd43c50d8d45ed7df5a3ad0a8/pkg/api/steve/health/health.go#L10-L19
@atsai1220 how do you downgrade tilt? currently facing the same issue
Navigate to the Releases page of this repository and download your desired version from the Assets menu.
Copy the URL for your operating system and retrieve the package:
wget https://github.com/tilt-dev/tilt/releases/download/v0.28.1/tilt.0.28.1.linux.x86_64.tar.gz
Any plan to add "/healthz" to the cluster API health checks? I'm working on k8s 1.20.15 via Rancher and am currently blocked from using the latest tilt version :(
@MatanAmoyal1 hmmm... /livez should work fine in k8s 1.20, are you sure you're not hitting some other issue / blocking it some other way?
@nicks it looks like the same issue (k8s 1.20 via Rancher): healthz works, but livez doesn't.
```
➜ ~ kubectl proxy&
[1] 37020
➜ ~ Starting to serve on 127.0.0.1:8001
➜ ~ curl 127.0.0.1:8001/healthz
ok%
➜ ~ curl 127.0.0.1:8001/livez
404 page not found
```
@nicks any plan to merge this PR https://github.com/tilt-dev/tilt/pull/6065 ?
fwiw, i have been unable to reproduce this problem:
k3d cluster create -i rancher/k3s:v1.20.15-k3s1
kubectl get --raw='/livez?verbose'
seems to produce a valid healthcheck for me. is it possible that your devops team is blocking the kubernetes healthcheck routes?
Unfortunately I'm stuck using a version of OpenShift 3 (k8s v1.11), so I'm unable to use the current version of Tilt, as the livez endpoint is not present. Are there any plans on fixing this issue? So far I've been using v0.28.1 and it works.
We are using Rancher's included k3s Kubernetes, and its livez check is behind authentication: https://github.com/k3s-io/k3s/issues/3576#issuecomment-875041119
So, we are sadly also forced to fall back to an older tilt version...
@Richie24 the issue you pointed to is a 401 rather than the 404 reported in other comments, so it sounds like you're hitting a different problem. fwiw, tilt uses your kubectl credentials, so auth shouldn't affect things.
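If it helps anyone debug which case they're hitting, client-go's error helpers can tell the two apart; a small hypothetical snippet, using the same discovery client as the sketches above:

```go
package k8shealth

import (
	"context"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/client-go/kubernetes"
)

// diagnoseLivez (hypothetical) probes /livez once and reports whether a
// failure is a missing endpoint (404) or an auth rejection (401).
func diagnoseLivez(ctx context.Context, client kubernetes.Interface) {
	_, err := client.Discovery().RESTClient().
		Get().AbsPath("/livez").DoRaw(ctx)
	switch {
	case err == nil:
		fmt.Println("/livez: ok")
	case apierrors.IsNotFound(err):
		fmt.Println("/livez not served (404): a fallback to /healthz would help")
	case apierrors.IsUnauthorized(err):
		fmt.Println("/livez rejected the request (401): an auth problem, not a missing endpoint")
	default:
		fmt.Printf("/livez check failed: %v\n", err)
	}
}
```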