All Knative pods are in 'CrashLoopBackOff' with a "Failed to get k8s version" error
In what area(s)?
build test-and-release
Other classifications: This is the first issue I hit when trying to install Knative on a fresh AKS cluster
What version of Knative?
knative version : 1.13.1
k8s cluster version:
Expected Behavior
The 'webhook', 'activator', 'autoscaler', and 'controller' pods should be in Running status.
Actual Behavior
The 'webhook', 'activator', 'autoscaler', and 'controller' pods are in 'CrashLoopBackOff' and never start.
Steps to Reproduce the Problem
kubectl apply -f crds.yaml
kubectl apply -f core.yaml
Both logs show the same error: 'Failed to get k8s version'.
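To capture that error from pods that keep restarting, something like the following may help. It is only a sketch: crash_logs is a hypothetical helper name, and the deployment names are taken from the pods listed above. It relies on kubectl logs --previous, which prints the output of the last terminated container instance.

```shell
# Hypothetical helper: print the tail of the previous (crashed) container's
# logs for each Knative Serving deployment passed as an argument.
crash_logs() {
  for deploy in "$@"; do
    echo "== $deploy =="
    # --previous reads the logs of the last terminated container instance.
    kubectl -n knative-serving logs "deployment/$deploy" --previous --tail=50
  done
}

# Example call (not executed here):
#   crash_logs activator autoscaler controller webhook
```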
I ran kubectl -n knative-serving get pods activator -o yaml to inspect the pod. ca.crt is already mounted in the pod, so why can't it verify the API server's certificate?
I tested ca.crt with curl from a testbox container, and it works fine.
Is there a workaround to skip verification of the API server's certificate, like "curl -k"?
Hi @snailshadow, what is the output if you run:
kubectl -n knative-serving exec -it <activator> -- curl -v -k "https://<api-server>:443/version?timeout=32s"
kubectl -n knative-serving exec -it <activator> -- curl -v --header "Authorization..." --cacert <path to sa> "https://<api-server>:443/version?timeout=32s"
@skonto The pod is not ready, so I am unable to run these two commands to verify the certificate of the k8s API server.
The activator should be no different: it uses the K8s service account (sa) certificate and token. Do you use a custom activator image or an upstream release? Another possible reason you see this is that certificates are not set up correctly on your cluster for some reason. What happens if you rotate them (https://learn.microsoft.com/en-us/azure/aks/certificate-rotation)? Do you see any difference when you run the nginx pod in the knative-serving ns?
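For reference, here is a minimal sketch of that check, run from a throwaway pod in the knative-serving namespace. It assumes an image that ships a shell and curl (curlimages/curl is my assumption, not something from this thread), and the function names are hypothetical. The CA and token paths are the standard service account mount, and the host and port come from the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT variables that Kubernetes injects into every pod.

```shell
# Start a test pod first (image name is an assumption):
#   kubectl -n knative-serving run testbox --rm -it --restart=Never \
#     --image=curlimages/curl --command -- sh
# Then paste the functions below into that shell.

# Build the in-cluster /version URL from the injected service env vars,
# falling back to the cluster DNS name of the API server.
api_version_url() {
  host="${KUBERNETES_SERVICE_HOST:-kubernetes.default.svc}"
  port="${KUBERNETES_SERVICE_PORT:-443}"
  printf 'https://%s:%s/version?timeout=32s\n' "$host" "$port"
}

# Call /version with the mounted service account CA and token,
# with certificate verification left enabled (no -k).
check_api_version() {
  sa_dir=/var/run/secrets/kubernetes.io/serviceaccount
  curl -sS -v --cacert "$sa_dir/ca.crt" \
    -H "Authorization: Bearer $(cat "$sa_dir/token")" \
    "$(api_version_url)"
}
```

Running check_api_version in that pod and comparing its TLS output with the error in the Knative pod logs should show whether the mounted CA verifies the API server certificate from inside the namespace.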
pod is not ready, so I am unable to run these two commands to verify the certificate of the k8s API server
One option would be to modify the activator image to run the commands when it starts, or to try a PostStart hook (which might run after the entrypoint).
/triage needs-user-input
This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.