KServe and cert-manager webhooks are failing
While installing Kubeflow using the command:
while ! kustomize build example | awk '!/well-defined/' | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
Some webhooks could not be reached:
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": dial tcp 10.96.218.186:443: connect: connection refused
(the same error repeated 11 times, once per resource)
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": could not get REST client: unable to load root certificates: unable to parse bytes as PEM block
(the same error repeated 9 times, once per resource)
[biswa@fedora manifests]$ sudo kubectl get endpoints -n cert-manager cert-manager-webhook
NAME ENDPOINTS AGE
cert-manager-webhook 10.244.0.8:10250 108m
The KServe webhook issue was previously encountered in #2553. Should the changes made in #2627 prevent this error from reproducing? As for the cert-manager webhook, #2585 had a problem with "no route to host", while mine fails with "connection refused". It could be a Kubernetes root-level issue or a deeper networking-stack issue, as described in https://cert-manager.io/docs/troubleshooting/webhook/#cause-2-eks-on-a-custom-cni
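For the "connection refused" side, a few standard kubectl checks can narrow down whether the webhook pod itself is healthy and reachable (a sketch: the namespace, Service name, and URL come from the output above, and curlimages/curl is just an arbitrary debug image):

```shell
# Is the webhook pod Ready, and does its Service have endpoints?
kubectl get pods -n cert-manager
kubectl get endpoints -n cert-manager cert-manager-webhook

# Anything suspicious in the webhook's own logs?
kubectl logs -n cert-manager deploy/cert-manager-webhook --tail=50

# Can the webhook Service be reached from inside the cluster?
# (-k skips TLS verification; we only care about connectivity here)
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -vk https://cert-manager-webhook.cert-manager.svc:443/mutate
```

If the in-cluster curl also gets "connection refused", the problem is the webhook pod or the Service; if it connects, the problem is more likely the API server's path to the Service network.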
kustomize version:
v5.3.0
My pods are:
[biswa@fedora manifests]$ sudo kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
auth dex-5d8fffb998-qq49q 1/1 Running 0 94m
cert-manager cert-manager-5b8f9b9d96-l7vj7 1/1 Running 0 94m
cert-manager cert-manager-cainjector-54f68bfb64-m6x5f 1/1 Running 0 94m
cert-manager cert-manager-webhook-f6c8487d6-9x6x4 1/1 Running 0 94m
istio-system cluster-local-gateway-7bd9cffcb5-thdkb 1/1 Running 0 94m
istio-system configure-kubernetes-oidc-issuer-jwks-in-requestauthenticasxnfl 0/1 Completed 0 94m
istio-system istio-ingressgateway-666f789ccb-wcqdc 1/1 Running 0 94m
istio-system istiod-6cd8c6c59c-htqzn 1/1 Running 0 94m
knative-eventing eventing-controller-688dc8df9f-9fxpp 1/1 Running 0 94m
knative-eventing eventing-webhook-8c6cc5bc7-789xh 1/1 Running 0 94m
knative-serving activator-55cd894f6c-dr9q4 1/1 Running 8 (36m ago) 94m
knative-serving autoscaler-76748895b9-shk8t 2/2 Running 0 56m
knative-serving controller-76dcf67d5-7tb5w 2/2 Running 0 56m
knative-serving domain-mapping-f5d4dbc56-pbz5q 2/2 Running 0 56m
knative-serving domainmapping-webhook-6f67684cd8-nlnsf 2/2 Running 0 55m
knative-serving net-istio-controller-7bb6fb5f58-tklxs 2/2 Running 0 55m
knative-serving net-istio-webhook-7d8476f6-svcjf 2/2 Running 0 55m
knative-serving webhook-d5cbdf855-bzmsx 2/2 Running 0 55m
kube-system coredns-565d847f94-cd9dp 1/1 Running 0 96m
kube-system coredns-565d847f94-lc62z 1/1 Running 0 96m
kube-system etcd-kubeflow-control-plane 1/1 Running 0 96m
kube-system kindnet-qzthr 1/1 Running 0 96m
kube-system kube-apiserver-kubeflow-control-plane 1/1 Running 0 96m
kube-system kube-controller-manager-kubeflow-control-plane 1/1 Running 0 96m
kube-system kube-proxy-9zct2 1/1 Running 0 96m
kube-system kube-scheduler-kubeflow-control-plane 1/1 Running 0 96m
kubeflow admission-webhook-deployment-6cf44ffbdb-5m86s 0/1 ContainerCreating 0 55m
kubeflow cache-server-7d94c87787-88m4h 0/2 Init:0/1 0 55m
kubeflow centraldashboard-965564b75-6frpk 2/2 Running 0 55m
kubeflow jupyter-web-app-deployment-757976b798-7ngdb 0/2 Pending 0 55m
kubeflow katib-controller-64bf8db8bd-nfn2k 0/1 ContainerCreating 0 55m
kubeflow katib-db-manager-6d6885765-tqldd 1/1 Running 7 (40m ago) 55m
kubeflow katib-mysql-db6dc68c-q7hbt 1/1 Running 0 55m
kubeflow katib-ui-64b8f8d78c-vxttm 2/2 Running 0 55m
kubeflow kserve-controller-manager-6df96f6d7c-wwxct 0/2 ContainerCreating 0 55m
kubeflow kserve-models-web-app-99849d9f7-rmfhk 2/2 Running 0 55m
kubeflow kubeflow-pipelines-profile-controller-59ccbd47b9-7875s 1/1 Running 0 55m
kubeflow metacontroller-0 1/1 Running 0 94m
kubeflow metadata-envoy-deployment-5cbbb86fc9-pwpbw 1/1 Running 0 55m
kubeflow metadata-grpc-deployment-784b8b5fb4-rqw94 1/2 CrashLoopBackOff 10 (49s ago) 55m
kubeflow metadata-writer-844bd5d486-nm2j6 2/2 Running 4 (69s ago) 55m
kubeflow minio-65dff76b66-brflp 0/2 Pending 0 55m
kubeflow ml-pipeline-6c7c86f666-qbs65 0/2 PodInitializing 0 55m
kubeflow ml-pipeline-persistenceagent-85c485f86f-j8qwx 0/2 PodInitializing 0 55m
kubeflow ml-pipeline-scheduledworkflow-6448c96f4f-98997 0/2 PodInitializing 0 55m
kubeflow ml-pipeline-ui-6db56c647b-b6ksz 0/2 Pending 0 55m
kubeflow ml-pipeline-viewer-crd-5df88b6956-kpt68 0/2 Pending 0 55m
kubeflow ml-pipeline-visualizationserver-6d49897f85-p9msj 0/2 Pending 0 55m
kubeflow mysql-c999c6c8-phg5s 0/2 Pending 0 55m
kubeflow notebook-controller-deployment-9ffdf65d7-bsn6b 0/2 PodInitializing 0 55m
kubeflow profiles-deployment-cbf679dbd-qwskr 0/3 PodInitializing 0 55m
kubeflow pvcviewer-controller-manager-d66667b49-mhn4n 0/3 Pending 0 55m
kubeflow tensorboard-controller-deployment-7444dc8fcd-gxvfr 0/3 Pending 0 55m
kubeflow tensorboards-web-app-deployment-78f7c694bf-tp8z9 0/2 Pending 0 55m
kubeflow training-operator-69575765df-v9hl4 1/1 Running 0 55m
kubeflow volumes-web-app-deployment-6dfccd897d-xklf7 0/2 Pending 0 55m
kubeflow workflow-controller-f65c9d9b4-m4f9k 0/2 PodInitializing 0 55m
local-path-storage local-path-provisioner-684f458cdd-nvs75 1/1 Running 0 96m
oauth2-proxy oauth2-proxy-58d95869bf-5n6l5 1/1 Running 0 94m
oauth2-proxy oauth2-proxy-58d95869bf-684pn 1/1 Running 0 94m
Can you try with the master branch as well? Please also check whether your install command is up to date in the master branch readme.md and follow the installation instructions with Kind as close as possible.
I was able to resolve this by increasing the resources allocated to the machine. I was getting capped out by CPU; maybe you're facing something similar?
> Can you try with the master branch as well? Please also check whether your install command is up to date in the master branch readme.md and follow the installation instructions with Kind as close as possible.
Hey @juliusvonkohout, yes my local machine's master branch is up to date.
@dnapier Hi, I tried to increase CPU resources in the --kubeconfig file but it says there is no resources field in v1alpha4.Node. Could you please tell me what you tried?
When I ran kubectl describe nodes, the CPU resources were maxed out. This was being done in a VM, so I simply added more cores to the machine. If you're doing the same and the core speeds are being limited by the host, you could raise the limit as well, but that was not the case for me.
I encountered another issue following this which was the activator of knative-serving crashing, but I do not believe that is related to the error you're seeing here.
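If CPU saturation is the suspicion, the committed requests and limits per node can be checked with stock kubectl (kubectl top additionally needs metrics-server, which kind does not ship by default):

```shell
# Requests and limits already committed on each node
kubectl describe nodes | grep -A 8 "Allocated resources"

# Live usage, if metrics-server is installed
kubectl top nodes
```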
> @dnapier Hi, I tried to increase CPU resources in the --kubeconfig file but it says there is no resources field in v1alpha4.Node. Could you please tell me what you tried?
CC @diegolovison then
Are you using kind with docker?
Hello guys, I'm facing the same issues. I have to deploy Kubeflow for an internship project, and I hit the same problem with Kubeflow v1.8. kustomize version: v5.3.0, cert-manager version: v0.12.1
After running: while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
I get this error:
(screenshot of the error attached: Capture d'écran 2024-04-09 151931)
My Kubernetes cluster is running with Tanzu.
Please just test with Kind as explained in the readme.md in the master branch, to make sure that it is not a Kubernetes issue of your own cluster.
> Are you using kind with docker?
Sorry, I didn't catch that this was addressed to me. Yes in my case, I am using kind with docker. Debian 12 host.
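For context on the earlier v1alpha4.Node question: kind nodes are plain containers, so there is no per-node resources field in the kind config; the cluster gets whatever the Docker daemon can see. On a Linux host that is the whole machine, while Docker Desktop caps it in the VM settings. The daemon's view can be checked like this (NCPU and MemTotal are standard docker info template fields):

```shell
# CPUs and memory (in bytes) available to the Docker daemon,
# and therefore to all kind "nodes"
docker info --format '{{.NCPU}} CPUs, {{.MemTotal}} bytes of memory'
```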
What is the amount of CPU and memory that you have available? Were you strictly following https://github.com/kubeflow/manifests/#installation
12 GB of memory on the system, 8-core processor (Intel(R) Xeon(R) E5-2620).
And yes I was strictly following the installation instructions.
> Please just test with Kind as explained in the readme.md in the master branch, to make sure that it is not a Kubernetes issue of your own cluster.
I already tested the v1.8 on minikube and I'm facing the same issue...
> 12GB of memory on the system, 8 core processor (Intel(R) Xeon(R) E5-2620).
I believe you will need more resources. I have 20 cores and 36 GB of memory.
> minikube and I'm facing the same issue...
I wasn't able to make it work on Minikube, only with kind.
I've just attempted to install it using a local kind cluster, but it didn't work. I'm encountering another issue... (screenshot attached: issue-kind-kf)
> I've just attempted to install it using a local kind cluster, but it didn't work. I'm encountering another issue... (screenshot attached: issue-kind-kf)
That's the exact issue I'm facing which @diegolovison is suggesting is caused from lack of available resources. I'm working on doubling my memory to 24GB to test if that resolves it. Will update asap.
Interesting.... I managed to install v1.8 on Minikube just now. I'm curious why it's working now. My suspicion is that I might encounter issues installing it on my Tanzu Cluster, perhaps due to a cluster-related problem.
> Interesting.... I managed to install v1.8 on Minikube just now. I'm curious why it's working now. My suspicion is that I might encounter issues installing it on my Tanzu Cluster, perhaps due to a cluster-related problem.
Do you mind sharing your cpu/memory for comparison?
8 Cores/16G
minikube with podman worked for me with 16 GB if you strip the example distribution down a bit. Otherwise you might need 32 GB. @diegolovison, we should add the memory and core requirements on top of the installation instructions with Kind.
Do you believe that 32 GB and 20 cores?
> Do you believe that 32 GB and 20 cores?
I do not understand your question.
Should we document that 32 GB of RAM and 20 CPU cores are the minimum to install Kubeflow locally?
> Should we document that 32 GB of RAM and 20 CPU cores are the minimum to install Kubeflow locally?
Not that I have a say here, but I think that's a great idea.
I would go with 16 cores and 32 GB of memory as a recommendation. Or are you sure that 16 cores are not enough? It is possible to do it with way less, but that is then left up to the end user.
Ok. Sounds good
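If these numbers end up in the README, a small pre-flight check could sit next to them. A sketch for Linux hosts, using the 16-core / 32 GB figures suggested above as thresholds:

```shell
# Compare host resources against the suggested minimums (16 cores / 32 GB).
# Linux-only: reads /proc/meminfo for total memory in kB.
cores=$(nproc)
mem_gb=$(( $(awk '/MemTotal/ {print $2}' /proc/meminfo) / 1024 / 1024 ))
echo "cores=${cores} mem_gb=${mem_gb}"
if [ "${cores}" -lt 16 ] || [ "${mem_gb}" -lt 32 ]; then
  echo "WARNING: below the recommended resources for a full Kubeflow install"
fi
```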
@biswajit-9776 Please retry with the latest master branch and readme. If you still encounter problems, please open a new issue with our new template.