Support for k8s v1.23
Hello! I'm using the single command install to get KF, but it keeps failing. I'm sure it's because I have k8s 1.23, which KF hasn't been tested with (afaik). Has someone succeeded in running this with newer k8s versions?
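For context, the single command install being referred to is presumably the retry loop from the manifests README (a sketch, assuming kustomize and kubectl are already installed):

```sh
# Single-command install from the kubeflow/manifests README; retries until all
# CRDs and webhooks are ready, since the first passes usually fail on ordering
while ! kustomize build example | kubectl apply -f -; do
  echo "Retrying to apply resources"
  sleep 10
done
```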
/kind feature /priority p1 /area installation
@ayush-1506 probably not, we are currently testing with 1.20 and 1.21. We have some tracking issues regarding 1.22 support; some of the individual WGs have tested with 1.22 (Katib, KFP, training operators), but Notebooks and maybe the central dashboard still need some dependency updates. Which versions of Istio, cert-manager and Knative are you using? Also, are you using the new KF 1.5 RC 1?
@kubeflow/wg-manifests-leads fyi, this might be a good tracking issue for K8s 1.23
@jbottum: The label(s) area/installation cannot be applied, because the repository doesn't have them.
@jbottum Thanks for getting back! I'm on knative v0.22.1, cert-manager 1.7.1 and istio 1.11. I'm also on the master branch of this repo (is there some other tagged version I should use?)
Looking at the pods running in the kubeflow namespace (even after the failed installation command), I can see that kfserving-controller, metadata-writer, ml-pipeline and ml-pipeline-persistenceagent are stuck in CrashLoopBackOff state. Others seem to be running.
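For anyone else triaging this, a quick way to see why those pods are crashing (pod names below are placeholders):

```sh
# List pods in the kubeflow namespace together with their restart counts
kubectl get pods -n kubeflow

# For a pod stuck in CrashLoopBackOff, check the previous container's logs
# and the pod events; <ml-pipeline-pod-name> is a placeholder
kubectl logs -n kubeflow <ml-pipeline-pod-name> --previous
kubectl describe pod -n kubeflow <ml-pipeline-pod-name>
```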
@jbottum I've downgraded k8s to 1.19:
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.16", GitCommit:"e37e4ab4cc8dcda84f1344dda47a97bb1927d074", GitTreeState:"clean", BuildDate:"2021-10-27T16:25:59Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.16", GitCommit:"e37e4ab4cc8dcda84f1344dda47a97bb1927d074", GitTreeState:"clean", BuildDate:"2021-10-27T16:20:18Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
Which tag of this repo should I be using here? Using master and 1.4.0 fails.
@ayush-1506 Are you using: https://github.com/kubeflow/manifests/releases/tag/v1.4.1
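A minimal sketch of pinning to that tag instead of master (assuming the repo is cloned locally):

```sh
# Check out the v1.4.1 release tag rather than master before building
git clone https://github.com/kubeflow/manifests.git
cd manifests
git checkout v1.4.1
# ...then run the same single-command install as above
```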
Yes, getting this error message:
Error from server (Invalid): error when applying patch:
{"metadata":{"annotations":{"controller-gen.kubebuilder.io/version":null,"kubectl.kubernetes.io/last-applied-configuration":"{\}
There's a long error message after this (I can attach it if it helps), which ends with:
Error from server (InternalError): error when creating "STDIN": Internal error occurred: conversion webhook for cert-manager.io/v1alpha2, Kind=Issuer failed: Post "https://cert-manager-webhook.cert-manager.svc:443/convert?timeout=30s": dial tcp 10.106.158.211:443: connect: no route to host
Upon further investigation, I'm forced to wonder if this is because of some CNI issue (or maybe a firewall).
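In case it helps others who hit the same "no route to host" on the conversion webhook, a few checks that narrow down whether the cert-manager webhook Service is reachable from inside the cluster (resource names and labels assumed to match a default cert-manager install):

```sh
# Does the webhook Service exist, and does it have endpoints behind it?
kubectl -n cert-manager get svc cert-manager-webhook
kubectl -n cert-manager get endpoints cert-manager-webhook

# Is the webhook pod itself healthy?
kubectl -n cert-manager get pods -l app=webhook

# Can other pods actually reach it? (the request will be rejected, but a
# "no route to host" here points at CNI/firewall rather than cert-manager)
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -ks https://cert-manager-webhook.cert-manager.svc:443/ --max-time 5
```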
Ok, turns out it had something to do with my network configuration. v1.4.1 worked for me.
@ayush-1506 so you got the manifests to work with kubernetes 1.23?
@dvaldivia No, didn't work with 1.23. Had to use 1.19 instead.
@ayush-1506 now that Kubeflow 1.6.0 is out, have you tried using Kubernetes 1.23 with this release?
I've tried Kubeflow 1.6.0 with Kubernetes 1.25 and I can confirm it doesn't work; there is a lot of deprecated stuff in use.
K8s v1.22 is set to reach End of Life on 2022-10-28, next month! Ref: k8s releases
To check version dependencies, you can spin up a kind cluster using a different node image (see the list of kind node images per Kubernetes version). I was able to test Kubeflow v1.6.0 on k8s v1.22.9 locally, and it is working fine.
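A minimal sketch of what that looks like (the kindest/node tag below is an assumption; the kind release notes list the exact images published for each Kubernetes version):

```sh
# Spin up a throwaway cluster on a specific Kubernetes version by pinning the node image
kind create cluster --name kubeflow-test --image kindest/node:v1.22.9

# Confirm the server version before applying the manifests
kubectl version
```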
Yeah, 1.22 works just fine; I've also tested Kubeflow 1.6.0 on that, but I think it's important to start pushing Kubeflow to adopt 1.23 or newer. Attempting to install on 1.25 yields errors related to deprecated APIs:
resource mapping not found for name: "istio-ingressgateway" namespace: "istio-system" from "STDIN": no matches for kind "PodDisruptionBudget" in version "policy/v1beta1"
Please note that policy/v1beta1 was removed in 1.25.
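To illustrate the kind of change needed: PodDisruptionBudget graduated to policy/v1 in Kubernetes 1.21 and the v1beta1 API was removed in 1.25, so manifests like the Istio one above need their apiVersion bumped. A hedged sketch (the spec and labels below are illustrative, not the actual Istio manifest):

```sh
# Hypothetical PodDisruptionBudget expressed against policy/v1, which 1.25
# still serves, instead of the removed policy/v1beta1
kubectl apply -f - <<'EOF'
apiVersion: policy/v1        # was: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: istio-ingressgateway
  namespace: istio-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: istio-ingressgateway
EOF
```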
Running kubeflow 1.6.0 on k8s 1.23.6 - haven't run into any issues so far.
Do you have any pods in CrashLoopBackOff status?
Some might have restarted during the startup phase (I think that has always been the case), but at most 2-3 times. Now all are running.
Running 1.6.0 on k8s 1.23.10 - a few pods are in CrashLoopBackOff. Any help is appreciated; I'm stuck on this.
NAMESPACE NAME READY STATUS RESTARTS AGE
kubeflow kserve-controller-manager-0 1/2 CrashLoopBackOff 8 (2m19s ago) 38m
Below is the error:
**k logs -f kserve-controller-manager-0 -n kubeflow**
E1102 08:33:54.368344 1 reflector.go:138] pkg/mod/k8s.io/client-go@<version>/tools/cache/reflector.go:167: Failed to watch *v1beta1.InferenceService: failed to list *v1beta1.InferenceService: Internal error occurred: error resolving resource
E1102 08:34:36.365385 1 reflector.go:138] pkg/mod/k8s.io/client-go@<version>/tools/cache/reflector.go:167: Failed to watch *v1beta1.InferenceService: failed to list *v1beta1.InferenceService: Internal error occurred: error resolving resource
{"level":"error","ts":1667378114.599445,"logger":"controller.inferenceservice","msg":"Could not wait for Cache to sync","reconciler group":"serving.kserve.io","reconciler kind":"InferenceService","error":"failed to wait for inferenceservice caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@<version>/pkg/internal/controller/controller.go:234\nsigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).startRunnable.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@<version>/pkg/manager/internal.go:696"}
{"level":"info","ts":1667378114.5999308,"logger":"controller.trainedmodel","msg":"Shutdown signal received, waiting for all workers to finish","reconciler group":"serving.kserve.io","reconciler kind":"TrainedModel"}
Same error on 1.23. @neelasha-09, how did you solve this problem?
/close
There has been no activity for a long time. Please reopen if necessary.
@juliusvonkohout: Closing this issue.