
Support for k8s v1.23

Open ayush-1506 opened this issue 3 years ago • 15 comments

Hello! I'm using the single command install to get KF, but it keeps failing. I'm sure it's because I have k8s 1.23, which KF hasn't been tested with (afaik). Has someone succeeded in running this with newer k8s versions?

ayush-1506 avatar Feb 18 '22 18:02 ayush-1506

/kind feature
/priority p1
/area installation

@ayush-1506 probably not, we are currently testing with 1.20 and 1.21. We have some tracking issues regarding 1.22 support, and some of the individual WGs have tested with 1.22 (Katib, KFP, Training Operator), but Notebooks and maybe Central Dashboard still need some dependency updates. Which versions of Istio, cert-manager, and Knative are you using? Also, are you using the new KF 1.5 RC 1?

@kubeflow/wg-manifests-leads fyi, this might be a good tracking issue for K8s 1.23

jbottum avatar Feb 18 '22 22:02 jbottum

@jbottum: The label(s) area/installation cannot be applied, because the repository doesn't have them.

In response to this:

/kind feature
/priority p1
/area installation

@ayush-1506 probably not, we are currently testing with 1.20 and 1.21. We have some tracking issues regarding 1.22 support, and some of the individual WGs have tested with 1.22 (Katib, KFP, Training Operator), but Notebooks and maybe Central Dashboard still need some dependency updates. Which versions of Istio, cert-manager, and Knative are you using? Also, are you using the new KF 1.5 RC 1?

@kubeflow/wg-manifests-leads fyi, this might be a good tracking issue for K8s 1.23

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

google-oss-prow[bot] avatar Feb 18 '22 22:02 google-oss-prow[bot]

@jbottum Thanks for getting back! I'm on knative v0.22.1, cert-manager 1.7.1 and istio 1.11. I'm also on the master branch of this repo (is there some other tagged version I should use?)

Looking at the pods running in the kubeflow namespace (even after the failed installation command), I can see that kfserving-controller, metadata-writer, ml-pipeline, and ml-pipeline-persistenceagent are stuck in CrashLoopBackOff. Others seem to be running.

ayush-1506 avatar Feb 19 '22 03:02 ayush-1506

@jbottum I've downgraded k8s to 1.19:

Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.16", GitCommit:"e37e4ab4cc8dcda84f1344dda47a97bb1927d074", GitTreeState:"clean", BuildDate:"2021-10-27T16:25:59Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.16", GitCommit:"e37e4ab4cc8dcda84f1344dda47a97bb1927d074", GitTreeState:"clean", BuildDate:"2021-10-27T16:20:18Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}

Which tag of this repo should I be using here? Using master and 1.4.0 fails.

ayush-1506 avatar Feb 20 '22 18:02 ayush-1506

@ayush-1506 Are you using: https://github.com/kubeflow/manifests/releases/tag/v1.4.1

jbottum avatar Feb 21 '22 02:02 jbottum

Yes, getting this error message:

Error from server (Invalid): error when applying patch:
{"metadata":{"annotations":{"controller-gen.kubebuilder.io/version":null,"kubectl.kubernetes.io/last-applied-configuration":"{\}

A long error message follows this; I can attach it if it helps.

It ends with:

Error from server (InternalError): error when creating "STDIN": Internal error occurred: conversion webhook for cert-manager.io/v1alpha2, Kind=Issuer failed: Post "https://cert-manager-webhook.cert-manager.svc:443/convert?timeout=30s": dial tcp 10.106.158.211:443: connect: no route to host

ayush-1506 avatar Feb 21 '22 02:02 ayush-1506

Upon further investigation, I suspect this is caused by a CNI issue (or maybe a firewall).

ayush-1506 avatar Feb 21 '22 20:02 ayush-1506

Ok, turns out it had something to do with my network configuration. v1.4.1 worked for me.

ayush-1506 avatar Mar 09 '22 19:03 ayush-1506

@ayush-1506 so you got the manifests to work with Kubernetes 1.23?

dvaldivia avatar Mar 29 '22 15:03 dvaldivia

@dvaldivia No, didn't work with 1.23. Had to use 1.19 instead.

ayush-1506 avatar Mar 29 '22 19:03 ayush-1506

@ayush-1506 now that Kubeflow 1.6.0 is out, have you tried using Kubernetes 1.23 with this release?

spots107 avatar Sep 26 '22 12:09 spots107

I've tried Kubeflow 1.6.0 with Kubernetes 1.25 and I can confirm it doesn't work; there is a lot of deprecated API usage.

dvaldivia avatar Sep 26 '22 15:09 dvaldivia

K8s v1.22 reaches End of Life on 2022-10-28, next month! Ref: k8s releases

To check version dependencies, you can spin up a kind cluster using a different node image (see the kind node-image versions). I was able to test Kubeflow v1.6.0 on k8s v1.22.9 locally, and it is working fine.

Jeet-14 avatar Sep 26 '22 21:09 Jeet-14
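The kind approach described above can be sketched as a small cluster config; the node image tag pins the Kubernetes version. Note the filename, cluster name, and exact tag here are illustrative, and each kind release only supports specific `kindest/node` tags (listed in its release notes):

```yaml
# kind-k8s-1.22.yaml: pin the node to a specific Kubernetes version.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    image: kindest/node:v1.22.9

# Create the cluster with:
#   kind create cluster --name kf-test --config kind-k8s-1.22.yaml
```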

Yeah, 1.22 works just fine; I've also tested Kubeflow 1.6.0 on it. But I think it's important to start pushing Kubeflow to adopt 1.23 or newer, since attempting to install on 1.25 yields errors related to removed APIs:

resource mapping not found for name: "istio-ingressgateway" namespace: "istio-system" from "STDIN": no matches for kind "PodDisruptionBudget" in version "policy/v1beta1"

Please note that policy/v1beta1 was removed in 1.25.

dvaldivia avatar Sep 26 '22 21:09 dvaldivia
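A minimal sketch of the migration for the PodDisruptionBudget error above, assuming a standalone manifest file (the filename `pdb.yaml` and the spec fields are reconstructed from the error message, not taken from the actual Kubeflow manifests):

```shell
# Hypothetical manifest still using the API version that k8s 1.25 removed.
cat > pdb.yaml <<'EOF'
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: istio-ingressgateway
  namespace: istio-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: istio-ingressgateway
EOF

# PodDisruptionBudget graduated to policy/v1 in k8s 1.21, and the common
# fields (minAvailable/maxUnavailable, selector) are unchanged, so bumping
# the apiVersion is usually the whole fix.
sed -i 's|policy/v1beta1|policy/v1|' pdb.yaml

grep '^apiVersion' pdb.yaml   # prints: apiVersion: policy/v1
```

Upstream, newer Istio releases already ship policy/v1 PodDisruptionBudgets, so moving to a Kubeflow release that bundles a newer Istio is usually cleaner than patching generated manifests by hand.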

Running kubeflow 1.6.0 on k8s 1.23.6 - haven't run into any issues so far.

Lejboelle avatar Sep 28 '22 07:09 Lejboelle

Running kubeflow 1.6.0 on k8s 1.23.6 - haven't run into any issues so far.

Do you have any pods in CrashLoopBackOff status?

jaffe-fly avatar Oct 29 '22 02:10 jaffe-fly

Running kubeflow 1.6.0 on k8s 1.23.6 - haven't run into any issues so far.

Do you have any pods in CrashLoopBackOff status?

Some might have restarted during the start up phase, I think that has always been the case, but at most 2-3 times. Now all are running.

Lejboelle avatar Nov 02 '22 08:11 Lejboelle

Running 1.6.0 on k8s 1.23.10 - a few pods are in CrashLoopBackOff. Any help is appreciated; I'm stuck on this.

NAMESPACE             NAME                                                         READY   STATUS             RESTARTS         AGE
kubeflow              kserve-controller-manager-0                                  1/2     CrashLoopBackOff   8 (2m19s ago)    38m

Below is the error

kubectl logs -f kserve-controller-manager-0 -n kubeflow
E1102 08:33:54.368344       1 reflector.go:138] pkg/mod/k8s.io/client-go@…/tools/cache/reflector.go:167: Failed to watch *v1beta1.InferenceService: failed to list *v1beta1.InferenceService: Internal error occurred: error resolving resource
E1102 08:34:36.365385       1 reflector.go:138] pkg/mod/k8s.io/client-go@…/tools/cache/reflector.go:167: Failed to watch *v1beta1.InferenceService: failed to list *v1beta1.InferenceService: Internal error occurred: error resolving resource
{"level":"error","ts":1667378114.599445,"logger":"controller.inferenceservice","msg":"Could not wait for Cache to sync","reconciler group":"serving.kserve.io","reconciler kind":"InferenceService","error":"failed to wait for inferenceservice caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:234\nsigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).startRunnable.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/manager/internal.go:696"}
{"level":"info","ts":1667378114.5999308,"logger":"controller.trainedmodel","msg":"Shutdown signal received, waiting for all workers to finish","reconciler group":"serving.kserve.io","reconciler kind":"TrainedModel"}

neelasha-09 avatar Nov 02 '22 08:11 neelasha-09

Same error on 1.23. @neelasha-09, how did you solve this problem?

631068264 avatar Feb 06 '23 08:02 631068264

/close

There has been no activity for a long time. Please reopen if necessary.

juliusvonkohout avatar Aug 24 '23 16:08 juliusvonkohout

@juliusvonkohout: Closing this issue.

In response to this:

/close

There has been no activity for a long time. Please reopen if necessary.


google-oss-prow[bot] avatar Aug 24 '23 16:08 google-oss-prow[bot]