
Failed to run Yatai server in on-premise K8S

Open thechaos16 opened this issue 2 years ago • 6 comments

Hello, bentoML team.

I've recently been trying to use BentoML and Yatai on our on-premise K8S cluster, but it fails because our cluster has no LoadBalancer service. Is there any guide or workaround for deploying Yatai on non-cloud K8S?

Thank you.

Here are a few of the error messages.

The error appears when I try to push a bento to Yatai (yatai login succeeds). [Screenshot 2022-06-07 at 3:33:16 PM]

I found that bentoml push queries the pod named deployment-yatai-deployment-comp-operator in the yatai-operator namespace. That pod logs the following error, which says the service yatai-ingress-controller-ingress-nginx-controller has no external IP:

2022-06-07T06:36:31.318Z	INFO	controller-runtime.manager.controller.deployment	getting Deployment ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.318Z	INFO	controller-runtime.manager.controller.deployment	Deployment getting successfully	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.318Z	INFO	controller-runtime.manager.controller.deployment	creating namespace yatai-components ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.318Z	INFO	controller-runtime.manager.controller.deployment	namespace yatai-components creation successfully	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.322Z	INFO	controller-runtime.manager.controller.deployment	Installing CertManagerComponent ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.322Z	INFO	controller-runtime.manager.controller.deployment	crd certificates.cert-manager.io already exists, so skipping install cert-manager	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.322Z	INFO	controller-runtime.manager.controller.deployment	Installed CertManagerComponent successfully	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.325Z	INFO	controller-runtime.manager.controller.deployment	Installing YataiDeploymentOperatorComponent ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.326Z	INFO	controller-runtime.manager.controller.deployment	installing crd from file helm-charts/yatai-deployment-operator/crds/deployments.yaml ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.361Z	INFO	controller-runtime.manager.controller.deployment	crd bentodeployments.serving.yatai.ai updated successfully	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.361Z	INFO	controller-runtime.manager.controller.deployment	getting helm release yatai ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.368Z	INFO	controller-runtime.manager.controller.deployment	found helm release yatai, status: deployed	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.369Z	INFO	controller-runtime.manager.controller.deployment	Installed YataiDeploymentOperatorComponent successfully	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.373Z	INFO	controller-runtime.manager.controller.deployment	Installing CSIDriverImagePopulatorComponent ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.373Z	INFO	controller-runtime.manager.controller.deployment	getting helm release yatai-csi-driver-image-populator ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.376Z	INFO	controller-runtime.manager.controller.deployment	found helm release yatai-csi-driver-image-populator, status: deployed	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.377Z	INFO	controller-runtime.manager.controller.deployment	Installed CSIDriverImagePopulatorComponent successfully	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.380Z	INFO	controller-runtime.manager.controller.deployment	Installing IngressControllerComponent ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.382Z	INFO	controller-runtime.manager.controller.deployment	getting helm release yatai-ingress-controller ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.390Z	INFO	controller-runtime.manager.controller.deployment	found helm release yatai-ingress-controller, status: failed	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.393Z	INFO	controller-runtime.manager.controller.deployment	Installed IngressControllerComponent successfully	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.396Z	INFO	controller-runtime.manager.controller.deployment	Installing MinioComponent ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.396Z	INFO	controller-runtime.manager.controller.deployment	installing crd from file helm-charts/minio-operator/crds/minio.min.io_tenants.yaml ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.627Z	INFO	controller-runtime.manager.controller.deployment	crd tenants.minio.min.io updated successfully	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.627Z	INFO	controller-runtime.manager.controller.deployment	getting helm release yatai-minio ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.639Z	INFO	controller-runtime.manager.controller.deployment	found helm release yatai-minio, status: failed	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.640Z	INFO	controller-runtime.manager.controller.deployment	getting ingress-controller service external ip...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.640Z	ERROR	controller-runtime.manager.controller.deployment	getting ingress-controller service external ip failed	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": "", "error": "the external ip of service yatai-ingress-controller-ingress-nginx-controller on namespace yatai-components is empty!", "errorVerbose": "the external ip of service yatai-ingress-controller-ingress-nginx-controller on namespace yatai-components is empty!\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*IngressControllerComponent).getIngressControllerServiceIps\n\t/workspace/controllers/deployment_controller.go:294\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*MinioComponent).Install\n\t/workspace/controllers/deployment_controller.go:510\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*DeploymentReconciler).doReconcile\n\t/workspace/controllers/deployment_controller.go:211\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*DeploymentReconciler).Reconcile\n\t/workspace/controllers/deployment_controller.go:126\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371"}
github.com/bentoml/yatai-deployment-comp-operator/controllers.(*DeploymentReconciler).doReconcile
	/workspace/controllers/deployment_controller.go:211
github.com/bentoml/yatai-deployment-comp-operator/controllers.(*DeploymentReconciler).Reconcile
	/workspace/controllers/deployment_controller.go:126
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214
2022-06-07T06:36:31.641Z	ERROR	controller-runtime.manager.controller.deployment	Failed to install MinioComponent	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": "", "error": "the external ip of service yatai-ingress-controller-ingress-nginx-controller on namespace yatai-components is empty!", "errorVerbose": "the external ip of service yatai-ingress-controller-ingress-nginx-controller on namespace yatai-components is empty!\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*IngressControllerComponent).getIngressControllerServiceIps\n\t/workspace/controllers/deployment_controller.go:294\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*MinioComponent).Install\n\t/workspace/controllers/deployment_controller.go:510\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*DeploymentReconciler).doReconcile\n\t/workspace/controllers/deployment_controller.go:211\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*DeploymentReconciler).Reconcile\n\t/workspace/controllers/deployment_controller.go:126\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371"}
github.com/bentoml/yatai-deployment-comp-operator/controllers.(*DeploymentReconciler).Reconcile
	/workspace/controllers/deployment_controller.go:126
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214
2022-06-07T06:36:31.649Z	ERROR	controller-runtime.manager.controller.deployment	Reconciler error	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": "", "error": "the external ip of service yatai-ingress-controller-ingress-nginx-controller on namespace yatai-components is empty!", "errorVerbose": "the external ip of service yatai-ingress-controller-ingress-nginx-controller on namespace yatai-components is empty!\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*IngressControllerComponent).getIngressControllerServiceIps\n\t/workspace/controllers/deployment_controller.go:294\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*MinioComponent).Install\n\t/workspace/controllers/deployment_controller.go:510\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*DeploymentReconciler).doReconcile\n\t/workspace/controllers/deployment_controller.go:211\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*DeploymentReconciler).Reconcile\n\t/workspace/controllers/deployment_controller.go:126\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214
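The key line is "the external ip of service yatai-ingress-controller-ingress-nginx-controller on namespace yatai-components is empty!". A Service of type LoadBalancer only gets an address in status.loadBalancer.ingress when something in the cluster implements load balancers; without one, kubectl get svc shows EXTERNAL-IP as <pending> and that list stays empty. Below is a minimal sketch of an equivalent check you can run against the Service JSON yourself. It is an illustration, not the operator's actual Go code, and the function names external_ips and check_service are hypothetical.

```python
def external_ips(service: dict) -> list:
    """Collect the addresses a LoadBalancer implementation published for a Service.

    Looks only at status.loadBalancer.ingress; each entry carries an "ip",
    a "hostname", or both.
    """
    addrs = []
    lb = (service.get("status") or {}).get("loadBalancer") or {}
    for entry in lb.get("ingress") or []:
        addr = entry.get("ip") or entry.get("hostname")
        if addr:
            addrs.append(addr)
    return addrs


def check_service(service: dict) -> list:
    """Raise an error shaped like the operator's log message when nothing is published."""
    addrs = external_ips(service)
    if not addrs:
        meta = service.get("metadata") or {}
        raise RuntimeError(
            f"the external ip of service {meta.get('name')} "
            f"on namespace {meta.get('namespace')} is empty!"
        )
    return addrs


# Example input: a LoadBalancer Service on a cluster with no LB implementation.
pending_service = {
    "metadata": {
        "name": "yatai-ingress-controller-ingress-nginx-controller",
        "namespace": "yatai-components",
    },
    "spec": {"type": "LoadBalancer"},
    "status": {"loadBalancer": {}},
}
```

To try it on a real cluster, pass json.loads() of the output of kubectl get svc yatai-ingress-controller-ingress-nginx-controller -n yatai-components -o json to check_service; on a cluster like the one described above it should raise.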

thechaos16 · Jun 08 '22

Hi @thechaos16! We don't see that case very often, but we've got a particular config that might help:

ingress:
  enabled: false

This disables creation of the ingress; it's meant for people who don't want to expose Yatai with an external IP. I'm not sure about the details of your environment, but could you try setting that as a Helm option?

timliubentoml · Jun 08 '22

cc @yetone

yubozhao · Jun 09 '22

@thechaos16 Thanks for your report! The Yatai deployment operators always need a load balancer. Another solution is to skip the built-in MinIO and specify the S3 configuration manually:

https://github.com/bentoml/Yatai/blob/main/docs/admin-guide.md#aws-s3

yetone · Jun 10 '22

Thank you for the quick reply.

@timliubentoml, I tried disabling the ingress via https://github.com/bentoml/yatai-chart/blob/main/values.yaml#L91, but the same error still appears. I guess the yatai-chart Helm values can't control the operators' setup.

@yetone, I passed the external S3 info by filling in the https://github.com/bentoml/yatai-chart/blob/main/values.yaml#L50-L58 block, but it still fails. Is there another way to avoid using the built-in MinIO? My K8S dashboard still shows two pods running (minio-operator and yatai-minio-console).

thechaos16 · Jun 13 '22

@thechaos16 There is an error in the docs: they show setting ENDPOINT to https://s3.amazonaws.com, but you actually need to set it to s3.amazonaws.com, without the scheme.
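If you keep endpoints as full URLs elsewhere, a small helper can turn them into the bare host[:port] form. A minimal sketch, assuming you also want to derive the chart's secure flag from the scheme; normalize_s3_endpoint and its default_secure parameter are hypothetical, not part of Yatai or BentoML.

```python
from urllib.parse import urlsplit


def normalize_s3_endpoint(value: str, default_secure: bool = True) -> tuple:
    """Turn 'https://s3.amazonaws.com' into ('s3.amazonaws.com', True).

    A value without a scheme is treated as host[:port] already and is paired
    with default_secure (an assumption: TLS unless told otherwise).
    """
    value = value.strip()
    if "://" not in value:
        # Already host[:port]: keep it, only drop a trailing slash.
        return value.rstrip("/"), default_secure
    parts = urlsplit(value)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if parts.path not in ("", "/") or parts.query or parts.fragment:
        raise ValueError(f"endpoint must be host[:port] only, got {value!r}")
    # netloc is the host[:port] part; https maps to secure=True.
    return parts.netloc, parts.scheme == "https"
```

For example, normalize_s3_endpoint("https://s3.amazonaws.com") gives ("s3.amazonaws.com", True), which maps onto endpoint: s3.amazonaws.com and secure: true.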

artsparkAI · Jul 17 '22

For me, this was resolved after I deleted the default Postgres PVC. The yatai pod log showed that it could not authenticate as user postgres:

$ kubectl logs pod/yatai-7f97bc87fb-qkc25 -n yatai-system
Error: migrate up db: cannot create migrate: pq: password authentication failed for user "postgres"

I deleted the whole yatai and yatai-postgresql installation, then created a new PostgreSQL secret:

$ kubectl create secret generic yatai-postgresql  --from-literal=passwordExistingSecret=cqUIVv6S4q -n yatai-system

I copied the initial secret and created a new PostgreSQL secret from it. When I set existingSecret to the new secret, it logs in like a charm.
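kubectl create secret generic --from-literal=KEY=VALUE stores each value base64-encoded under the Secret's data.KEY field. A minimal sketch that builds an equivalent v1 Secret object, which can help when checking what actually landed in the cluster; generic_secret and read_literal are hypothetical helpers, and "example-password" is a placeholder, not a real credential.

```python
import base64
import json


def generic_secret(name: str, namespace: str, literals: dict) -> dict:
    """Build a v1 Secret equivalent to `kubectl create secret generic --from-literal`.

    Each value is UTF-8 encoded and then base64-encoded, as the Secret
    'data' field requires.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {"name": name, "namespace": namespace},
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in literals.items()
        },
    }


def read_literal(secret: dict, key: str) -> str:
    """Decode one stored value back to plain text."""
    return base64.b64decode(secret["data"][key]).decode("utf-8")


# Mirrors the command above, with a placeholder password.
manifest = generic_secret(
    "yatai-postgresql", "yatai-system",
    {"passwordExistingSecret": "example-password"},
)
print(json.dumps(manifest, indent=2))
```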

values.yaml
postgresql:
  enabled: true
  nameOverride: ""
  postgresqlUsername: postgres
  postgresqlDatabase: yatai
  ## In case of postgresql.enabled = true, allow the usage of existing secrets for postgresql
  ##
  existingSecret: yatai-postgresql #""

I managed to run it with this values.yaml. It didn't work when I only changed values.yaml and updated it with Argo CD.

$ kubectl create secret generic yatai-ceph-secret --from-literal=accesskey=access-key --from-literal=secretkey=secret-key -n yatai-system
values.yaml
externalS3:
  enabled: true #false
  endpoint: '192.168.*.*9:300*1' #my ceph object storage endpoint(or minio)
  region: ''
  bucketName: 'hgkim'
  secure: false #true
  existingSecret: 'yatai-ceph-secret'
  existingSecretAccessKeyKey: 'accesskey' #'access_key'
  existingSecretSecretKeyKey: 'secretkey' #'secret_key'
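Note how the key names here (accesskey, secretkey) had to match the --from-literal keys in the secret created above, rather than the commented-out access_key/secret_key. A minimal sketch of cross-checking an externalS3 block against the secret's keys before deploying, assuming both have already been loaded into Python (for example from values.yaml and kubectl get secret yatai-ceph-secret -n yatai-system -o json); check_external_s3 is hypothetical and the endpoint in the example is an illustrative address, not the masked one above.

```python
def check_external_s3(values: dict, secret_keys: set) -> list:
    """Return a list of problems found in an `externalS3` values block.

    `values` is the parsed externalS3 mapping; `secret_keys` is the set of
    keys under `.data` of the secret named by `existingSecret`.
    """
    problems = []
    if not values.get("enabled"):
        return problems  # externalS3 disabled: nothing to check here
    endpoint = str(values.get("endpoint", ""))
    if "://" in endpoint:
        problems.append(f"endpoint {endpoint!r} should be host[:port] without a scheme")
    if not values.get("bucketName"):
        problems.append("bucketName is empty")
    if values.get("existingSecret"):
        for field in ("existingSecretAccessKeyKey", "existingSecretSecretKeyKey"):
            key = values.get(field)
            if key not in secret_keys:
                problems.append(f"{field}={key!r} not found in secret keys {sorted(secret_keys)}")
    return problems


example_values = {
    "enabled": True,
    "endpoint": "192.168.0.9:30001",  # illustrative address
    "region": "",
    "bucketName": "hgkim",
    "secure": False,
    "existingSecret": "yatai-ceph-secret",
    "existingSecretAccessKeyKey": "accesskey",
    "existingSecretSecretKeyKey": "secretkey",
}
print(check_external_s3(example_values, {"accesskey", "secretkey"}))
```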

After running bentoml push, the bento shows up in the UI and in object storage under bentoml/default:

$ bentoml push iris_classifier:latest
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Successfully pushed model "iris_clf:h7hjmrr276ld23vw"                                                                                                                           │
│ Successfully pushed Bento "iris_classifier:khydmnr276cwg3vw"                                                                                                                    │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Pushing Bento "iris_classifier:khydmnr276cwg3vw" ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 5.8/5.8 kB • ? • 0:00:00
     Uploading model "iris_clf:h7hjmrr276ld23vw" ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 2.0/2.0 kB • ? • 0:00:00

MightyTedKim · Sep 23 '22