
Can't access Kubeflow dashboard after using "kubectl port-forward svc/istio-ingressgateway -n istio-system --address 0.0.0.0 8085:80"

Open TranThanh96 opened this issue 2 years ago • 26 comments

I tried to install Kubeflow on AWS with S3 storage by following the tutorial at https://awslabs.github.io/kubeflow-manifests/docs/deployment/. Everything works well except the last step, accessing the Kubeflow dashboard:

kubectl port-forward svc/istio-ingressgateway -n istio-system --address 0.0.0.0 8085:80

After port-forwarding, I can't access http://localhost:8080/. The page gives me a 403 error: "You don't have authorization to view this page!" How can I fix this?

TranThanh96 avatar Aug 23 '22 07:08 TranThanh96

@TranThanh96 Can you make sure your command is the same as the one in the doc https://awslabs.github.io/kubeflow-manifests/docs/deployment/vanilla/guide/#port-forward ?

kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80
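
One thing worth checking (a general note on kubectl port-forward, not specific to this deployment): the local port in the command must match the port in the URL you open. With the command from the issue title, the dashboard would be at the forwarded port, not 8080:

kubectl port-forward svc/istio-ingressgateway -n istio-system --address 0.0.0.0 8085:80
# the dashboard is then served at http://localhost:8085/, not http://localhost:8080/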

AlexandreBrown avatar Aug 23 '22 12:08 AlexandreBrown

Hey @TranThanh96, I responded to you on Slack. Can you additionally specify which deployment option you ran? Was it the rds-s3 one?

ryansteakley avatar Aug 23 '22 17:08 ryansteakley

> @TranThanh96 Can you make sure your command is the same as the one in the doc https://awslabs.github.io/kubeflow-manifests/docs/deployment/vanilla/guide/#port-forward ?
>
> kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80

Yeah, I tried both with and without --address 0.0.0.0, but I still can't access it from the browser.

TranThanh96 avatar Aug 24 '22 02:08 TranThanh96

> Hey @TranThanh96, I responded to you on Slack. Can you additionally specify which deployment option you ran? Was it the rds-s3 one?

I used the S3-only option.

TranThanh96 avatar Aug 24 '22 02:08 TranThanh96

After re-installing everything, I can reach the login page now.

TranThanh96 avatar Aug 24 '22 04:08 TranThanh96

Sounds good. Verify you are able to log in and run any samples you wish.

ryansteakley avatar Aug 24 '22 04:08 ryansteakley

> Sounds good. Verify you are able to log in and run any samples you wish.

@ryansteakley I can't see any example pipelines in the dashboard. [screenshot]

And I can't create a new notebook server. Error: 0/1 nodes are available: 1 Too many pods. log.txt

TranThanh96 avatar Aug 24 '22 05:08 TranThanh96

Looks like you have several pods in CrashLoopBackOff. Is your instance the same size as, or similar to, the one described in https://awslabs.github.io/kubeflow-manifests/docs/deployment/prerequisites/? Did you follow the auto-setup Python script?

ryansteakley avatar Aug 24 '22 05:08 ryansteakley

Run kubectl describe pod <pod-name> -n kubeflow and similarly kubectl logs <pod-name> -n kubeflow on the pods in a failure state, and share anything you find there as well.
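
For example (standard kubectl; <pod-name> is whichever pod is failing):

kubectl get pods -n kubeflow                     # look for CrashLoopBackOff in the STATUS column
kubectl describe pod <pod-name> -n kubeflow      # check the Events section at the bottom
kubectl logs <pod-name> -n kubeflow --previous   # logs from the last crashed container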

ryansteakley avatar Aug 24 '22 05:08 ryansteakley

I see Warning Failed 34m (x5 over 34m) kubelet Error: secret "mlpipeline-minio-artifact" not found in the ml-pipeline logs. Can you check whether this secret exists? Run kubectl get secrets -n kubeflow.
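
For example, to list the secrets and inspect that one directly:

kubectl get secrets -n kubeflow
kubectl get secret mlpipeline-minio-artifact -n kubeflow -o yaml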

ryansteakley avatar Aug 24 '22 08:08 ryansteakley

Seems like it exists: secrets_kf_log.txt [screenshot]

TranThanh96 avatar Aug 24 '22 08:08 TranThanh96

Can you verify that you are using v3.2.0 of kustomize? Run kubectl delete pods -n kubeflow --all and see if the pods come up normally.
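
For reference, a quick way to check the kustomize version and then watch the pods come back up:

kustomize version
kubectl delete pods -n kubeflow --all
kubectl get pods -n kubeflow -w    # watch until everything settles into Running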

ryansteakley avatar Aug 24 '22 09:08 ryansteakley

Yes, I am using kustomize v3.2.0. [screenshot]

I tried kubectl delete pods -n kubeflow --all, but the pod metadata-grpc-deployment-f8d68f687-mqs82 keeps going into CrashLoopBackOff.

[screenshot]

TranThanh96 avatar Aug 24 '22 09:08 TranThanh96

What do you see when you log in? Are any other pods still failing?

ryansteakley avatar Aug 24 '22 09:08 ryansteakley

Everything is good except those 3 pods that keep going into CrashLoopBackOff. [screenshot]

And I get some errors on Pipelines and Runs. Any suggestions, please?

[screenshots]

Errors on Runs: [screenshots]

And these pods: [screenshot]

TranThanh96 avatar Aug 24 '22 09:08 TranThanh96

Can you verify that the s3-secret you created follows this requirement: configure a Secret (e.g. s3-secret) with your AWS credentials. These need to be long-term credentials from an IAM user, not temporary ones.
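
One way to spot temporary credentials (a sketch; it assumes the secret lives in the kubeflow namespace and stores the access key ID under a key named accesskey, so adjust to match your setup): long-term IAM user access key IDs start with AKIA, while temporary STS credentials start with ASIA.

kubectl get secret s3-secret -n kubeflow -o jsonpath='{.data.accesskey}' | base64 -d
# AKIA... => long-term IAM user key, ASIA... => temporary credentials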

ryansteakley avatar Aug 24 '22 09:08 ryansteakley

Yes, I can confirm that. How can I give you evidence?

TranThanh96 avatar Aug 24 '22 09:08 TranThanh96

There's no way to prove it. Can you describe the ml-pipeline pod one more time? I would suggest restarting from a fresh cluster and following the cluster prerequisites listed above.

ryansteakley avatar Aug 24 '22 09:08 ryansteakley

Yes, this is the 3rd time I've re-installed Kubeflow on AWS EKS from a fresh cluster, and this error keeps occurring.

TranThanh96 avatar Aug 24 '22 09:08 TranThanh96

Sorry you are running into these problems. If you can, please share the logs from the latest CrashLoopBackOff ml-pipeline pod. Which version of AWS Kubeflow are you running? I will try to reproduce your issue on my end and see if there is some underlying issue.

ryansteakley avatar Aug 24 '22 10:08 ryansteakley

> Sorry you are running into these problems. If you can, please share the logs from the latest CrashLoopBackOff ml-pipeline pod. Which version of AWS Kubeflow are you running? I will try to reproduce your issue on my end and see if there is some underlying issue.

How can I get these logs? I can provide them to you. I am using this version: KUBEFLOW_RELEASE_VERSION=v1.5.1, AWS_RELEASE_VERSION=v1.5.1-aws-b1.0.1

TranThanh96 avatar Aug 24 '22 10:08 TranThanh96

kubectl logs <ml-pipeline-pod> -n kubeflow

I see you are running 2 nodes of t3.xlarge; we recommend a minimum of 5 nodes of m5.xlarge, as stated in https://awslabs.github.io/kubeflow-manifests/docs/deployment/prerequisites/. If you have time, try to re-create the cluster following the suggested cluster create command.
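
For reference, a cluster matching that recommendation could be created along these lines (a sketch only; the name and region are placeholders, so use the exact command from the prerequisites doc):

eksctl create cluster \
  --name <cluster-name> \
  --region <region> \
  --node-type m5.xlarge \
  --nodes 5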

ryansteakley avatar Aug 24 '22 10:08 ryansteakley

> kubectl logs <ml-pipeline-pod> -n kubeflow

This is the log from ml-pipeline: [screenshot]

TranThanh96 avatar Aug 24 '22 10:08 TranThanh96

@ryansteakley @TranThanh96 I think this is because of a bug related to the missing MySQL deployment in the S3-only deployment option. It was fixed in the main branch recently but not backported to the release branch: https://github.com/awslabs/kubeflow-manifests/pull/310

@TranThanh96 Can you comment out this line, - disable-mysql-pv-claim.yaml, in awsconfigs/apps/pipeline/s3/kustomization.yaml and run:

kustomize build awsconfigs/apps/pipeline/s3 | kubectl apply -f -

Please delete the pods that are in CrashLoopBackOff after doing this so that new pods get created.
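
Putting those steps together (the pod names are whichever ones show CrashLoopBackOff in kubectl get pods -n kubeflow):

# after commenting out the disable-mysql-pv-claim.yaml line in the kustomization
kustomize build awsconfigs/apps/pipeline/s3 | kubectl apply -f -
kubectl delete pod <crashloop-pod-name> -n kubeflow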

surajkota avatar Aug 25 '22 21:08 surajkota

> @ryansteakley @TranThanh96 I think this is because of a bug related to the missing MySQL deployment in the S3-only deployment option. It was fixed in the main branch recently but not backported to the release branch: #310
>
> @TranThanh96 Can you comment out this line, - disable-mysql-pv-claim.yaml, in awsconfigs/apps/pipeline/s3/kustomization.yaml and run:
>
> kustomize build awsconfigs/apps/pipeline/s3 | kubectl apply -f -
>
> Please delete the pods that are in CrashLoopBackOff after doing this so that new pods get created.

Yes, I tried the RDS + S3 deployment and everything works. So the problem is related to MySQL.

TranThanh96 avatar Aug 26 '22 02:08 TranThanh96

Thanks for reporting this issue. We have released a patch version (v1.5.1-aws-b1.0.2) to fix it.

surajkota avatar Sep 25 '22 00:09 surajkota