kubeflow-manifests
Can't access Kubeflow dashboard after using "kubectl port-forward svc/istio-ingressgateway -n istio-system --address 0.0.0.0 8085:80"
I tried to install Kubeflow on AWS with S3 storage by following the tutorial at https://awslabs.github.io/kubeflow-manifests/docs/deployment/ Everything works well except the last step, accessing the Kubeflow dashboard: kubectl port-forward svc/istio-ingressgateway -n istio-system --address 0.0.0.0 8085:80
After port-forwarding, I can't access http://localhost:8080/ The page gives me a 403 error: "You don't have authorization to view this page!" How can I fix this?
@TranThanh96 Can you make sure your command is the same as the doc https://awslabs.github.io/kubeflow-manifests/docs/deployment/vanilla/guide/#port-forward ?
kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80
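For reference (general kubectl port-forward behavior, not specific to this guide): the local port on the left of LOCAL:REMOTE is the one the browser has to use, so the command above serves the dashboard at http://localhost:8080, while the original variant maps it to port 8085:
kubectl port-forward svc/istio-ingressgateway -n istio-system --address 0.0.0.0 8085:80
# with this variant the dashboard is reachable at http://localhost:8085, not http://localhost:8080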
Hey @TranThanh96, I responded to you on slack, can you additionally specify which deployment option you ran, was it the rds-s3?
Yeah, I tried both with and without --address 0.0.0.0; I still can't access it from the browser.
I use s3 only
after re-installing everything, I can reach the login page now
Sounds good, verify you are able to log in and run any samples you wish.
@ryansteakley I can't see any example pipelines in the dashboard.
And I can't create a new notebook server, error: "0/1 nodes are available: 1 too many pods" log.txt
Looks like you have several pods in CrashLoopBackOff. Is your instance the same size as, or similar to, the one described in https://awslabs.github.io/kubeflow-manifests/docs/deployment/prerequisites/? Did you follow the auto-setup Python script?
run kubectl describe pod
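Also, since the notebook error was "0/1 nodes are available: 1 too many pods", it is worth checking the per-node pod capacity; on EKS with the default VPC CNI that limit comes from the instance type's ENI/IP capacity (general EKS behavior, not something specific to this guide). A quick check with stock kubectl:
kubectl describe nodes | grep -A 8 "Allocatable:"
# the "pods:" value is the per-node limit
kubectl get pods --all-namespaces --no-headers | wc -l
# compare the total pod count against the summed limits of your nodes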
@ryansteakley I followed the install guide with S3 only. I am using 2 nodes (t3.xlarge). Now I have 3 pods that keep going into CrashLoopBackOff:
And these are the describe outputs and logs for each pod: metadata-grpc-deployment-f8d68f687-pdzcx_describe.txt metadata-grpc-deployment-f8d68f687-pdzcx_log..txt metadata-writer-d7ff8d4bc-qqtjz_describe.txt metadata-writer-d7ff8d4bc-qqtjz_log.txt ml-pipeline-777648985d-jhkvl_describe.txt ml-pipeline-777648985d-jhkvl_log.txt
Warning Failed 34m (x5 over 34m) kubelet Error: secret "mlpipeline-minio-artifact" not found
in the ml-pipeline logs. Can you check whether this secret exists? Run kubectl get secrets -n kubeflow
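The individual secret can also be checked directly (plain kubectl, nothing specific to this distribution):
kubectl get secret mlpipeline-minio-artifact -n kubeflow
kubectl describe secret mlpipeline-minio-artifact -n kubeflow
# describe lists the keys the secret contains without printing their values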
secrets_kf_log.txt
Seems like it exists.
Can you verify that you are using v3.2.0 of kustomize? Run kubectl delete pods -n kubeflow --all
and see if the pods come up normally.
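For completeness, the version check and a way to watch the pods come back up (standard CLI usage):
kustomize version
kubectl delete pods -n kubeflow --all
kubectl get pods -n kubeflow -w
# -w keeps watching until interrupted; all pods should settle into Running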
yes, I am using kustomize v3.2.0
I tried kubectl delete pods -n kubeflow --all but the pod metadata-grpc-deployment-f8d68f687-mqs82 keeps going into CrashLoopBackOff.
What do you see when you login? Are any other pods still failing?
Everything is good except those 3 pods, which keep going into CrashLoopBackOff.
And I get some errors on Pipelines and Runs. Any suggestions, please?
errors on Runs:
and these pods:
Can you verify that the s3-secret you created follows this requirement: "Configure a Secret (e.g. s3-secret) with your AWS credentials. These need to be long-term credentials from an IAM user and not temporary."
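One rough way to tell the two apart, independent of this guide: long-term IAM user access key IDs start with "AKIA", while temporary STS credentials start with "ASIA" and come with a session token. Assuming the secret was created in AWS Secrets Manager under the name s3-secret as in the guide (the name and location are assumptions here), its contents can be inspected with:
aws secretsmanager get-secret-value --secret-id s3-secret --query SecretString --output text
# an access key ID beginning with "ASIA" would indicate temporary credentials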
Yes, I can confirm that. How can I give you evidence?
There is no way to prove it. Can you describe the ml-pipeline pod one more time? I would suggest restarting from a fresh cluster and following the cluster prerequisites listed above.
Yes, this is the 3rd time I have re-installed Kubeflow on AWS EKS from a fresh cluster, and this error keeps occurring.
Sorry you are running into these problems. If you can, please share the logs from the latest CrashLoopBackOff ml-pipeline pod. Which version of AWS Kubeflow are you running? I will try to reproduce your issue on my end and see if there is some underlying issue.
How can I get these logs? I can provide them to you. I am using this version: KUBEFLOW_RELEASE_VERSION=v1.5.1 AWS_RELEASE_VERSION=v1.5.1-aws-b1.0.1
kubectl logs <ml-pipeline-pod> -n kubeflow
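If the container keeps restarting, the log of the previous (crashed) run is often the useful one:
kubectl logs <ml-pipeline-pod> -n kubeflow --previous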
I see you are running 2 t3.xlarge nodes; we recommend a minimum of 5 nodes of m5.xlarge, as stated here: https://awslabs.github.io/kubeflow-manifests/docs/deployment/prerequisites/. If you have time, try to re-create the cluster following the suggested cluster create command.
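Not the exact command from the prerequisites page (please take the authoritative one from there), but an eksctl invocation of roughly the recommended shape, with cluster name, region, and Kubernetes version left as placeholders:
eksctl create cluster --name <cluster-name> --region <aws-region> --version <k8s-version> --node-type m5.xlarge --nodes 5 --managed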
This is the log from ml-pipeline:
@ryansteakley @TranThanh96 I think this is because of a bug related to the missing mysql deployment in the S3-only deployment option. It was fixed in the main branch recently but has not been backported to the release branch: https://github.com/awslabs/kubeflow-manifests/pull/310
@TranThanh96 Can you comment out this line:
- disable-mysql-pv-claim.yaml
in awsconfigs/apps/pipeline/s3/kustomization.yaml
and run
kustomize build awsconfigs/apps/pipeline/s3 | kubectl apply -f -
Please delete the pods which are in CrashLoopBackOff after doing this so that new pods get created.
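A compact recap of the fix (deleting all pods is the broader variant already suggested earlier in the thread; deleting only the three failing pods works as well):
# 1. in awsconfigs/apps/pipeline/s3/kustomization.yaml, prefix the
#    "- disable-mysql-pv-claim.yaml" entry with "#" to comment it out
# 2. rebuild and apply the pipeline manifests
kustomize build awsconfigs/apps/pipeline/s3 | kubectl apply -f -
# 3. recreate the failing pods
kubectl delete pods -n kubeflow --all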
Yes, I tried the RDS + S3 deployment and everything works, so the problem is related to mysql.
Thanks for reporting this issue. We have released a patch version (v1.5.1-aws-b1.0.2) to fix it.
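A minimal sketch of picking up the patch, assuming the environment-variable based checkout flow from the deployment guide (re-run the guide's checkout and install steps for the S3-only option with these values):
export KUBEFLOW_RELEASE_VERSION=v1.5.1
export AWS_RELEASE_VERSION=v1.5.1-aws-b1.0.2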