kubeflow-manifests
All pods come up successfully, but port 30000 is still unreachable
Hi, today I ran into a situation where all pods are running normally but port 30000 cannot be accessed. How should I troubleshoot this?
@kangzhang0709 What exactly is the error when port 30000 is unreachable? Port 30000 is the port istio exposes on the cluster nodes; judging from your service, you should be able to reach it via nodeIP:30000. If you don't know the nodeIP, you can also use kubectl port-forward to map port 80 directly to your local machine:
kubectl -n istio-system port-forward svc/istio-ingressgateway 8000:80
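If curl isn't handy, the "is the NodePort actually open from my client?" check can be sketched in a few lines of Python. This is a minimal sketch, not part of the manifests; `NODE_IP` is a placeholder for the address from `kubectl get nodes -o wide`:

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Replace NODE_IP with a real node address before running:
# print(port_open("NODE_IP", 30000))
```

If this returns False, the problem is network-level (firewall, security group, NodePort not bound) before istio or dex are even involved.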
:( I've been troubleshooting for half a day and still can't find the problem. All pods are in the Running state.
@kangzhang0709 Can you share the error output?
curl -vvv -L <your-k8s-node-ip>:30000
* About to connect() to 10.12.1.12 port 30000 (#0)
*   Trying 10.12.1.12...
* Connected to 10.12.1.12 (10.12.1.12) port 30000 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.12.1.12:30000
> Accept: */*
>
< HTTP/1.1 403 Forbidden
< date: Tue, 03 Aug 2021 03:11:30 GMT
< server: istio-envoy
< content-length: 0
<
* Connection #0 to host 10.12.1.12 left intact
Mine is the same: accessing port 30000 returns 403. Digging in, I found the authservice-0 container logging the error below. Does anyone know what's going on?
time="2021-08-03T03:31:00Z" level=error msg="OIDC provider setup failed, retrying in 10 seconds: Get http://dex.auth.svc.cluster.local:5556/dex/.well-known/openid-configuration: dial tcp: lookup dex.auth.svc.cluster.local on 169.254.25.10:53: no such host"
@ylylylylylyl That suggests dex was never installed? Run kubectl get svc -n auth dex and check whether the service exists.
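The "no such host" in that log is a cluster-DNS lookup failure: authservice cannot resolve the dex service name. The same lookup authservice performs can be reproduced from inside a pod; a minimal Python sketch (the dex hostname is taken from the log above):

```python
import socket

def can_resolve(hostname: str) -> bool:
    """Return True if the name resolves via the configured DNS resolver."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False

# Run inside a cluster pod; this should be True once dex and cluster DNS are healthy:
# print(can_resolve("dex.auth.svc.cluster.local"))
```

If this is False inside a pod, check both that the dex service exists and that CoreDNS/nodelocaldns (169.254.25.10 in the log) is working.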
I later uninstalled and reinstalled everything, and then it worked…
My installation showed no errors, but the services never fully come up — many pods stay stuck in Init, and some are in Error:
NAME READY STATUS RESTARTS AGE
admission-webhook-deployment-6fb9d65887-q55ls 1/1 Running 0 24m
cache-deployer-deployment-7558d65bf4-9blvv 0/2 PodInitializing 0 24m
cache-server-67d98b4ddd-qlr7z 0/2 Init:0/1 0 24m
centraldashboard-7b7676d8bd-22nws 1/1 Running 0 24m
jupyter-web-app-deployment-66f74586d9-jcwlk 1/1 Running 0 24m
katib-controller-77675c88df-4vcfz 1/1 Running 0 24m
katib-db-manager-646695754f-889qq 0/1 Running 6 24m
katib-mysql-5bb5bd9957-cb2xm 1/1 Running 0 24m
katib-ui-55fd4bd6f9-l8882 1/1 Running 0 24m
kfserving-controller-manager-0 0/2 ContainerCreating 0 22m
kubeflow-pipelines-profile-controller-5698bf57cf-wqgfq 1/1 Running 0 24m
metacontroller-0 1/1 Running 0 22m
metadata-envoy-deployment-76d65977f7-7nmdg 1/1 Running 0 24m
metadata-grpc-deployment-697d9c6c67-vqdhs 0/2 PodInitializing 0 24m
metadata-writer-58cdd57678-7gdfd 0/2 PodInitializing 0 24m
minio-6d6784db95-p8rkx 0/2 PodInitializing 0 24m
ml-pipeline-85fc99f899-7pnt7 0/2 PodInitializing 0 24m
ml-pipeline-persistenceagent-65cb9594c7-m5wfm 0/2 PodInitializing 0 24m
ml-pipeline-scheduledworkflow-7f8d8dfc69-hpwgw 0/2 PodInitializing 0 24m
ml-pipeline-ui-5c765cc7bd-w9tqv 0/2 PodInitializing 0 24m
ml-pipeline-viewer-crd-5b8df7f458-c98pl 0/2 PodInitializing 0 24m
ml-pipeline-visualizationserver-56c5ff68d5-gcnsd 0/2 PodInitializing 0 24m
mpi-operator-789f88879-stq6m 0/1 Error 1 24m
mxnet-operator-7fff864957-lq6zl 0/1 Error 0 24m
mysql-56b554ff66-559zn 0/2 PodInitializing 0 24m
notebook-controller-deployment-74d9584477-9mnqj 1/1 Running 0 24m
profiles-deployment-67b4666796-lwnzm 0/2 ContainerCreating 0 24m
pytorch-operator-fd86f7694-9j5bs 0/2 PodInitializing 0 24m
tensorboard-controller-controller-manager-fd6bcffb4-fg2mv 0/3 PodInitializing 0 23m
tensorboards-web-app-deployment-5465d687b9-v4n9m 1/1 Running 0 24m
tf-job-operator-7bc5cf4cc7-7p298 0/1 CrashLoopBackOff 6 24m
volumes-web-app-deployment-88db758b8-pdd44 1/1 Running 0 24m
workflow-controller-84dcfc89c-hlbmn 2/2 Running 2 24m
xgboost-operator-deployment-5c7bfd57cc-x2rxv 0/2 PodInitializing 0 24m
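To pull the problem pods out of output like that, you can filter `kubectl get pods` text for anything that is not fully Ready and Running. A small sketch (the sample lines are taken from the list above):

```python
def unhealthy(kubectl_output: str) -> list[str]:
    """Return names of pods that are not fully ready and Running."""
    bad = []
    for line in kubectl_output.strip().splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[0] == "NAME":  # skip header / blank lines
            continue
        name, ready, status = parts[0], parts[1], parts[2]
        up, total = ready.split("/")
        if up != total or status != "Running":
            bad.append(name)
    return bad

sample = """\
NAME READY STATUS RESTARTS AGE
centraldashboard-7b7676d8bd-22nws 1/1 Running 0 24m
mpi-operator-789f88879-stq6m 0/1 Error 1 24m
tf-job-operator-7bc5cf4cc7-7p298 0/1 CrashLoopBackOff 6 24m
"""
print(unhealthy(sample))
# -> ['mpi-operator-789f88879-stq6m', 'tf-job-operator-7bc5cf4cc7-7p298']
```

Feed it `kubectl get pods -n kubeflow` output (or just use `kubectl get pods -n kubeflow --field-selector=status.phase!=Running`) and then `kubectl describe` / `kubectl logs` the offenders.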
@kangzhang0709 Can you share the error output?
curl -vvv -L <your-k8s-node-ip>:30000
* About to connect() to 115.126.115.204 port 30000 (#0)
*   Trying 115.126.115.204...
* Connected to 115.126.115.204 (115.126.115.204) port 30000 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 115.126.115.204:30000
> Accept: */*
>
< HTTP/1.1 302 Found
< content-type: text/html; charset=utf-8
< location: /dex/auth?client_id=kubeflow-oidc-authservice&redirect_uri=%2Flogin%2Foidc&response_type=code&scope=profile+email+groups+openid&state=MTY1NDY4ODE4MnxFd3dBRUZaMVV6bHFXamgxVm1Kb1ZqTjJRelU9fIK7NmOtIq0Hpn1ynmldVlqkzsYhghKrfAHRs6EcBMd3
< date: Wed, 08 Jun 2022 11:36:22 GMT
< content-length: 269
< x-envoy-upstream-service-time: 22
< server: istio-envoy
<
* Ignoring the response-body
* Connection #0 to host 115.126.115.204 left intact
* Issue another request to this URL: 'HTTP://115.126.115.204:30000/dex/auth?client_id=kubeflow-oidc-authservice&redirect_uri=%2Flogin%2Foidc&response_type=code&scope=profile+email+groups+openid&state=MTY1NDY4ODE4MnxFd3dBRUZaMVV6bHFXamgxVm1Kb1ZqTjJRelU9fIK7NmOtIq0Hpn1ynmldVlqkzsYhghKrfAHRs6EcBMd3'
* Found bundle for host 115.126.115.204: 0x1e5cec0
* Re-using existing connection! (#0) with host 115.126.115.204
* Connected to 115.126.115.204 (115.126.115.204) port 30000 (#0)
> GET /dex/auth?client_id=kubeflow-oidc-authservice&redirect_uri=%2Flogin%2Foidc&response_type=code&scope=profile+email+groups+openid&state=MTY1NDY4ODE4MnxFd3dBRUZaMVV6bHFXamgxVm1Kb1ZqTjJRelU9fIK7NmOtIq0Hpn1ynmldVlqkzsYhghKrfAHRs6EcBMd3 HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 115.126.115.204:30000
> Accept: */*
>
< HTTP/1.1 302 Found
< content-type: text/html; charset=utf-8
< location: /dex/auth/local?req=gv5xr6fc3vmbsxx2pktiwew5u
< date: Wed, 08 Jun 2022 11:36:22 GMT
< content-length: 68
< x-envoy-upstream-service-time: 32
< server: istio-envoy
<
* Ignoring the response-body
* Connection #0 to host 115.126.115.204 left intact
* Issue another request to this URL: 'HTTP://115.126.115.204:30000/dex/auth/local?req=gv5xr6fc3vmbsxx2pktiwew5u'
* Found bundle for host 115.126.115.204: 0x1e5cec0
* Re-using existing connection! (#0) with host 115.126.115.204
* Connected to 115.126.115.204 (115.126.115.204) port 30000 (#0)
> GET /dex/auth/local?req=gv5xr6fc3vmbsxx2pktiwew5u HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 115.126.115.204:30000
> Accept: */*
>
< HTTP/1.1 200 OK
< date: Wed, 08 Jun 2022 11:36:22 GMT
< content-length: 1497
< content-type: text/html; charset=utf-8
< x-envoy-upstream-service-time: 34
< server: istio-envoy
<
<div class="dex-container">
Log in to Your Account
</div>
* Connection #0 to host 115.126.115.204 left intact
Try telnet against the port to check whether it is actually open.
kubectl get svc -n auth dex
I reinstalled as well, and it still returns 403.
add env:
After that, port 30000 was accessible.
Then additionally run: kubectl apply -f patch/auth.yaml
That file records the login username and password.
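If those credentials end up stored in a Kubernetes Secret, keep in mind that Secret `data` values are base64-encoded, so reading one back requires a decode step. A minimal sketch (the secret name and `password` key below are hypothetical, not the actual field names in the patch):

```python
import base64

def decode_secret_value(encoded: str) -> str:
    """Decode one base64-encoded value from a Secret's `data` map."""
    return base64.b64decode(encoded).decode("utf-8")

# e.g. a value fetched with (hypothetical secret name):
#   kubectl get secret <name> -n auth -o jsonpath='{.data.password}'
print(decode_secret_value("cGFzc3dvcmQ="))  # -> password
```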