kubeflow-manifests

Everything installed successfully, but port 30000 cannot be accessed

Open Kang9779 opened this issue 3 years ago • 14 comments

Hi, today I ran into a situation where all pods are running normally but port 30000 cannot be accessed. How should I troubleshoot this? (screenshots attached)

Kang9779 avatar Jul 18 '21 11:07 Kang9779

@kangzhang0709 What exactly is the error when port 30000 is inaccessible? Port 30000 is the port istio exposes on the cluster nodes. From your service listing you should be able to reach it at nodeIP:30000. If you don't know the node IP, you can also use kubectl port-forward to map port 80 directly to your local machine:

kubectl -n istio-system port-forward svc/istio-ingressgateway 8000:80

shikanon avatar Jul 19 '21 12:07 shikanon


:( I spent half a day troubleshooting and still can't find what's wrong. All pods are in the Running state.

Kang9779 avatar Jul 23 '21 06:07 Kang9779

@kangzhang0709 What does the error output look like? Try:

curl -vvv -L <your-k8s-node-ip>:30000

shikanon avatar Jul 25 '21 13:07 shikanon
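As a rough triage aid, the HTTP status that curl reports here maps fairly directly onto the causes that come up later in this thread. A small bash sketch (the mapping below is a summary of this thread, not an official diagnostic):

```shell
# Rough triage of: curl -s -o /dev/null -w '%{http_code}' http://<node-ip>:30000
# The status-to-cause mapping summarizes the cases reported in this thread.
classify_status() {
  case "$1" in
    302) echo "redirecting to the dex login page - auth stack looks healthy" ;;
    403) echo "istio-envoy rejected the request - check authservice-0 and dex" ;;
    000) echo "no TCP connection - check the NodePort, firewall, or node IP" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

classify_status 403   # -> istio-envoy rejected the request - check authservice-0 and dex
```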

• About to connect() to 10.12.1.12 port 30000 (#0)
• Trying 10.12.1.12...
• Connected to 10.12.1.12 (10.12.1.12) port 30000 (#0)

> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.12.1.12:30000
> Accept: */*

< HTTP/1.1 403 Forbidden
< date: Tue, 03 Aug 2021 03:11:30 GMT
< server: istio-envoy
< content-length: 0
<

• Connection #0 to host 10.12.1.12 left intact

Same here: accessing port 30000 returns 403. Digging in, I found the authservice-0 container logging the error below. Does anyone know what's going on?

time="2021-08-03T03:31:00Z" level=error msg="OIDC provider setup failed, retrying in 10 seconds: Get http://dex.auth.svc.cluster.local:5556/dex/.well-known/openid-configuration: dial tcp: lookup dex.auth.svc.cluster.local on 169.254.25.10:53: no such host"

ylylylylylyl avatar Aug 03 '21 03:08 ylylylylylyl
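For context on that error: dex.auth.svc.cluster.local is the standard DNS name Kubernetes gives a Service, built as `<service>.<namespace>.svc.<cluster-domain>`. A lookup failure on it means either the dex Service doesn't exist in the auth namespace, or cluster DNS (here the resolver at 169.254.25.10) is broken. A tiny bash sketch of the naming scheme:

```shell
# Build the in-cluster DNS name Kubernetes assigns to a Service:
# <service>.<namespace>.svc.<cluster-domain> (domain defaults to cluster.local)
svc_fqdn() {
  local service="$1" namespace="$2" domain="${3:-cluster.local}"
  echo "${service}.${namespace}.svc.${domain}"
}

svc_fqdn dex auth   # -> dex.auth.svc.cluster.local
```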

@ylylylylylyl That suggests dex was never installed. Does kubectl get svc -nauth dex show the service?

shikanon avatar Aug 03 '21 12:08 shikanon


In the end I uninstalled and reinstalled everything, and then it worked.

Kang9779 avatar Aug 04 '21 01:08 Kang9779

I didn't see any errors during installation, but the services never fully come up. Many pods are stuck in Init, and some are in Error:

NAME READY STATUS RESTARTS AGE
admission-webhook-deployment-6fb9d65887-q55ls 1/1 Running 0 24m
cache-deployer-deployment-7558d65bf4-9blvv 0/2 PodInitializing 0 24m
cache-server-67d98b4ddd-qlr7z 0/2 Init:0/1 0 24m
centraldashboard-7b7676d8bd-22nws 1/1 Running 0 24m
jupyter-web-app-deployment-66f74586d9-jcwlk 1/1 Running 0 24m
katib-controller-77675c88df-4vcfz 1/1 Running 0 24m
katib-db-manager-646695754f-889qq 0/1 Running 6 24m
katib-mysql-5bb5bd9957-cb2xm 1/1 Running 0 24m
katib-ui-55fd4bd6f9-l8882 1/1 Running 0 24m
kfserving-controller-manager-0 0/2 ContainerCreating 0 22m
kubeflow-pipelines-profile-controller-5698bf57cf-wqgfq 1/1 Running 0 24m
metacontroller-0 1/1 Running 0 22m
metadata-envoy-deployment-76d65977f7-7nmdg 1/1 Running 0 24m
metadata-grpc-deployment-697d9c6c67-vqdhs 0/2 PodInitializing 0 24m
metadata-writer-58cdd57678-7gdfd 0/2 PodInitializing 0 24m
minio-6d6784db95-p8rkx 0/2 PodInitializing 0 24m
ml-pipeline-85fc99f899-7pnt7 0/2 PodInitializing 0 24m
ml-pipeline-persistenceagent-65cb9594c7-m5wfm 0/2 PodInitializing 0 24m
ml-pipeline-scheduledworkflow-7f8d8dfc69-hpwgw 0/2 PodInitializing 0 24m
ml-pipeline-ui-5c765cc7bd-w9tqv 0/2 PodInitializing 0 24m
ml-pipeline-viewer-crd-5b8df7f458-c98pl 0/2 PodInitializing 0 24m
ml-pipeline-visualizationserver-56c5ff68d5-gcnsd 0/2 PodInitializing 0 24m
mpi-operator-789f88879-stq6m 0/1 Error 1 24m
mxnet-operator-7fff864957-lq6zl 0/1 Error 0 24m
mysql-56b554ff66-559zn 0/2 PodInitializing 0 24m
notebook-controller-deployment-74d9584477-9mnqj 1/1 Running 0 24m
profiles-deployment-67b4666796-lwnzm 0/2 ContainerCreating 0 24m
pytorch-operator-fd86f7694-9j5bs 0/2 PodInitializing 0 24m
tensorboard-controller-controller-manager-fd6bcffb4-fg2mv 0/3 PodInitializing 0 23m
tensorboards-web-app-deployment-5465d687b9-v4n9m 1/1 Running 0 24m
tf-job-operator-7bc5cf4cc7-7p298 0/1 CrashLoopBackOff 6 24m
volumes-web-app-deployment-88db758b8-pdd44 1/1 Running 0 24m
workflow-controller-84dcfc89c-hlbmn 2/2 Running 2 24m
xgboost-operator-deployment-5c7bfd57cc-x2rxv 0/2 PodInitializing 0 24m

tianya092 avatar Nov 16 '21 02:11 tianya092
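A quick way to cut a listing like the one above down to just the unhealthy pods is to compare the two halves of the READY column. A bash/awk sketch, shown here on a captured sample; against a live cluster you would pipe `kubectl get pods -n kubeflow` into it:

```shell
# Print NAME and STATUS for pods whose READY count (x/y) is not full.
not_ready() { awk 'NR > 1 { split($2, a, "/"); if (a[1] != a[2]) print $1, $3 }'; }

# Demonstrated on a captured sample of `kubectl get pods` output:
printf '%s\n' \
  'NAME READY STATUS RESTARTS AGE' \
  'centraldashboard-7b7676d8bd-22nws 1/1 Running 0 24m' \
  'mpi-operator-789f88879-stq6m 0/1 Error 1 24m' \
  'tf-job-operator-7bc5cf4cc7-7p298 0/1 CrashLoopBackOff 6 24m' | not_ready
# -> mpi-operator-789f88879-stq6m Error
#    tf-job-operator-7bc5cf4cc7-7p298 CrashLoopBackOff
```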

@kangzhang0709 What does the error output look like? Try:

curl -vvv -L <your-k8s-node-ip>:30000
• About to connect() to 115.126.115.204 port 30000 (#0)
• Trying 115.126.115.204...
• Connected to 115.126.115.204 (115.126.115.204) port 30000 (#0)

> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 115.126.115.204:30000
> Accept: */*

< HTTP/1.1 302 Found
< content-type: text/html; charset=utf-8
< location: /dex/auth?client_id=kubeflow-oidc-authservice&redirect_uri=%2Flogin%2Foidc&response_type=code&scope=profile+email+groups+openid&state=MTY1NDY4ODE4MnxFd3dBRUZaMVV6bHFXamgxVm1Kb1ZqTjJRelU9fIK7NmOtIq0Hpn1ynmldVlqkzsYhghKrfAHRs6EcBMd3
< date: Wed, 08 Jun 2022 11:36:22 GMT
< content-length: 269
< x-envoy-upstream-service-time: 22
< server: istio-envoy
<

• Ignoring the response-body
• Connection #0 to host 115.126.115.204 left intact
• Issue another request to this URL: 'HTTP://115.126.115.204:30000/dex/auth?client_id=kubeflow-oidc-authservice&redirect_uri=%2Flogin%2Foidc&response_type=code&scope=profile+email+groups+openid&state=MTY1NDY4ODE4MnxFd3dBRUZaMVV6bHFXamgxVm1Kb1ZqTjJRelU9fIK7NmOtIq0Hpn1ynmldVlqkzsYhghKrfAHRs6EcBMd3'
• Found bundle for host 115.126.115.204: 0x1e5cec0
• Re-using existing connection! (#0) with host 115.126.115.204
• Connected to 115.126.115.204 (115.126.115.204) port 30000 (#0)

> GET /dex/auth?client_id=kubeflow-oidc-authservice&redirect_uri=%2Flogin%2Foidc&response_type=code&scope=profile+email+groups+openid&state=MTY1NDY4ODE4MnxFd3dBRUZaMVV6bHFXamgxVm1Kb1ZqTjJRelU9fIK7NmOtIq0Hpn1ynmldVlqkzsYhghKrfAHRs6EcBMd3 HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 115.126.115.204:30000
> Accept: */*

< HTTP/1.1 302 Found
< content-type: text/html; charset=utf-8
< location: /dex/auth/local?req=gv5xr6fc3vmbsxx2pktiwew5u
< date: Wed, 08 Jun 2022 11:36:22 GMT
< content-length: 68
< x-envoy-upstream-service-time: 32
< server: istio-envoy
<

• Ignoring the response-body
• Connection #0 to host 115.126.115.204 left intact
• Issue another request to this URL: 'HTTP://115.126.115.204:30000/dex/auth/local?req=gv5xr6fc3vmbsxx2pktiwew5u'
• Found bundle for host 115.126.115.204: 0x1e5cec0
• Re-using existing connection! (#0) with host 115.126.115.204
• Connected to 115.126.115.204 (115.126.115.204) port 30000 (#0)

> GET /dex/auth/local?req=gv5xr6fc3vmbsxx2pktiwew5u HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 115.126.115.204:30000
> Accept: */*

< HTTP/1.1 200 OK
< date: Wed, 08 Jun 2022 11:36:22 GMT
< content-length: 1497
< content-type: text/html; charset=utf-8
< x-envoy-upstream-service-time: 34
< server: istio-envoy
<

(response body: the dex login page, "Log in to Your Account" with a Login button)

• Connection #0 to host 115.126.115.204 left intact

Chenxs1122 avatar Jun 08 '22 11:06 Chenxs1122
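The trace above is actually the healthy flow: a 302 to /dex/auth, a 302 to /dex/auth/local, then a 200 with the login page. To pull just the redirect chain out of a saved `curl -v` trace, a small grep/awk sketch (shown on a captured sample):

```shell
# Extract the location: headers (the redirect chain) from a saved curl -v trace.
redirect_chain() { grep -i '^< location:' | awk '{print $3}'; }

printf '%s\n' \
  '< HTTP/1.1 302 Found' \
  '< location: /dex/auth?client_id=kubeflow-oidc-authservice' \
  '< HTTP/1.1 302 Found' \
  '< location: /dex/auth/local?req=gv5xr6fc3vmbsxx2pktiwew5u' | redirect_chain
# -> /dex/auth?client_id=kubeflow-oidc-authservice
#    /dex/auth/local?req=gv5xr6fc3vmbsxx2pktiwew5u
```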


Try telnet against the port to check whether it is open at all.

Chenxs1122 avatar Jun 09 '22 00:06 Chenxs1122
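If telnet isn't installed on the node, bash can do the same reachability check with its built-in /dev/tcp pseudo-device (a bash-ism, not POSIX sh):

```shell
# Check whether a TCP port accepts connections, without telnet or nc.
# Uses bash's /dev/tcp/<host>/<port> redirection in a throwaway subshell.
port_open() {
  if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then echo open; else echo closed; fi
}

port_open 127.0.0.1 1   # port 1 on localhost is almost certainly closed
```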

kubectl get svc -nauth dex

I reinstalled as well, but it still doesn't work; it still returns 403.

hecheng64 avatar Aug 11 '22 05:08 hecheng64


(screenshots)

add env: (screenshot)

(screenshot)

After that, port 30000 could be accessed.

Then additionally run:

kubectl apply -f patch/auth.yaml

That file records the login username and password.

xytsinghua avatar Jun 14 '24 01:06 xytsinghua
