kubeflow-manifests
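authservice-0 not ready causes a 403 error
Problem summary
Hello, whether I use the official kubeflow manifests or the manifests you built, authservice-0 ends up in the same not-ready state (see kubeflow/manifests/issue), which prevents me from accessing the Kubeflow dashboard.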
2021-10-24T03:21:48.287901796+08:00 time="2021-10-23T19:21:48Z" level=error msg="OIDC provider setup failed, retrying in 10 seconds: Get http://dex.auth.svc.cluster.local:5556/dex/.well-known/openid-configuration: EOF"
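What causes this problem? Could you share the approach and the principle behind how you configured dex/istio so that all services can be accessed over HTTP?
According to the official manifests documentation, accessing all services over HTTP requires setting environment variables manually; see "Connecting to the kf cluster".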
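One way to narrow down the `EOF` above is to request the same discovery URL from a throwaway pod and compare the result with the state of the dex Service. The following is a minimal sketch, not a confirmed fix. The URL is copied from the error message and the `auth` and `istio-system` namespaces from the pod listing in this issue; the pod name `oidc-probe`, the `curlimages/curl` image, the helper name `probe_dex` and the `deploy/dex` reference are assumptions.

```shell
# URL copied from the authservice error message.
DEX_URL="http://dex.auth.svc.cluster.local:5556/dex/.well-known/openid-configuration"

probe_dex() {
  # Does the dex Service exist, and does it have endpoints behind it?
  kubectl -n auth get svc,endpoints dex

  # Fetch the discovery document from a temporary pod in authservice's namespace.
  # "oidc-probe" and the curlimages/curl image are illustrative choices.
  kubectl -n istio-system run oidc-probe --rm -i --restart=Never \
    --image=curlimages/curl -- curl -sS -m 10 "$DEX_URL"

  # Recent dex logs, to see whether the request reached dex at all.
  kubectl -n auth logs deploy/dex --tail=50
}
```

Calling `probe_dex` runs the three checks in order. If the probe pod receives a JSON document while authservice still logs `EOF`, the difference between the two pods (for example sidecar injection) is the next thing to compare.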
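What I have already tried
- [x] Reinstalling and applying the patch several times
- [x] kubectl port-forward
- [x] Ruling out a CoreDNS resolution failure
My Kubernetes environment
A Kubernetes v1.20 cluster without a public IP
Cluster status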
/ # kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
auth dex-6d8cd4fccb-s4clw 1/1 Running 0 12m
cert-manager cert-manager-649f8dfd4b-86qx2 1/1 Running 0 45m
cert-manager cert-manager-cainjector-75cd8bbf6d-hq6w2 1/1 Running 0 45m
cert-manager cert-manager-webhook-5b5cd9bd6f-c7gtk 1/1 Running 0 45m
ingress-nginx ingress-nginx-admission-create-lqqlk 0/1 Completed 0 4d5h
ingress-nginx ingress-nginx-admission-patch-w9sbl 0/1 Completed 0 4d5h
ingress-nginx ingress-nginx-controller-686f6b6867-bzztx 1/1 Running 0 3d22h
istio-system authservice-0 0/1 Running 0 29m
istio-system cluster-local-gateway-74d9fd9586-kxhjz 1/1 Running 0 28m
istio-system istio-ingressgateway-8bf685655-mgrf7 1/1 Running 0 28m
istio-system istiod-756554b96b-6slfc 1/1 Running 0 28m
knative-eventing broker-controller-cfb5ccb77-dp4b7 1/1 Running 0 44m
knative-eventing eventing-controller-8657cd4b8-sfh9t 1/1 Running 0 44m
knative-eventing eventing-webhook-67f86f4d4d-wl49w 1/1 Running 0 44m
knative-eventing imc-controller-68bd666784-rkvpk 1/1 Running 0 44m
knative-eventing imc-dispatcher-78ff9dd847-7kmmp 1/1 Running 0 44m
knative-serving activator-54b777546f-6v7x9 0/1 CrashLoopBackOff 15 44m
knative-serving autoscaler-79bbc84d47-9g2b4 1/1 Running 0 44m
knative-serving controller-dd65cb4b7-88m86 1/1 Running 0 44m
knative-serving istio-webhook-5f545fc44b-zlmrb 1/1 Running 0 44m
knative-serving networking-istio-6b6df495d6-zkjgj 1/1 Running 0 44m
knative-serving webhook-9ff656f95-bd2fh 1/1 Running 0 44m
kube-system coredns-8496bbfb78-52c27 1/1 Running 0 4d5h
kube-system coredns-8496bbfb78-ngp9h 1/1 Running 0 4d5h
kube-system default-http-backend-6946487d9b-9s5sp 1/1 Running 0 4d5h
kube-system etcd-k8s-master-node1 1/1 Running 0 4d6h
kube-system etcd-snapshot-1634651059-86qzb 0/1 Completed 0 4d5h
kube-system etcd-snapshot-1634961600-drknl 0/1 Completed 0 15h
kube-system etcd-snapshot-1634983200-r48jf 0/1 Completed 0 9h
kube-system etcd-snapshot-1635004800-8chsx 0/1 Completed 0 3h29m
kube-system kube-apiserver-k8s-master-node1 1/1 Running 0 4d6h
kube-system kube-controller-manager-k8s-master-node1 1/1 Running 0 4d6h
kube-system kube-flannel-ds-jp2w5 1/1 Running 0 4d6h
kube-system kube-flannel-ds-jtdxd 1/1 Running 0 4d6h
kube-system kube-flannel-ds-r5n9v 1/1 Running 0 4d6h
kube-system kube-flannel-ds-t29zs 1/1 Running 0 4d6h
kube-system kube-flannel-ds-xl42q 1/1 Running 0 4d6h
kube-system kube-proxy-jdvsj 1/1 Running 0 3d22h
kube-system kube-proxy-jlzgp 1/1 Running 0 3d22h
kube-system kube-proxy-nvj9n 1/1 Running 0 3d22h
kube-system kube-proxy-qmg4d 1/1 Running 0 3d22h
kube-system kube-proxy-tbvj9 1/1 Running 0 3d22h
kube-system kube-scheduler-k8s-master-node1 1/1 Running 0 4d6h
kube-system metrics-server-57bcd9bccd-cd24c 1/1 Running 0 4d
kube-system snapshot-controller-0 1/1 Running 0 4d
kubeflow-user-example-com ml-pipeline-ui-artifact-6b9bb7f495-5vtnw 2/2 Running 0 12m
kubeflow-user-example-com ml-pipeline-visualizationserver-5c648f8448-jdqll 2/2 Running 0 12m
kubeflow admission-webhook-deployment-5f5cc7968b-hjqkc 1/1 Running 0 38m
kubeflow cache-deployer-deployment-64598b6c87-h9xz6 2/2 Running 1 39m
kubeflow cache-server-59d67c7584-9gbkd 2/2 Running 0 25m
kubeflow centraldashboard-7b6b6cc7fc-g86hd 1/1 Running 0 38m
kubeflow jupyter-web-app-deployment-7c6974bb88-djnch 1/1 Running 0 25m
kubeflow katib-controller-7b784c44dd-9z6qp 1/1 Running 0 38m
kubeflow katib-db-manager-6c5757dc64-8z45w 1/1 Running 0 38m
kubeflow katib-mysql-79d75c7444-q7xj4 1/1 Running 0 38m
kubeflow katib-ui-69f5b6795d-6xtth 1/1 Running 0 38m
kubeflow kfserving-controller-manager-0 2/2 Running 0 39m
kubeflow kubeflow-pipelines-profile-controller-76c45c8c6b-tfzjn 1/1 Running 0 25m
kubeflow metacontroller-0 1/1 Running 0 39m
kubeflow metadata-envoy-deployment-56f745f7fb-xpgj9 1/1 Running 0 39m
kubeflow metadata-grpc-deployment-6494577fdb-rrdjw 2/2 Running 2 39m
kubeflow metadata-writer-b7ff9787-rglsg 2/2 Running 0 39m
kubeflow minio-cc8f7c6d-r6m2g 2/2 Running 0 25m
kubeflow ml-pipeline-66bcb9d79d-nfxkt 2/2 Running 0 39m
kubeflow ml-pipeline-persistenceagent-7fb8f6dc68-pzmdq 2/2 Running 0 39m
kubeflow ml-pipeline-scheduledworkflow-64bcfd6596-h57hp 2/2 Running 0 39m
kubeflow ml-pipeline-ui-8578f6685f-2mmnq 2/2 Running 0 38m
kubeflow ml-pipeline-viewer-crd-565fb9b5c5-cf9sc 2/2 Running 1 38m
kubeflow ml-pipeline-visualizationserver-b7c7d49fb-6vvrr 2/2 Running 0 38m
kubeflow mpi-operator-794849c566-5dssr 1/1 Running 0 38m
kubeflow mxnet-operator-6668d797d4-lk7m7 1/1 Running 0 38m
kubeflow mysql-c8d548489-j24z2 2/2 Running 0 25m
kubeflow notebook-controller-deployment-6795dd887b-95wlk 1/1 Running 0 38m
kubeflow profiles-deployment-84bd4f9bc7-lq2nk 2/2 Running 0 38m
kubeflow pytorch-operator-6887749499-p2rvr 2/2 Running 0 38m
kubeflow tensorboard-controller-controller-manager-dd896c8df-xn2bj 3/3 Running 1 38m
kubeflow tensorboards-web-app-deployment-5969cd5b68-6khtv 1/1 Running 0 25m
kubeflow tf-job-operator-ccb48b77b-rbsgz 1/1 Running 0 38m
kubeflow volumes-web-app-deployment-867dfb5b5c-lnxfm 1/1 Running 0 25m
kubeflow workflow-controller-6885c56f65-fjwh5 2/2 Running 1 25m
kubeflow xgboost-operator-deployment-665cf9bf8d-gw4cv 2/2 Running 2 38m
I debugged the CoreDNS add-on with the Kubernetes DNS debugging tool, and the results show that DNS is working normally.
$ kubectl exec -i -t dnsutils -- nslookup dex.auth
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: dex.auth.svc.cluster.local
Address: 10.96.213.43
$ kubectl logs --namespace=kube-system -l k8s-app=kube-dns
[INFO] 10.244.3.22:47994 - 27317 "AAAA IN metadata-grpc-service.svc.cluster.local. udp 57 false 512" NXDOMAIN qr,aa,rd 150 0.000028399s
[INFO] 10.244.3.22:47994 - 36217 "AAAA IN metadata-grpc-service.cluster.local. udp 53 false 512" NXDOMAIN qr,aa,rd 146 0.000024183s
[INFO] 10.244.3.22:47994 - 10155 "AAAA IN metadata-grpc-service.mydomain. udp 48 false 512" NOERROR qr,aa,rd,ra 48 0.000416245s
[INFO] 10.244.3.22:47994 - 31310 "AAAA IN metadata-grpc-service.otherdomain. udp 51 false 512" NOERROR qr,aa,rd,ra 51 0.000266715s
[INFO] 10.244.3.22:47994 - 49256 "AAAA IN metadata-grpc-service. udp 39 false 512" NOERROR qr,aa,rd,ra 39 0.000290465s
[INFO] 10.244.3.22:47994 - 36740 "A IN metadata-grpc-service.kubeflow.svc.cluster.local. udp 66 false 512" NOERROR qr,aa,rd 130 0.000029962s
[INFO] 10.244.2.15:46399 - 61341 "AAAA IN dex.auth.svc.cluster.local.istio-system.svc.cluster.local. udp 75 false 512" NXDOMAIN qr,aa,rd 168 0.000213123s
[INFO] 10.244.2.15:58084 - 42770 "AAAA IN dex.auth.svc.cluster.local.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd 155 0.000366989s
[INFO] 10.244.2.15:38095 - 56024 "AAAA IN dex.auth.svc.cluster.local.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd 151 0.00015846s
[INFO] 10.244.2.15:46342 - 61765 "A IN dex.auth.svc.cluster.local.mydomain. udp 53 false 512" NOERROR qr,aa,rd,ra 104 0.000732533s
[INFO] 10.244.2.15:59385 - 40897 "A IN dex.auth.svc.cluster.local.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd 155 0.000132883s
[INFO] 10.244.2.15:33994 - 15480 "A IN dex.auth.svc.cluster.local.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd 151 0.000108848s
[INFO] 10.244.2.15:45563 - 11457 "AAAA IN dex.auth.svc.cluster.local.mydomain. udp 53 false 512" NOERROR qr,aa,rd,ra 53 0.000486252s
[INFO] 10.244.3.22:58580 - 55075 "AAAA IN metadata-grpc-service.kubeflow.svc.cluster.local. udp 66 false 512" NOERROR qr,aa,rd 159 0.00012731s
[INFO] 10.244.3.22:58580 - 42214 "AAAA IN metadata-grpc-service.svc.cluster.local. udp 57 false 512" NXDOMAIN qr,aa,rd 150 0.000127127s
[INFO] 10.244.3.22:58580 - 41511 "AAAA IN metadata-grpc-service.cluster.local. udp 53 false 512" NXDOMAIN qr,aa,rd 146 0.000087901s
[INFO] 10.244.3.22:58580 - 48277 "AAAA IN metadata-grpc-service.mydomain. udp 48 false 512" NOERROR qr,aa,rd,ra 48 0.000446359s
[INFO] 10.244.3.22:58580 - 49236 "AAAA IN metadata-grpc-service.otherdomain. udp 51 false 512" NOERROR qr,aa,rd,ra 51 0.000329652s
[INFO] 10.244.3.22:58580 - 19522 "AAAA IN metadata-grpc-service. udp 39 false 512" NOERROR qr,aa,rd,ra 39 0.000242273s
[INFO] 10.244.3.22:58580 - 34650 "A IN metadata-grpc-service.kubeflow.svc.cluster.local. udp 66 false 512" NOERROR qr,aa,rd 130 0.000101973s
.....
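The long query names in the CoreDNS log, such as `dex.auth.svc.cluster.local.istio-system.svc.cluster.local.`, are consistent with the resolver's search-list expansion: a name with fewer dots than `ndots` is tried with each search domain appended before it is tried as written. The sketch below reproduces that candidate order. It assumes `ndots:5` (the usual Kubernetes pod default) and a search list inferred from the log lines; the function name `expand_name` is hypothetical.

```shell
# Print the candidate names a stub resolver would try, in order.
# Usage: expand_name NAME NDOTS SEARCH_DOMAIN...
expand_name() {
  name=$1; ndots=$2; shift 2

  # A name with a trailing dot is already fully qualified: no expansion.
  case $name in
    *.) printf '%s\n' "$name"; return ;;
  esac

  # Count the dots in the name.
  rest=$name; dots=0
  while [ "${rest#*.}" != "$rest" ]; do
    rest=${rest#*.}
    dots=$((dots + 1))
  done

  # Enough dots: the name as written is tried first.
  if [ "$dots" -ge "$ndots" ]; then printf '%s.\n' "$name"; fi
  # Each search domain is appended in turn.
  for domain in "$@"; do printf '%s.%s.\n' "$name" "$domain"; done
  # Too few dots: the name as written is tried last.
  if [ "$dots" -lt "$ndots" ]; then printf '%s.\n' "$name"; fi
}

expand_name dex.auth.svc.cluster.local 5 \
  istio-system.svc.cluster.local svc.cluster.local cluster.local mydomain
```

Under these assumptions the cluster-internal name is the last candidate, so an address returned for an earlier candidate (the `.mydomain` A query in the log is answered with NOERROR) would be used instead. Whether that happened here is not established by the log alone.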
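@TaibiaoGuo Judging by your kubectl get pod -A output, auth is running; the failing component is most likely the activator service in knative. If you install with my manifest together with kind, you only need to follow the readme and access the node port of the istio svc.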
dex authentication is overlaid on istio; see this file:
https://github.com/shikanon/kubeflow-manifests/blob/master/manifest1.3/008-dex-overlays-istio.yaml
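The node port mentioned above can be looked up with kubectl. This is a sketch under assumptions: the Service name `istio-ingressgateway` is inferred from the pod name in the listing, the port name `http2` is the usual Istio default and may differ in a given manifest, and the helper name is hypothetical.

```shell
ingress_node_port() {
  # Print the NodePort that maps to the gateway's plain-HTTP port.
  kubectl -n istio-system get svc istio-ingressgateway \
    -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}'
}
```

The dashboard would then be reachable at `http://<node-ip>:<port>/`, where `<node-ip>` is a placeholder for any node address and `<port>` is the value printed by `ingress_node_port`.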
Same here: all services are Running except activator and authservice, which are not ready. I checked the logs, which show, respectively:
1. Websocket connection could not be established
{"level":"info","ts":"2021-11-16T08:06:34.781Z","logger":"activator","caller":"metrics/prometheus_exporter.go:37","msg":"Created Opencensus Prometheus exporter with config: &{knative.dev/internal/serving activator prometheus 5000000000
2. OIDC provider setup failed
time="2021-11-16T08:54:00Z" level=error msg="OIDC provider setup failed, retrying in 10 seconds: Get http://dex.auth.svc.cluster.local:5556/dex/.well-known/openid-configuration: dial tcp 170.33.9.230:5556: i/o timeout"
@TaibiaoGuo Hi, I ran into this auth problem too. How did you solve it?
I solved it by configuring a Persistent Volumes provisioner for k8s.
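For readers checking whether missing storage provisioning applies to their cluster, a minimal sketch: list the storage classes and print every PersistentVolumeClaim that is not `Bound`. The position of the `STATUS` column assumes the default `kubectl get pvc -A` table layout, and both helper names are hypothetical.

```shell
unbound_pvcs() {
  # With -A the columns are NAMESPACE NAME STATUS ...;
  # print the claims whose STATUS is anything other than Bound.
  kubectl get pvc -A --no-headers |
    awk '$3 != "Bound" { print $1 "/" $2 " " $3 }'
}

check_storage() {
  # An empty list here means no provisioner-backed class is available.
  kubectl get storageclass
  unbound_pvcs
}
```

If `check_storage` prints claims stuck in `Pending` and no storage class is marked as default, installing a provisioner as described in this comment is a plausible next step.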
I also encountered this problem. Has it been solved?
Your message has been received. Thank you very much!
Has the problem been solved?