cloud-provider-openstack

Only the Master Node Is Getting an EXTERNAL-IP, Not the Worker Nodes?

nashford77 opened this issue 1 year ago • 7 comments

```
root@5net-k8s-master-0:~# kubectl get nodes -A -o wide
NAME                STATUS   ROLES                  AGE   VERSION   INTERNAL-IP   EXTERNAL-IP    OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
5net-k8s-master-0   Ready    control-plane,master   23h   v1.30.1   10.5.1.36     192.168.5.75   Ubuntu 22.04.4 LTS   5.15.0-112-generic   docker://26.1.4
5net-k8s-node-0     Ready    worker                 23h   v1.30.1   10.5.1.121    <none>         Ubuntu 22.04.4 LTS   5.15.0-112-generic   docker://26.1.4
5net-k8s-node-1     Ready    worker                 23h   v1.30.1   10.5.1.55     <none>         Ubuntu 22.04.4 LTS   5.15.0-112-generic   docker://26.1.4
5net-k8s-node-2     Ready    worker                 23h   v1.30.1   10.5.1.45     <none>         Ubuntu 22.04.4 LTS   5.15.0-112-generic   docker://26.1.4
```

I saw this earlier:

```
root@5net-k8s-master-0:~# kubectl logs -n kube-system -l k8s-app=openstack-cloud-controller-manager
I0608 09:19:59.531119      10 controllermanager.go:319] Starting "service-lb-controller"
I0608 09:19:59.531235      10 node_lifecycle_controller.go:113] Sending events to api server
I0608 09:19:59.531576      10 openstack.go:385] Claiming to support LoadBalancer
I0608 09:19:59.531722      10 controllermanager.go:338] Started "service-lb-controller"
I0608 09:19:59.531863      10 controller.go:231] Starting service controller
I0608 09:19:59.531964      10 shared_informer.go:313] Waiting for caches to sync for service
I0608 09:19:59.631182      10 node_controller.go:425] Initializing node 5net-k8s-master-0 with cloud provider
I0608 09:19:59.632722      10 shared_informer.go:320] Caches are synced for service
I0608 09:20:00.346484      10 node_controller.go:492] Successfully initialized node 5net-k8s-master-0 with cloud provider
I0608 09:20:00.346746      10 event.go:389] "Event occurred" object="5net-k8s-master-0" fieldPath="" kind="Node" apiVersion="v1" type="Normal" reason="Synced" message="Node synced successfully"
```

I restarted it, thinking this might register the other nodes. No go...

```
root@5net-k8s-master-0:~# kubectl logs -f -n kube-system -l k8s-app=openstack-cloud-controller-manager
I0609 08:52:08.275473      10 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0609 08:52:08.275485      10 shared_informer.go:313] Waiting for caches to sync for RequestHeaderAuthRequestController
I0609 08:52:08.275567      10 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0609 08:52:08.275664      10 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0609 08:52:08.275676      10 shared_informer.go:313] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0609 08:52:08.275684      10 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0609 08:52:08.275908      10 shared_informer.go:313] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0609 08:52:08.375986      10 shared_informer.go:320] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0609 08:52:08.376143      10 shared_informer.go:320] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0609 08:52:08.375993      10 shared_informer.go:320] Caches are synced for RequestHeaderAuthRequestController
I0609 08:52:23.547603      10 leaderelection.go:260] successfully acquired lease kube-system/cloud-controller-manager
I0609 08:52:23.550525      10 event.go:389] "Event occurred" object="kube-system/cloud-controller-manager" fieldPath="" kind="Lease" apiVersion="coordination.k8s.io/v1" type="Normal" reason="LeaderElection" message="5net-k8s-master-0_1c08edda-634c-40aa-b580-12852f2a4bc5 became leader"
I0609 08:52:23.554762      10 openstack.go:504] Setting up informers for Cloud
I0609 08:52:23.555992      10 controllermanager.go:319] Starting "cloud-node-lifecycle-controller"
I0609 08:52:23.558714      10 controllermanager.go:338] Started "cloud-node-lifecycle-controller"
I0609 08:52:23.559834      10 controllermanager.go:319] Starting "service-lb-controller"
I0609 08:52:23.561899      10 openstack.go:385] Claiming to support LoadBalancer
I0609 08:52:23.562019      10 controllermanager.go:338] Started "service-lb-controller"
I0609 08:52:23.562063      10 controllermanager.go:319] Starting "node-route-controller"
I0609 08:52:23.564807      10 node_lifecycle_controller.go:113] Sending events to api server
I0609 08:52:23.565221      10 controller.go:231] Starting service controller
I0609 08:52:23.565276      10 shared_informer.go:313] Waiting for caches to sync for service
W0609 08:52:23.649048      10 openstack.go:488] Error initialising Routes support: router-id not set in cloud provider config
W0609 08:52:23.649189      10 core.go:111] --configure-cloud-routes is set, but cloud provider does not support routes. Will not configure cloud provider routes.
W0609 08:52:23.649196      10 controllermanager.go:326] Skipping "node-route-controller"
I0609 08:52:23.649203      10 controllermanager.go:319] Starting "cloud-node-controller"
I0609 08:52:23.651145      10 controllermanager.go:338] Started "cloud-node-controller"
I0609 08:52:23.651467      10 node_controller.go:164] Sending events to api server.
I0609 08:52:23.652264      10 node_controller.go:173] Waiting for informer caches to sync
I0609 08:52:23.665533      10 shared_informer.go:320] Caches are synced for service
```
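(Side note: the `router-id not set in cloud provider config` warning above only means the route controller was skipped; it does not affect node EXTERNAL-IPs. If cloud routes were wanted, the provider docs put the router ID in the `[Route]` section of `cloud.conf`; a minimal sketch, with a placeholder UUID rather than a value from this cluster:)

```ini
# cloud.conf fragment (the file behind the cloud-config secret).
# The [Route] section is only needed if you want the route controller;
# the router UUID below is a placeholder.
[Route]
router-id=11111111-2222-3333-4444-555555555555
```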

The old versions would register an EXTERNAL-IP on all nodes (I still have one cluster up with it...):

```
(kolla-2023.2) root@slurm-primary-controller:~/ansible/5Net/k8s-bootstrap# kubectl get nodes -A -o wide
NAME                             STATUS   ROLES    AGE   VERSION   INTERNAL-IP   EXTERNAL-IP    OS-IMAGE                        KERNEL-VERSION          CONTAINER-RUNTIME
k8s-5net-ljtcgza6zsbt-master-0   Ready    master   30d   v1.23.3   10.5.1.203    192.168.5.72   Fedora CoreOS 38.20230806.3.0   6.4.7-200.fc38.x86_64   docker://20.10.23
k8s-5net-ljtcgza6zsbt-node-0     Ready    worker   30d   v1.23.3   10.5.1.77     192.168.5.67   Fedora CoreOS 38.20230806.3.0   6.4.7-200.fc38.x86_64   docker://20.10.23
k8s-5net-ljtcgza6zsbt-node-1     Ready    worker   30d   v1.23.3   10.5.1.174    192.168.5.45   Fedora CoreOS 38.20230806.3.0   6.4.7-200.fc38.x86_64   docker://20.10.23
k8s-5net-ljtcgza6zsbt-node-2     Ready    worker   30d   v1.23.3   10.5.1.240    192.168.5.87   Fedora CoreOS 38.20230806.3.0   6.4.7-200.fc38.x86_64   docker://20.10.23
```

What's missing / different in the new version?

Adding all the diagnostic info I can think of to help.

```
root@5net-k8s-master-0:~# kubectl get pods -A
NAMESPACE      NAME                                         READY   STATUS    RESTARTS      AGE
default        diagnostic-pod                               2/2     Running   0             78m
kube-flannel   kube-flannel-ds-2mlph                        1/1     Running   0             23h
kube-flannel   kube-flannel-ds-5v6w6                        1/1     Running   0             23h
kube-flannel   kube-flannel-ds-crlks                        1/1     Running   0             23h
kube-flannel   kube-flannel-ds-pg8fb                        1/1     Running   1 (24h ago)   24h
kube-system    coredns-5cf4f94ffd-4px6h                     1/1     Running   0             31m
kube-system    coredns-5cf4f94ffd-j9phr                     1/1     Running   0             31m
kube-system    dnsutils                                     1/1     Running   1 (12m ago)   72m
kube-system    etcd-5net-k8s-master-0                       1/1     Running   1 (24h ago)   24h
kube-system    kube-apiserver-5net-k8s-master-0             1/1     Running   1 (23h ago)   24h
kube-system    kube-controller-manager-5net-k8s-master-0    1/1     Running   1 (24h ago)   24h
kube-system    kube-proxy-5b279                             1/1     Running   0             23h
kube-system    kube-proxy-5l2cc                             1/1     Running   1 (24h ago)   24h
kube-system    kube-proxy-bdz4v                             1/1     Running   0             23h
kube-system    kube-proxy-lfrsz                             1/1     Running   0             23h
kube-system    kube-scheduler-5net-k8s-master-0             1/1     Running   1 (24h ago)   24h
kube-system    openstack-cloud-controller-manager-crnlz     1/1     Running   0             7m15s
```

```
root@5net-k8s-master-0:~# kubectl get ds openstack-cloud-controller-manager -n kube-system
NAME                                  DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                             AGE
openstack-cloud-controller-manager   1         1         1       1            1           node-role.kubernetes.io/control-plane=   23h
```

```yaml
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "4"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"DaemonSet","metadata":{"annotations":{},"labels":{"k8s-app":"openstack-cloud-controller-manager"},"name":"openstack-cloud-controller-manager","namespace":"kube-system"},"spec":{"selector":{"matchLabels":{"k8s-app":"openstack-cloud-controller-manager"}},"template":{"metadata":{"labels":{"k8s-app":"openstack-cloud-controller-manager"}},"spec":{"containers":[{"args":["/bin/openstack-cloud-controller-manager","--v=1","--cluster-name=$(CLUSTER_NAME)","--cloud-config=$(CLOUD_CONFIG)","--cloud-provider=openstack","--use-service-account-credentials=false","--bind-address=127.0.0.1"],"env":[{"name":"CLOUD_CONFIG","value":"/etc/config/cloud.conf"},{"name":"CLUSTER_NAME","value":"kubernetes"}],"image":"registry.k8s.io/provider-os/openstack-cloud-controller-manager:v1.30.0","name":"openstack-cloud-controller-manager","resources":{"requests":{"cpu":"200m"}},"volumeMounts":[{"mountPath":"/etc/kubernetes/pki","name":"k8s-certs","readOnly":true},{"mountPath":"/etc/ssl/certs","name":"ca-certs","readOnly":true},{"mountPath":"/etc/config","name":"cloud-config-volume","readOnly":true}]}],"dnsPolicy":"ClusterFirstWithHostNet","hostNetwork":true,"nodeSelector":{"node-role.kubernetes.io/control-plane":""},"securityContext":{"runAsUser":1001},"serviceAccountName":"cloud-controller-manager","tolerations":[{"effect":"NoSchedule","key":"node.cloudprovider.kubernetes.io/uninitialized","value":"true"},{"effect":"NoSchedule","key":"node-role.kubernetes.io/master"},{"effect":"NoSchedule","key":"node-role.kubernetes.io/control-plane"}],"volumes":[{"hostPath":{"path":"/etc/kubernetes/pki","type":"DirectoryOrCreate"},"name":"k8s-certs"},{"hostPath":{"path":"/etc/ssl/certs","type":"DirectoryOrCreate"},"name":"ca-certs"},{"name":"cloud-config-volume","secret":{"secretName":"cloud-config"}}]}},"updateStrategy":{"type":"RollingUpdate"}}}
  creationTimestamp: "2024-06-08T08:58:54Z"
  generation: 4
  labels:
    k8s-app: openstack-cloud-controller-manager
  name: openstack-cloud-controller-manager
  namespace: kube-system
  resourceVersion: "182284"
  uid: 8006e824-3ea2-44b4-8a0b-335777a86009
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: openstack-cloud-controller-manager
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2024-06-09T08:52:05Z"
      creationTimestamp: null
      labels:
        k8s-app: openstack-cloud-controller-manager
    spec:
      containers:
      - args:
        - /bin/openstack-cloud-controller-manager
        - --v=1
        - --cluster-name=$(CLUSTER_NAME)
        - --cloud-config=$(CLOUD_CONFIG)
        - --cloud-provider=openstack
        - --use-service-account-credentials=false
        - --bind-address=127.0.0.1
        - --feature-gates=CloudDualStackNodeIPs=true
        env:
        - name: CLOUD_CONFIG
          value: /etc/config/cloud.conf
        - name: CLUSTER_NAME
          value: kubernetes
        image: registry.k8s.io/provider-os/openstack-cloud-controller-manager:v1.30.0
        imagePullPolicy: IfNotPresent
        name: openstack-cloud-controller-manager
        resources:
          requests:
            cpu: 200m
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/kubernetes/pki
          name: k8s-certs
          readOnly: true
        - mountPath: /etc/ssl/certs
          name: ca-certs
          readOnly: true
        - mountPath: /etc/config
          name: cloud-config-volume
          readOnly: true
      dnsPolicy: ClusterFirstWithHostNet
      hostNetwork: true
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsUser: 1001
      serviceAccount: cloud-controller-manager
      serviceAccountName: cloud-controller-manager
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node.cloudprovider.kubernetes.io/uninitialized
        value: "true"
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
      volumes:
      - hostPath:
          path: /etc/kubernetes/pki
          type: DirectoryOrCreate
        name: k8s-certs
      - hostPath:
          path: /etc/ssl/certs
          type: DirectoryOrCreate
        name: ca-certs
      - name: cloud-config-volume
        secret:
          defaultMode: 420
          secretName: cloud-config
  updateStrategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 1
  desiredNumberScheduled: 1
  numberAvailable: 1
  numberMisscheduled: 0
  numberReady: 1
  observedGeneration: 4
  updatedNumberScheduled: 1
```

Are the tolerations the issue? It should only run on master nodes, but it should still pull the worker nodes' external IPs?!
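(A quick way to test that theory is to check whether the workers ever registered with the `node.cloudprovider.kubernetes.io/uninitialized` taint, and whether OCCM ever gave them a providerID. A diagnostic sketch; the node name is just one of the workers from the output above:)

```sh
# Workers whose kubelet started with --cloud-provider=external register
# with the uninitialized taint until OCCM initializes them
kubectl get node 5net-k8s-node-0 -o jsonpath='{.spec.taints}{"\n"}'

# A node OCCM has initialized gets spec.providerID set,
# e.g. openstack:///<instance-uuid>
kubectl get node 5net-k8s-node-0 -o jsonpath='{.spec.providerID}{"\n"}'
```

If the taint never appears and providerID is empty, the worker's kubelet was never started with the external cloud provider flag, so OCCM has nothing to initialize.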

nashford77 · Jun 09 '24

Q: How are you meant to bootstrap the worker nodes? Is there an example? I'm guessing kubelet args are missing...?
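(As confirmed later in this thread, yes: the kubelet on each worker needs to start with `--cloud-provider=external`. A minimal sketch of the two usual ways to set that; the file path is the one used by kubeadm's deb/rpm packages, and values are illustrative:)

```sh
# /etc/default/kubelet on each worker, then restart the kubelet
KUBELET_EXTRA_ARGS=--cloud-provider=external
```

or the kubeadm join-time equivalent:

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
nodeRegistration:
  kubeletExtraArgs:
    cloud-provider: external
```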

nashford77 · Jun 10 '24

/kind support

kundan2707 · Jun 10 '24

Not sure I fully understand the question here.

Are you saying the nodes you created as workers don't have an external IP (which comes from a floating IP)?
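(You can check from the OpenStack side whether the worker VMs actually have floating IPs attached; they show up next to the fixed IPs in the Networks column. The name filter below is illustrative:)

```sh
# Floating IPs, when attached, appear alongside the fixed IPs in Networks
openstack server list --name 5net-k8s -c Name -c Networks
```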

jichenjc · Jun 18 '24

Yes, the root issue was that the worker nodes were not bootstrapped correctly with "external" as the kubelet's cloud provider setting; it was a cloud-init issue on my side. All sorted now.
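(For reference, a minimal cloud-init sketch of that kind of fix; the poster's actual user data isn't shown, so the path and join step here are illustrative only:)

```yaml
#cloud-config
write_files:
  - path: /etc/default/kubelet
    content: |
      KUBELET_EXTRA_ARGS=--cloud-provider=external
runcmd:
  # Joining after the flag is in place means the node registers with the
  # uninitialized taint, so OCCM can initialize it and set its EXTERNAL-IP
  - kubeadm join ...   # placeholder; use your real join command
```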


nashford77 · Jun 18 '24

OK, please close this if it's all done, thanks.

jichenjc · Jun 18 '24

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · Sep 16 '24

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot · Oct 16 '24

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot · Nov 15 '24

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot · Nov 15 '24