cloud-provider-vsphere

Node initialization fails with multiple IPs and node IP is not first

Open bnason opened this issue 2 years ago • 4 comments

What happened?

I deployed the vSphere CPI into a new cluster. When it tries to initialize the node, it seemingly fails because it only picks up the first IP from vSphere, and that IP does not match the reported node IP. This node initially has two IPs associated with it: 1) the K8s API server floating IP and 2) the node IP.

What did you expect to happen?

The vSphere CPI should check all IPs against the node IP.

How can we reproduce it (as minimally and precisely as possible)?

Have a VM node with 2 IPs, where the first IP is not the "node IP". For example, the first IP could be a floating load balancer IP.

Anything else we need to know (please consider providing level 4 or above logs of CPI)?

vsphere-tmm-vsphere-cpi-rw497.log

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"clean", BuildDate:"2022-03-16T15:58:47Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.8", GitCommit:"7061dbbf75f9f82e8ab21f9be7e8ffcaae8e0d44", GitTreeState:"clean", BuildDate:"2022-03-16T14:04:34Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider or hardware configuration

OS version

$ cat /etc/os-release
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="38;2;23;147;209"
HOME_URL="https://archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://bugs.archlinux.org/"
LOGO=archlinux-logo

Kernel (e.g. uname -a)

$ uname -a
Linux workstation 5.16.13-arch1-1 #1 SMP PREEMPT Tue, 08 Mar 2022 20:07:36 +0000 x86_64 GNU/Linux

Install tools

  • Cluster API (CAPI)
  • Cluster API vSphere Provider (CAPV)
  • Cluster API Bootstrap Provider Talos (CABPT)
  • Cluster API Control Plane Provider Talos (CACPPT)

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

  • vSphere CPI v1.22.5
  • vSphere CSI v2.5.1

Others

bnason avatar Apr 08 '22 15:04 bnason

I've forked this repo and updated the code as best as I can to allow for multiple IPs and have it working for me.

Fix applied to v1.22.6 https://github.com/bnason/cloud-provider-vsphere/tree/v1.22.6-fix-multi-ips

Fix applied to master https://github.com/bnason/cloud-provider-vsphere/tree/fix-multi-ips

bnason avatar Apr 12 '22 18:04 bnason

Did you create a PR for your fixes?

johnwc avatar May 15 '22 18:05 johnwc

Sorry. I'll take a look at the code. For now, a workaround would be to use the configuration below to exclude the floating IP:

[Nodes]
exclude-internal-network-subnet-cidr
exclude-external-network-subnet-cidr

lubronzhan avatar Jul 19 '22 22:07 lubronzhan

Are you passing --node-ip to your kubelet? https://github.com/kubernetes/cloud-provider/blob/v0.22.6/controllers/node/node_controller.go#L695-L708 It looks like if the IP that the vSphere CPI picks up doesn't match the kubelet IP, initialization will fail. Let me check with the previous contributor why only a single IP is added to the Node, and whether there are other potential risks.

lubronzhan avatar Sep 30 '22 18:09 lubronzhan

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Dec 29 '22 18:12 k8s-triage-robot

/remove-lifecycle stale

lubronzhan avatar Jan 20 '23 00:01 lubronzhan

In your case, do you advertise the floating IP on the same NIC as the original IP? Otherwise, by default it only shows a single IP per NIC. I'm trying to find the reason behind it. https://kubernetes.slack.com/archives/C718BPBQ8/p1674176893656109

lubronzhan avatar Jan 20 '23 01:01 lubronzhan

I'm having the same issue: the control plane node that holds the Kubernetes API server VIP is not initialized by the vSphere CPI:

instances.NodeAddressesByProviderID() FOUND with 420f3e49-ce2a-1623-6446-593cb5ed1354
E0202 15:22:59.147699       1 node_controller.go:229] error syncing 'k8s-control-plane-89v1pq': failed to get node modifiers from cloud provider: provided node ip for node "k8s-control-plane-89v1pq" is not valid: failed to get node address from cloud provider that matches ip: 10.50.1.4, requeuing

I tried to exclude the VIP in vsphere.conf, but it is not working:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cloud-config
  namespace: kube-system
data:
  vsphere.conf: |
    global:
      port: 443
      ....
    nodes:
      exclude-internal-network-subnet-cidr: "10.50.1.200/32"
      exclude-external-network-subnet-cidr: "10.50.1.200/32"

Edit: using vSphere CPI 1.25 on Kubernetes 1.25.6.

benjvfr avatar Feb 02 '23 15:02 benjvfr

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar May 03 '23 16:05 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jun 02 '23 16:06 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Jul 02 '23 16:07 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jul 02 '23 16:07 k8s-ci-robot