kubernetes-nmstate

Non-master node fails to start after reboot when a bond or bridge is configured on the primary NIC

Open tsorya opened this issue 5 years ago • 10 comments

What happened:

  1. Configured a bond on the primary and firstSecondary interfaces.
  2. The bond was configured successfully and has the same IP as the primary NIC, as expected.
  3. After restarting Node2, it fails to reconnect to the Kubernetes master and comes up with a different hostname (localhost.localdomain), although the bond is still configured with the right IP and MAC. Restarting the master node works perfectly well.

What you expected to happen: Node2 should come back up and reconnect to the master.

How to reproduce it (as minimally and precisely as possible):

Enable eth1:

    apiVersion: nmstate.io/v1alpha1
    kind: NodeNetworkConfigurationPolicy
    metadata:
      name: enable-eth1-policy
    spec:
      desiredState:
        interfaces:
        - name: eth1
          type: ethernet
          state: up
          ipv4:
            dhcp: true
            enabled: true

Create bond:

    apiVersion: nmstate.io/v1alpha1
    kind: NodeNetworkConfigurationPolicy
    metadata:
      name: bond0-eth0-eth1-policy
    spec:
      desiredState:
        interfaces:
        - name: eth1
          type: ethernet
          state: up
        - name: bond0
          type: bond
          state: up
          ipv4:
            dhcp: true
            enabled: true
          link-aggregation:
            mode: balance-rr
            options:
              miimon: '140'
            slaves:
            - eth0
            - eth1

Reboot node2:

    sync && sudo reboot -nf

Anything else we need to know?:

Environment:

  • NodeNetworkState on affected nodes (use kubectl get nodenetworkstate <node_name> -o yaml):

    apiVersion: nmstate.io/v1alpha1
    kind: NodeNetworkState
    metadata:
      creationTimestamp: "2020-01-27T19:53:11Z"
      generation: 1
      name: node02
      ownerReferences:
      - apiVersion: v1
        kind: Node
        name: node02
        uid: 54ec6fda-5737-4f53-890d-d866ec8ab898
      resourceVersion: "1074"
      selfLink: /apis/nmstate.io/v1alpha1/nodenetworkstates/node02
      uid: 672eb79b-7140-4fb1-b691-93d07b13686c
    status:
      currentState:
        dns-resolver:
          config:
            search: []
            server: []
          running:
            search: []
            server:
            - 192.168.66.2
            - 192.168.66.2
        interfaces:
        - ipv4:
            address:
            - ip: 192.168.66.102
              prefix-length: 24
            auto-dns: true
            auto-gateway: true
            auto-routes: true
            dhcp: true
            enabled: true
          ipv6:
            autoconf: false
            dhcp: false
            enabled: false
          link-aggregation:
            mode: balance-rr
            options:
              miimon: "140"
            slaves:
            - eth1
            - eth0
          mac-address: 52:55:00:D1:55:02
          mtu: 1500
          name: bond0
          state: up
          type: bond
        - bridge:
            options:
              group-forward-mask: 0
              mac-ageing-time: 300
              multicast-snooping: true
              stp:
                enabled: false
                forward-delay: 15
                hello-time: 2
                max-age: 20
                priority: 32768
            port: []
          ipv4:
            address:
            - ip: 10.244.1.1
              prefix-length: 24
            dhcp: false
            enabled: true
          ipv6:
            address:
            - ip: fe80::2886:ebff:fed1:c07c
              prefix-length: 64
            autoconf: false
            dhcp: false
            enabled: true
          mac-address: 2A:86:EB:D1:C0:7C
          mtu: 1450
          name: cni0
          state: up
          type: linux-bridge
        - bridge:
            options:
              group-forward-mask: 0
              mac-ageing-time: 300
              multicast-snooping: true
              stp:
                enabled: false
                forward-delay: 15
                hello-time: 2
                max-age: 20
                priority: 32768
            port: []
          ipv4:
            address:
            - ip: 172.17.0.1
              prefix-length: 16
            dhcp: false
            enabled: true
          ipv6:
            autoconf: false
            dhcp: false
            enabled: false
          mac-address: 02:42:1B:DB:DD:27
          mtu: 1500
          name: docker0
          state: up
          type: linux-bridge
        - ipv4:
            dhcp: false
            enabled: false
          ipv6:
            autoconf: false
            dhcp: false
            enabled: false
          mac-address: 52:55:00:D1:55:02
          mtu: 1500
          name: eth0
          state: up
          type: ethernet
        - ipv4:
            dhcp: false
            enabled: false
          ipv6:
            autoconf: false
            dhcp: false
            enabled: false
          mac-address: 52:55:00:D1:55:02
          mtu: 1500
          name: eth1
          state: up
          type: ethernet
        - ipv4:
            address:
            - ip: 192.168.66.129
              prefix-length: 24
            auto-dns: true
            auto-gateway: true
            auto-routes: true
            dhcp: true
            enabled: true
          ipv6:
            address:
            - ip: fe80::a027:749b:f5ac:e7a
              prefix-length: 64
            auto-dns: true
            auto-gateway: true
            auto-routes: true
            autoconf: true
            dhcp: true
            enabled: true
          mac-address: 52:55:00:D1:56:03
          mtu: 1500
          name: eth2
          state: up
          type: ethernet
        - ipv4:
            enabled: false
          ipv6:
            enabled: false
          mac-address: 6E:5C:28:99:EC:FE
          mtu: 1450
          name: flannel.1
          state: down
          type: vxlan
          vxlan:
            base-iface: eth0
            destination-port: 8472
            id: 1
            remote: ""
        - ipv4:
            enabled: false
          ipv6:
            enabled: false
          mtu: 65536
          name: lo
          state: down
          type: unknown
        route-rules:
          config: []
        routes:
          config: []
          running:
          - destination: 0.0.0.0/0
            metric: 300
            next-hop-address: 192.168.66.2
            next-hop-interface: bond0
            table-id: 254
          - destination: 192.168.66.0/24
            metric: 300
            next-hop-address: ""
            next-hop-interface: bond0
            table-id: 254
          - destination: 10.244.1.0/24
            metric: 0
            next-hop-address: ""
            next-hop-interface: cni0
            table-id: 254
          - destination: 172.17.0.0/16
            metric: 0
            next-hop-address: ""
            next-hop-interface: docker0
            table-id: 254
          - destination: 0.0.0.0/0
            metric: 102
            next-hop-address: 192.168.66.2
            next-hop-interface: eth2
            table-id: 254
          - destination: 192.168.66.0/24
            metric: 102
            next-hop-address: ""
            next-hop-interface: eth2
            table-id: 254
          - destination: fe80::/64
            metric: 256
            next-hop-address: ""
            next-hop-interface: cni0
            table-id: 254
          - destination: fe80::/64
            metric: 102
            next-hop-address: ""
            next-hop-interface: eth2
            table-id: 254
          - destination: ff00::/8
            metric: 256
            next-hop-address: ""
            next-hop-interface: cni0
            table-id: 255
          - destination: ff00::/8
            metric: 256
            next-hop-address: ""
            next-hop-interface: eth2
            table-id: 255

  • Problematic NodeNetworkConfigurationPolicy:

  • kubernetes-nmstate image (use kubectl get pods --all-namespaces -l app=kubernetes-nmstate -o jsonpath='{.items[0].spec.containers[0].image}'): registry:5000/nmstate/kubernetes-nmstate-handler

  • NetworkManager version (use nmcli --version): nmcli tool, version 1.20.11-23922.ee7bbddb6f.el7

  • Kubernetes version (use kubectl version): Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:18:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

  • OS (e.g. from /etc/os-release): PRETTY_NAME="CentOS Linux 7 (Core)"

  • Others:

tsorya, Jan 27 '20 20:01

@qinqon @phoracek

tsorya, Jan 27 '20 20:01

Thanks @tsorya!

Does the new hostname appear on the host itself, or only in kubectl get nodes?

phoracek, Jan 28 '20 13:01

@phoracek On the host, which is why it fails to connect to the master (at least it seems so). In kubectl get nodes, the node is in state NotReady.

tsorya, Jan 28 '20 14:01

I wonder who configures the hostname; it could either be given by the DHCP server or set locally. Does it survive reboots without the bond configured?

phoracek, Jan 28 '20 14:01

Yep

tsorya, Jan 28 '20 14:01

And with a bridge without bonding? I'd like to make sure it is indeed caused by the bonding, then we can dig deeper to see if it is a bug in nmstate or NetworkManager

phoracek, Jan 28 '20 14:01

Didn't check. Will try.

tsorya, Jan 28 '20 15:01

@phoracek same problem with bridge

tsorya, Jan 28 '20 16:01
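
Editor's note: the thread does not show the policy used for the bridge-only test. As a rough sketch of what such a test might look like, assuming a Linux bridge placed directly on the primary NIC (eth0) with DHCP moved to the bridge, a policy along these lines could be applied. The policy name and the bridge name br1 are hypothetical, and the apiVersion simply mirrors the one used earlier in this issue.

```yaml
# Hypothetical sketch: bridge on the primary NIC, no bonding involved.
# The names br1 and br1-eth0-policy are illustrative, not from the thread.
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br1-eth0-policy
spec:
  desiredState:
    interfaces:
    - name: br1
      type: linux-bridge
      state: up
      ipv4:
        dhcp: true      # the bridge takes over DHCP from eth0
        enabled: true
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: eth0    # primary NIC becomes a bridge port
```

If the node also loses its hostname after a reboot with only this policy applied, that would be consistent with the report above that the problem is not specific to bonding.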

We have to implement this use case with copy-mac-from

qinqon, Jul 14 '21 10:07

Let's adapt the default interface mac bonding test and examples to use two interfaces and do the mac cloning with https://github.com/nmstate/nmstate/blob/2599d3afa48a507d6631ce6924e3c3564dd81630/libnmstate/schema.py#L34

qinqon, May 31 '22 10:05
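
Editor's note: the last two comments propose handling this case with nmstate's copy-mac-from interface property, which tells nmstate to give the bond the MAC address of a named interface. A minimal sketch of the bond policy from this issue adapted that way is shown below. It assumes a newer kubernetes-nmstate and nmstate than the versions reported above: the nmstate.io/v1 apiVersion and the port key (in place of slaves) are assumptions based on later releases, and the policy name is hypothetical.

```yaml
# Sketch only: bond over eth0 and eth1 that clones the MAC of the primary NIC.
# Assumes a release where copy-mac-from is supported; apiVersion and the
# `port` key differ from the v1alpha1/`slaves` form used earlier in this issue.
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: bond0-eth0-eth1-copy-mac-policy   # hypothetical name
spec:
  desiredState:
    interfaces:
    - name: bond0
      type: bond
      state: up
      copy-mac-from: eth0   # reuse the primary NIC's MAC for the bond
      ipv4:
        dhcp: true
        enabled: true
      link-aggregation:
        mode: balance-rr
        options:
          miimon: '140'
        port:
        - eth0
        - eth1
```

Pinning the bond to the primary NIC's MAC is meant to keep the MAC seen by the DHCP server stable across reboots. Whether that alone resolves the hostname change reported here is not confirmed in the thread.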