Bridge Networking is not working.

Open NileshChandekar opened this issue 1 month ago • 2 comments

Before creating an issue, make sure you've checked the following:

  • [x] You are running the latest released version of k0s
  • [x] Make sure you've searched for existing issues, both open and closed
  • [x] Make sure you've searched for PRs too, a fix might've been merged already
  • [x] You're looking at docs for the released version, "main" branch docs are usually ahead of released versions.

Platform

[root@worker13-moonlightdepl1 ~]# uname -srvmo; cat /etc/os-release || lsb_release -a
Linux 5.15.0-304.171.4.1.el9uek.x86_64 #2 SMP Thu Jan 16 12:12:24 PST 2025 x86_64 GNU/Linux
NAME="Oracle Linux Server"
VERSION="9.6"
ID="ol"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="9.6"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Oracle Linux Server 9.6"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:9:6:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://github.com/oracle/oracle-linux"

ORACLE_BUGZILLA_PRODUCT="Oracle Linux 9"
ORACLE_BUGZILLA_PRODUCT_VERSION=9.6
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=9.6
[root@worker13-moonlightdepl1 ~]#

Version

k0s version v1.33.4+k0s.0

Sysinfo

`k0s sysinfo`
[root@worker13-moonlightdepl1 ~]# k0s sysinfo
Total memory: 3.3 GiB (pass)
File system of /var/lib/k0s: xfs (pass)
Disk space available for /var/lib/k0s: 5.0 GiB (pass)
Relative disk space available for /var/lib/k0s: 46% (pass)
Name resolution: localhost: [::1 127.0.0.1] (pass)
Operating system: Linux (pass)
Linux kernel release: 5.15.0-304.171.4.1.el9uek.x86_64 (pass)
Max. file descriptors per process: current: 524287 / max: 524288 (pass)
AppArmor: unavailable (pass)
Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
Executable in PATH: mount: /usr/bin/mount (pass)
Executable in PATH: umount: /usr/bin/umount (pass)
/proc file system: mounted (0x9fa0) (pass)
Control Groups: version 2 (pass)
cgroup controller "cpu": available (is a listed root controller) (pass)
cgroup controller "cpuacct": available (via cpu in version 2) (pass)
cgroup controller "cpuset": available (is a listed root controller) (pass)
cgroup controller "memory": available (is a listed root controller) (pass)
cgroup controller "devices": available (device filters attachable) (pass)
cgroup controller "freezer": available (cgroup.freeze exists) (pass)
cgroup controller "pids": available (is a listed root controller) (pass)
cgroup controller "hugetlb": available (is a listed root controller) (pass)
cgroup controller "blkio": available (via io in version 2) (pass)
CONFIG_CGROUPS: Control Group support: built-in (pass)
CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
CONFIG_NAMESPACES: Namespaces support: built-in (pass)
CONFIG_UTS_NS: UTS namespace: built-in (pass)
CONFIG_IPC_NS: IPC namespace: built-in (pass)
CONFIG_PID_NS: PID namespace: built-in (pass)
CONFIG_NET_NS: Network namespace: built-in (pass)
CONFIG_NET: Networking support: built-in (pass)
CONFIG_INET: TCP/IP networking: built-in (pass)
CONFIG_IPV6: The IPv6 protocol: built-in (pass)
CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: built-in (pass)
CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
CONFIG_NETFILTER_NETLINK: built-in (pass)
CONFIG_NF_NAT: module (pass)
CONFIG_IP_SET: IP set support: module (pass)
CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
CONFIG_IP_VS: IP virtual server support: module (pass)
CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
CONFIG_NF_CONNTRACK_IPV4: IPv4 connection tracking support (required for NAT): unknown (warning)
CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
CONFIG_NF_DEFRAG_IPV4: module (pass)
CONFIG_NF_CONNTRACK_IPV6: IPv6 connection tracking support (required for NAT): unknown (warning)
CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
CONFIG_NF_DEFRAG_IPV6: module (pass)
CONFIG_BRIDGE: 802.1d Ethernet Bridging: built-in (pass)
CONFIG_LLC: built-in (pass)
CONFIG_STP: built-in (pass)
CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: module (pass)
CONFIG_PROC_FS: /proc file system support: built-in (pass)
[root@worker13-moonlightdepl1 ~]#

What happened?

  • We deployed k0s version 1.33.4 on OEL-9.5 CSI Hardened
  • Total nodes 51
  • 1 Deployer
  • 3 Master
  • 47 Worker - of which 5 nodes are rook-ceph+worker (hybrid)

++++++++++ MAIN ISSUE: ++++++++++

  • When enabling metal-lb, sometimes we are able to reach the endpoint and sometimes not.
  • Previously, while we were setting up the DEV env, everything worked with the same manifest.
  • This time the only change is that we use bridge networking, and the issue was observed.
192.168.122.7 | CHANGED | rc=0 >>
NAME                 UUID                                  TYPE      DEVICE      
+ br0                  130ac180-fe3c-4973-94d8-699cb8a03977  bridge    br0         
enp2s0               09f96c23-20bd-3f1f-a005-18698484f1ba  ethernet  enp2s0      
+ bridge-slave-enp1s0  99749e4c-70e0-4b05-a9c6-33a2148a863c  ethernet  enp1s0      
kube-bridge          2c051724-849b-48ec-9c7b-d02065e5fcb5  bridge    kube-bridge 
lo                   74e26458-3fee-48da-a80b-bea32c2fb3db  loopback  lo
  • We reverted the change: we removed the bridge setup and configured the static IP directly on the interface, and things work smoothly.

  • This looks like a serious issue.

Steps to reproduce

++++++++++ MAIN ISSUE: ++++++++++

  • When enabling metal-lb, sometimes we are able to reach the endpoint and sometimes not.
  • Previously, while we were setting up the DEV env, everything worked with the same manifest.
  • This time the only change is that we use bridge networking, and the issue was observed.
  • What we noticed is that the coredns, konnectivity and kube-proxy pods were not starting
  • Once we changed it, everything started working.
192.168.122.7 | CHANGED | rc=0 >>
NAME                 UUID                                  TYPE      DEVICE      
+ br0                  130ac180-fe3c-4973-94d8-699cb8a03977  bridge    br0         
enp2s0               09f96c23-20bd-3f1f-a005-18698484f1ba  ethernet  enp2s0      
+ bridge-slave-enp1s0  99749e4c-70e0-4b05-a9c6-33a2148a863c  ethernet  enp1s0      
kube-bridge          2c051724-849b-48ec-9c7b-d02065e5fcb5  bridge    kube-bridge 
lo                   74e26458-3fee-48da-a80b-bea32c2fb3db  loopback  lo
  • We reverted the change: we removed the bridge setup and configured the static IP directly on the interface, and things work smoothly.

  • This looks like a serious issue.

  1. Create the bridge with nmcli, then reboot:

sudo nmcli connection add type bridge ifname br0 con-name br0
 
sudo nmcli connection modify br0 ipv4.addresses '192.168.122.59/24'
sudo nmcli connection modify br0 ipv4.gateway '192.168.122.1'
sudo nmcli connection modify br0 ipv4.dns '1.1.1.1'
sudo nmcli connection modify br0 ipv4.method manual
 
sudo nmcli connection add type bridge-slave ifname enp1s0 master br0
sudo nmcli connection up br0
reboot
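
After the reboot, it may be worth confirming that enp1s0 really ended up attached to br0 before starting k0s. The following is only a sketch, assuming iproute2 is installed; `bridge_members` and `is_bridge_member` are made-up helper names, and they only parse text passed on stdin.

```shell
#!/bin/sh
# Sketch only: helpers that parse `ip -o link show master <bridge>` output.
# The function names are hypothetical, not part of k0s or NetworkManager.

# Print the names of the interfaces enslaved to a bridge, one per line.
# Each input line looks like:
#   2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ... master br0 ...
bridge_members() {
    # Field 2 is the interface name; strip a trailing "@ifN" peer suffix if present.
    awk -F': ' '{ sub(/@.*/, "", $2); print $2 }'
}

# Succeed if interface $1 is in the member list read from stdin.
is_bridge_member() {
    # -x: whole-line match, -F: treat the name as a fixed string, -q: quiet.
    bridge_members | grep -qxF "$1"
}
```

With the names from the commands above, it could be used as `ip -o link show master br0 | is_bridge_member enp1s0 && echo "enp1s0 is attached to br0"`.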

Expected behavior

  • The k0s cluster should work smoothly with bridge networking as well.

Actual behavior

  • The k0s cluster should work smoothly with bridge networking as well.
  • ARP is OK.
  • There is no issue in the underlying networking.
  • I am not sure whether k0s supports bridge-mode networks.
  • It's just another interface, so it should work; for us it did not.

Screenshots and logs

No response

Additional context

No response

NileshChandekar, Dec 06 '25 06:12

What we noticed is that the coredns, konnectivity and kube-proxy pods were not starting

Any logs from those pods would be helpful. In general, when these networking-related components have issues, it tells me there is something they cannot determine about the network. Most of the kube components determine the interfaces and addresses they use based on the default routes. In this case, maybe the bridge network confuses them and they cannot function properly.
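
For collecting those logs, something along these lines might work. This is a rough sketch, not a k0s-provided tool: `collect_system_logs` and the `KUBECTL` variable are made-up names, and the workload names (`deployment/coredns`, `daemonset/kube-proxy`, `daemonset/konnectivity-agent`) are assumed to be the k0s defaults, so adjust them if `get pods` shows something different.

```shell
#!/bin/sh
# Sketch only. KUBECTL is the command used to reach the cluster; "k0s kubectl"
# is an assumption that fits a node where k0s is installed.
KUBECTL="${KUBECTL:-k0s kubectl}"

# Write pod status, pod descriptions and recent logs of the affected
# kube-system workloads into a directory (default: ./k0s-logs).
collect_system_logs() {
    out_dir="${1:-./k0s-logs}"
    mkdir -p "$out_dir" || return 1

    # Pod overview and events; pods that never started may have no logs at all,
    # so the describe output is often the more useful part.
    $KUBECTL -n kube-system get pods -o wide > "$out_dir/pods.txt" 2>&1
    $KUBECTL -n kube-system describe pods > "$out_dir/describe.txt" 2>&1

    # Workload names below are assumed defaults, not taken from the report.
    for workload in deployment/coredns daemonset/kube-proxy daemonset/konnectivity-agent; do
        # Replace "/" so the workload name can be used as a file name.
        file="$out_dir/$(printf '%s' "$workload" | tr '/' '_').log"
        $KUBECTL -n kube-system logs "$workload" --all-containers --tail=200 > "$file" 2>&1
    done
}
```

Calling `collect_system_logs /tmp/k0s-logs` on a controller (or anywhere with a working kubeconfig, with `KUBECTL=kubectl`) should leave one file per workload that can be attached here.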

When enabling metal-lb, sometimes we are able to reach the endpoint and sometimes not.

So is metal-lb, and its services, the only thing that gets broken? For metal-lb, you probably need to tell it via which interface to announce the LB service IPs, something like:

apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: br0-adv
  namespace: metallb-system
spec:
  ipAddressPools:
    - my-pool
  interfaces:
    - br0

I don't have any env with bridge networking at hand, so I'm somewhat guessing here. Do you want/need the node to have the bridge IP as its node address? If so, you probably need to tell kubelet that with --node-ip=1.2.3.4.
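
To see which interface and address the default route points at (and therefore what would be a sensible value for --node-ip), a small sketch like the one below could help. It assumes iproute2 output in its usual format; `default_route_dev` and `first_ipv4` are made-up helper names that only parse text from stdin.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for inspecting the default route.

# Print the device of the first default route.
# Reads `ip -4 route show default` output on stdin, e.g.:
#   default via 192.168.122.1 dev br0 proto static metric 425
default_route_dev() {
    awk '$1 == "default" { for (i = 2; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Print the first IPv4 address of an interface, without the prefix length.
# Reads `ip -4 -o addr show dev <interface>` output on stdin, e.g.:
#   3: br0    inet 192.168.122.59/24 brd 192.168.122.255 scope global br0 ...
first_ipv4() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "inet") { split($(i + 1), a, "/"); print a[1]; exit } }'
}
```

On a node this would be chained as `dev=$(ip -4 route show default | default_route_dev)` followed by `ip -4 -o addr show dev "$dev" | first_ipv4`. If that prints the br0 address, the same value is what I would try passing to kubelet; with k0s that should be possible through the worker's `--kubelet-extra-args` option, but please double check that against the docs for your version.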

jnummelin, Dec 08 '25 20:12

What we noticed is that the coredns, konnectivity and kube-proxy pods were not starting

Any logs from those pods would be helpful. In general, when these networking-related components have issues, it tells me there is something they cannot determine about the network. Most of the kube components determine the interfaces and addresses they use based on the default routes. In this case, maybe the bridge network confuses them and they cannot function properly.

I can see the route clearly; `ip r` was showing the correct route.

When enabling metal-lb, sometimes we are able to reach the endpoint and sometimes not.

So is metal-lb, and its services, the only thing that gets broken? For metal-lb, you probably need to tell it via which interface to announce the LB service IPs, something like:

It's not only metal-lb; the default kube-system pods were affected as well.

  1. I have already deleted the env.
  2. I will set up the env again and share the logs.

NileshChandekar, Dec 10 '25 06:12