IP address and routes are configured twice on RHEL 8.4 on Azure cloud after an ExternalNode is created
Describe the bug
On a RHEL 8.4 VM running on azure cloud, two copies of IP addresses and routes are configured after an ExternalNode is created.
This is the configuration before creating the ExternalNode:
[root@rhel84 nsxadmin]# ip route
default via 10.110.0.1 dev eth0 proto dhcp metric 100
10.110.0.0/24 dev eth0 proto kernel scope link src 10.110.0.5 metric 100
168.63.129.16 via 10.110.0.1 dev eth0 proto dhcp metric 100
169.254.169.254 via 10.110.0.1 dev eth0 proto dhcp metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
[root@rhel84 nsxadmin]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:0d:3a:36:cd:de brd ff:ff:ff:ff:ff:ff
inet 10.110.0.5/24 brd 10.110.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:93:a5:5e:22 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
10: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether fe:b8:04:c0:b2:da brd ff:ff:ff:ff:ff:ff
After an ExternalNode is created for the VM, there are two copies of the IP address and routes: one is configured on eth0, which is an OVS internal port created by antrea-agent, and the other on eth0~, the original interface that antrea-agent renamed and expects to work as the uplink.
[root@rhel84 nsxadmin]# ip route
default via 10.110.0.1 dev eth0 proto dhcp metric 100
default via 10.110.0.1 dev eth0~ proto dhcp metric 100
10.110.0.0/24 dev eth0 proto kernel scope link src 10.110.0.5
10.110.0.0/24 dev eth0 proto kernel scope link src 10.110.0.5 metric 100
10.110.0.0/24 dev eth0~ proto kernel scope link src 10.110.0.5 metric 100
168.63.129.16 via 10.110.0.1 dev eth0 proto dhcp metric 100
168.63.129.16 via 10.110.0.1 dev eth0~ proto dhcp metric 100
169.254.169.254 via 10.110.0.1 dev eth0 proto dhcp metric 100
169.254.169.254 via 10.110.0.1 dev eth0~ proto dhcp metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
[root@rhel84 nsxadmin]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0~: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
link/ether 00:0d:3a:36:cd:de brd ff:ff:ff:ff:ff:ff
inet 10.110.0.5/24 brd 10.110.0.255 scope global noprefixroute eth0~
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:93:a5:5e:22 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
10: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether fe:b8:04:c0:b2:da brd ff:ff:ff:ff:ff:ff
16: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 00:0d:3a:36:cd:de brd ff:ff:ff:ff:ff:ff
inet 10.110.0.5/24 brd 10.110.0.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::20d:3aff:fe36:cdde/64 scope link
valid_lft forever preferred_lft forever
Listing the processes shows that dhclient is running on the uplink (eth0~), and it is this process that configures the IP address and routes.
[root@rhel84 nsxadmin]# ps -ef | grep dhclient
root 458577 1044 0 09:16 ? 00:00:00 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /run/NetworkManager/dhclient-eth0~.pid -lf /var/lib/NetworkManager/dhclient-5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03-eth0~.lease -cf /var/lib/NetworkManager/dhclient-eth0~.conf eth0~
root 458697 453827 0 09:18 pts/0 00:00:00 grep --color=auto dhclient
This is observed only on RHEL 8.4 on the Azure cloud. After some comparison, we found that on Azure, RHEL configures NetworkManager to use dhclient for DHCP by default, and NetworkManager starts a dhclient process on the uplink even though the interface has been renamed.
[root@rhel84 nsxadmin]# nmcli d
DEVICE TYPE STATE CONNECTION
eth0~ ethernet connected System eth0
docker0 bridge connected (externally) docker0
lo loopback unmanaged --
eth0 openvswitch unmanaged --
ovs-system openvswitch unmanaged --
[root@rhel84 nsxadmin]# nmcli con
NAME UUID TYPE DEVICE
System eth0 5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03 ethernet eth0~
docker0 e4450544-8933-4d05-a134-595f76780f6f bridge docker0
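A possible workaround (an untested sketch; the drop-in file path and the interface name are assumptions based on the output above) is to tell NetworkManager to stop managing the renamed uplink, so it no longer starts dhclient on eth0~:

```ini
# /etc/NetworkManager/conf.d/99-antrea-uplink.conf (hypothetical drop-in file)
# Ask NetworkManager to leave the renamed uplink alone so it does not
# run dhclient on it; antrea-agent remains responsible for the interface.
[keyfile]
unmanaged-devices=interface-name:eth0~
```

Reloading NetworkManager (e.g. `systemctl reload NetworkManager`) would apply the drop-in; whether this interacts safely with antrea-agent's own interface handling would need verification.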
The duplicate routes may introduce unpredictable behavior on the VM: for example, outbound traffic can leave the VM directly from the uplink, and ANP rules then do not take effect because the packets bypass the OpenFlow entries.
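One way to see which path the kernel would actually pick (a diagnostic sketch; the destination address is arbitrary) is `ip route get`:

```shell
# Ask the kernel which route it would select for an outbound packet.
# With duplicate default routes of equal metric, the answer may be the
# uplink (eth0~) rather than the OVS internal port (eth0).
out=$(ip route get 8.8.8.8 2>/dev/null || echo "ip route get unavailable")
echo "$out"
```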
To Reproduce
- Deploy K8s cluster and run antrea-controller
- Create a VM on Azure with OS type RHEL 8.4
- Install vm-agent on the VM
- Create ExternalNode for the VM
- List the IP addresses and routes after the uplink and host internal interfaces are created.
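The last step above can be sketched as a quick check (an illustrative sketch; it only counts route entries and falls back gracefully when the interfaces are absent):

```shell
# Count default routes: more than one (eth0 plus eth0~) indicates the bug.
dups=$(ip route show default 2>/dev/null | wc -l)
echo "default route entries: $dups"
# The uplink should carry no IPv4 address once antrea-agent has moved it.
ip -4 addr show dev eth0~ 2>/dev/null || echo "eth0~ not present on this host"
```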
Expected behavior
After the ExternalNode is created, the IP address and routes are expected to move to the host internal interface only, and they should not exist on the uplink until the ExternalNode resource is deleted.
Actual behavior
The IP address and routes are configured on both the host internal interface and the uplink.
Versions:
The issue is expected to exist in all Antrea releases, including the main branch.