IP address and routes are configured twice on RHEL 8.4 on Azure cloud after ExternalNode is created

Describe the bug

On a RHEL 8.4 VM running on Azure, the IP address and routes are configured twice after an ExternalNode is created.

This is the configuration before the ExternalNode is created:

[root@rhel84 nsxadmin]# ip route
default via 10.110.0.1 dev eth0 proto dhcp metric 100 
10.110.0.0/24 dev eth0 proto kernel scope link src 10.110.0.5 metric 100 
168.63.129.16 via 10.110.0.1 dev eth0 proto dhcp metric 100 
169.254.169.254 via 10.110.0.1 dev eth0 proto dhcp metric 100 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
[root@rhel84 nsxadmin]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0d:3a:36:cd:de brd ff:ff:ff:ff:ff:ff
    inet 10.110.0.5/24 brd 10.110.0.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:93:a5:5e:22 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
10: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether fe:b8:04:c0:b2:da brd ff:ff:ff:ff:ff:ff

After an ExternalNode is created for the VM, there are two copies of the IP address and routes: one is configured on eth0, which is an OVS internal port created by antrea-agent, and the other is configured on eth0~, which is the original interface renamed by antrea-agent and expected to work as the uplink.

[root@rhel84 nsxadmin]# ip route
default via 10.110.0.1 dev eth0 proto dhcp metric 100 
default via 10.110.0.1 dev eth0~ proto dhcp metric 100 
10.110.0.0/24 dev eth0 proto kernel scope link src 10.110.0.5 
10.110.0.0/24 dev eth0 proto kernel scope link src 10.110.0.5 metric 100 
10.110.0.0/24 dev eth0~ proto kernel scope link src 10.110.0.5 metric 100 
168.63.129.16 via 10.110.0.1 dev eth0 proto dhcp metric 100 
168.63.129.16 via 10.110.0.1 dev eth0~ proto dhcp metric 100 
169.254.169.254 via 10.110.0.1 dev eth0 proto dhcp metric 100 
169.254.169.254 via 10.110.0.1 dev eth0~ proto dhcp metric 100 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
[root@rhel84 nsxadmin]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0~: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
    link/ether 00:0d:3a:36:cd:de brd ff:ff:ff:ff:ff:ff
    inet 10.110.0.5/24 brd 10.110.0.255 scope global noprefixroute eth0~
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:93:a5:5e:22 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
10: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether fe:b8:04:c0:b2:da brd ff:ff:ff:ff:ff:ff
16: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 00:0d:3a:36:cd:de brd ff:ff:ff:ff:ff:ff
    inet 10.110.0.5/24 brd 10.110.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::20d:3aff:fe36:cdde/64 scope link 
       valid_lft forever preferred_lft forever

Listing the processes shows that dhclient is running on the uplink (eth0~), and it is what configures the IP address and routes there.

[root@rhel84 nsxadmin]# ps -ef | grep dhclient
root      458577    1044  0 09:16 ?        00:00:00 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /run/NetworkManager/dhclient-eth0~.pid -lf /var/lib/NetworkManager/dhclient-5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03-eth0~.lease -cf /var/lib/NetworkManager/dhclient-eth0~.conf eth0~
root      458697  453827  0 09:18 pts/0    00:00:00 grep --color=auto dhclient
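The check above can be automated. The following is a minimal sketch, not part of Antrea, that extracts the interface each dhclient process is bound to from `ps -ef` output. It assumes the interface name is the last argument on the dhclient command line, as in the output shown here; the function name and the sample constant are hypothetical.

```python
import os
from typing import List

# Trimmed copy of the `ps -ef | grep dhclient` output from this issue.
SAMPLE_PS = """\
root      458577    1044  0 09:16 ?        00:00:00 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /run/NetworkManager/dhclient-eth0~.pid eth0~
root      458697  453827  0 09:18 pts/0    00:00:00 grep --color=auto dhclient
"""


def dhclient_interfaces(ps_output: str) -> List[str]:
    """Return the interface argument of every dhclient process listed."""
    interfaces = []
    for line in ps_output.splitlines():
        # `ps -ef` prints 7 columns (UID PID PPID C STIME TTY TIME) before CMD.
        cmd = line.split()[7:]
        if cmd and os.path.basename(cmd[0]) == "dhclient":
            # Assumption: the interface is the last argument, as seen above.
            interfaces.append(cmd[-1])
    return interfaces


# Interfaces ending in "~" are the ones renamed by antrea-agent in this report.
renamed_uplinks = [i for i in dhclient_interfaces(SAMPLE_PS) if i.endswith("~")]
```

A non-empty `renamed_uplinks` list suggests that a DHCP client is still managing the renamed uplink.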

This is observed only on RHEL 8.4 on Azure. After some comparison, we found that RHEL on Azure configures NetworkManager by default to use dhclient for DHCP, and NetworkManager starts a dhclient process on the uplink even though it has been renamed.

[root@rhel84 nsxadmin]# nmcli d
DEVICE      TYPE         STATE                   CONNECTION  
eth0~       ethernet     connected               System eth0 
docker0     bridge       connected (externally)  docker0     
lo          loopback     unmanaged               --          
eth0        openvswitch  unmanaged               --          
ovs-system  openvswitch  unmanaged               --          
[root@rhel84 nsxadmin]# nmcli con
NAME         UUID                                  TYPE      DEVICE  
System eth0  5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03  ethernet  eth0~   
docker0      e4450544-8933-4d05-a134-595f76780f6f  bridge    docker0

The duplicated IP routes may introduce unpredictable behavior on the VM. For example, outbound traffic may leave the VM directly through the uplink, in which case the ANP rules do not take effect because the packets bypass the OpenFlow entries.
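To make the duplication easy to spot, the sketch below groups `ip route` output by destination and reports every destination that is reachable through more than one device. It is an illustrative helper under the assumption that each route line starts with the destination and contains a `dev <name>` pair, as in the output above; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set

# `ip route` output captured after the ExternalNode was created.
SAMPLE_ROUTES = """\
default via 10.110.0.1 dev eth0 proto dhcp metric 100
default via 10.110.0.1 dev eth0~ proto dhcp metric 100
10.110.0.0/24 dev eth0 proto kernel scope link src 10.110.0.5
10.110.0.0/24 dev eth0 proto kernel scope link src 10.110.0.5 metric 100
10.110.0.0/24 dev eth0~ proto kernel scope link src 10.110.0.5 metric 100
168.63.129.16 via 10.110.0.1 dev eth0 proto dhcp metric 100
168.63.129.16 via 10.110.0.1 dev eth0~ proto dhcp metric 100
169.254.169.254 via 10.110.0.1 dev eth0 proto dhcp metric 100
169.254.169.254 via 10.110.0.1 dev eth0~ proto dhcp metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
"""


def devices_by_destination(ip_route_output: str) -> Dict[str, Set[str]]:
    """Map each route destination to the set of devices that carry it."""
    devices: Dict[str, Set[str]] = defaultdict(set)
    for line in ip_route_output.splitlines():
        tokens = line.split()
        if "dev" not in tokens:
            continue
        dev_index = tokens.index("dev")
        if dev_index + 1 < len(tokens):
            devices[tokens[0]].add(tokens[dev_index + 1])
    return dict(devices)


def duplicated_destinations(ip_route_output: str) -> Dict[str, Set[str]]:
    """Keep only destinations that are routed through more than one device."""
    return {
        dst: devs
        for dst, devs in devices_by_destination(ip_route_output).items()
        if len(devs) > 1
    }
```

With the sample above, every destination except the docker0 subnet is reported on both eth0 and eth0~.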

To Reproduce

Deploy a K8s cluster and run antrea-controller
Create a VM on Azure with OS type RHEL 8.4
Install vm-agent on the VM
Create an ExternalNode for the VM
List the IP addresses and routes after the uplink and host internal interfaces are created.

Expected

After the ExternalNode is created, the IP address and routes are expected to move to the host internal interface only, and they must not exist on the uplink until the ExternalNode resource is deleted.
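This expectation can be expressed as a check on `ip addr` output. The sketch below is an illustrative helper, not Antrea code: it collects the IPv4 addresses per interface and tests whether the uplink still holds one. The uplink name `eth0~` is taken from this report, and the function names are hypothetical.

```python
import re
from typing import Dict, List

# Matches interface header lines such as "2: eth0~: <BROADCAST,...>".
HEADER = re.compile(r"^\d+:\s+([^:@\s]+)")

# Trimmed `ip addr` output captured after the ExternalNode was created.
SAMPLE_ADDR = """\
2: eth0~: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
    link/ether 00:0d:3a:36:cd:de brd ff:ff:ff:ff:ff:ff
    inet 10.110.0.5/24 brd 10.110.0.255 scope global noprefixroute eth0~
       valid_lft forever preferred_lft forever
10: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether fe:b8:04:c0:b2:da brd ff:ff:ff:ff:ff:ff
16: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 00:0d:3a:36:cd:de brd ff:ff:ff:ff:ff:ff
    inet 10.110.0.5/24 brd 10.110.0.255 scope global eth0
       valid_lft forever preferred_lft forever
"""


def ipv4_by_interface(ip_addr_output: str) -> Dict[str, List[str]]:
    """Map each interface name to the IPv4 addresses listed under it."""
    result: Dict[str, List[str]] = {}
    current = None
    for line in ip_addr_output.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            result[current] = []
            continue
        tokens = line.split()
        if current is not None and len(tokens) >= 2 and tokens[0] == "inet":
            result[current].append(tokens[1])
    return result


def uplink_has_address(ip_addr_output: str, uplink: str = "eth0~") -> bool:
    """True when the uplink still carries an IPv4 address (the bug)."""
    return bool(ipv4_by_interface(ip_addr_output).get(uplink))
```

In the expected state `uplink_has_address` would return False; with the output captured in this issue it returns True.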

Actual behavior

The IP address and routes are configured on both the host internal interface and the uplink.

Versions:

The issue is presumed to exist in all Antrea releases, including the main branch.

Additional context

Jul 03 '23 09:07 wenyingd

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days

Oct 02 '23 00:10 github-actions[bot]

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days

Feb 12 '24 00:02 github-actions[bot]
