ot-br-posix

otbr + systemd-networkd results in failure of infrastructure network

Open mspang opened this issue 4 years ago • 3 comments

Describe the bug

There seems to be a serious conflict between otbr-agent and systemd-networkd (as shipped in Ubuntu 21.04, running on a Raspberry Pi 4B) that causes the infrastructure network interface on the BR to fail.

To Reproduce

Raspberry Pi 4B (Ubuntu 21.04), ot-br-posix @ b66e46416a34dc25ec17693affab006629db1385, nrf52840-dongle, ot-nrf528xx @ 063233bf4a0833bda55295bc89890a96abb5aeb8

Connect to Ethernet and configure via netplan:

ubuntu@ubuntu:/tmp$ cat /etc/netplan/50-cloud-init.yaml 
# This file is generated from information provided by the datasource.  Changes
# to it will not persist across an instance reboot.  To disable cloud-init's
# network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    ethernets:
        eth0:
            dhcp4: true
            optional: true
    version: 2

Prior to starting BR, network is in a healthy state:

ubuntu@ubuntu:~$ sudo networkctl list eth0
IDX LINK TYPE  OPERATIONAL SETUP
  2 eth0 ether routable    configured

1 links listed.

Start otbr-agent

sudo systemctl start otbr-agent

The backbone network fails:

ubuntu@ubuntu:~$ sudo networkctl list eth0
IDX LINK TYPE  OPERATIONAL SETUP
  2 eth0 ether routable    failed

1 links listed.

ubuntu@ubuntu:~$ sudo networkctl status eth0
● 2: eth0                       
                     Link File: /usr/lib/systemd/network/99-default.link
                  Network File: /run/systemd/network/10-netplan-eth0.network
                          Type: ether
                         State: routable (failed)
                          Path: platform-fd580000.ethernet
                        Driver: bcmgenet
                    HW Address: dc:a6:32:8c:93:bf (Raspberry Pi Trading Ltd)
                           MTU: 1500 (min: 68, max: 1500)
                         QDisc: mq
  IPv6 Address Generation Mode: eui64
          Queue Length (Tx/Rx): 5/5
              Auto negotiation: yes
                         Speed: 1Gbps
                        Duplex: full
                          Port: tp
                       Address: 10.1.0.50 (DHCP4 via 10.1.0.2)
                                fe80::5873:39b0:f14e:c1b5
                                fe80::dea6:32ff:fe8c:93bf
                       Gateway: 10.1.0.2 (Intel Corporate)
                           DNS: 10.1.0.2
             Activation Policy: up
           Required For Online: no
               DHCP4 Client ID: IAID:0x396e6af3/DUID
             DHCP6 Client DUID: DUID-EN/Vendor:0000ab11d8a291b15382e5900000

Aug 17 22:26:47 ubuntu systemd-networkd[1643]: eth0: Link UP
Aug 17 22:26:51 ubuntu systemd-networkd[1643]: eth0: Gained carrier
Aug 17 22:26:51 ubuntu systemd-networkd[1643]: eth0: DHCPv4 address 10.1.0.50/24 via 10.1.0.2
Aug 17 22:26:52 ubuntu systemd-networkd[1643]: eth0: Gained IPv6LL
Aug 17 22:30:21 ubuntu systemd-networkd[1643]: eth0: Could not set NDisc route: Gateway can not be a local address. Invalid argument
Aug 17 22:30:21 ubuntu systemd-networkd[1643]: eth0: Failed

A failed interface does not renew DHCP leases so this can cause a loss of network connectivity.
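To catch this condition before the lease expires, the SETUP column of `networkctl list` can be checked from a script. The following is only a rough sketch: the helper name `setup_state` is made up for illustration, and it assumes the five-column table layout shown in the output above.

```python
from typing import Optional


def setup_state(networkctl_output: str, link: str) -> Optional[str]:
    """Return the SETUP column for `link` from `networkctl list` text, or None."""
    for line in networkctl_output.splitlines():
        fields = line.split()
        # Data rows look like: IDX LINK TYPE OPERATIONAL SETUP
        if len(fields) == 5 and fields[0].isdigit() and fields[1] == link:
            return fields[4]
    return None


# Output copied from this report, taken after otbr-agent was started.
SAMPLE = """\
IDX LINK TYPE  OPERATIONAL SETUP
  2 eth0 ether routable    failed

1 links listed.
"""

print(setup_state(SAMPLE, "eth0"))  # prints: failed
```

On a live system the text would come from running `networkctl list eth0`; a result of `failed` is the state described in this report.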

It appears to be related to RA processing in systemd (and may be a bug in systemd rather than otbr).

Expected behavior

The network should not fail.

mspang, Aug 17 '21 22:08

@wgtdkp

mspang, Aug 17 '21 22:08

This looks like an RA processing bug on the systemd-networkd side, since otbr-agent works well with dhcpcd. There have been similar bug reports (e.g. https://github.com/coreos/bugs/issues/1478#issuecomment-234772765).

simonlingoogle, Nov 10 '21 05:11

It seems systemd-networkd failed to handle the RIO (Route Information Option) in the RA advertised by otbr-agent, because the kernel refused to add a route whose gateway is a local address.

systemd-networkd should ignore any RIO sent by the device itself (otbr-agent), because such a route does not make sense to the kernel. We actually have a dhcpcd fix for handling RIO, in which we explicitly ignore RIOs sent by the device itself.
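The idea of ignoring self-originated RAs can be illustrated roughly as follows. This is not the dhcpcd or systemd-networkd code, just a minimal sketch: the function name `ra_is_from_self` is hypothetical, and the local addresses are the two link-local addresses of eth0 from the `networkctl status` output in the report.

```python
import ipaddress
from typing import Iterable


def _parse(addr: str) -> ipaddress.IPv6Address:
    # Drop any zone index such as "%eth0" before parsing.
    return ipaddress.IPv6Address(addr.split("%", 1)[0])


def ra_is_from_self(ra_source: str, local_addrs: Iterable[str]) -> bool:
    """True if the RA's source address is one of this host's own addresses."""
    source = _parse(ra_source)
    return any(source == _parse(a) for a in local_addrs)


# Link-local addresses of eth0 from the `networkctl status` output in the report.
ETH0_ADDRS = ["fe80::5873:39b0:f14e:c1b5", "fe80::dea6:32ff:fe8c:93bf"]

# An RA whose source matches a local address would be skipped instead of
# being turned into a route with a local gateway (which the kernel rejects).
print(ra_is_from_self("fe80::dea6:32ff:fe8c:93bf%eth0", ETH0_ADDRS))
```

A client applying this check would drop the RIO before asking the kernel for the route, so the "Gateway can not be a local address" error would never be triggered.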

simonlingoogle, Nov 10 '21 05:11

Closing stale issue.

jwhui, Jun 27 '23 22:06

Hi,

I have the exact same problem. How did you fix it, @simonlingoogle?

caipiblack, Jul 18 '23 15:07