jailmaker icon indicating copy to clipboard operation
jailmaker copied to clipboard

Slow DHCP assignment

Open Jip-Hop opened this issue 1 year ago • 13 comments

It can take about 1 minute for jails using --network-bridge to be assigned an IPv4 address via DHCP. Is this normal systemd-nspawn behavior or is this a jailmaker bug? What can we do to speed up the IPv4 assignment?

Jip-Hop avatar Feb 16 '24 09:02 Jip-Hop

Output from journalctl | grep systemd-networkd in the jail indicates it took 31 seconds.

Mar 02 19:20:05 myjail systemd-networkd[30]: host0: Configuring with /etc/systemd/network/80-container-host0.network.
Mar 02 19:20:05 myjail systemd-networkd[30]: lo: Link UP
Mar 02 19:20:05 myjail systemd-networkd[30]: lo: Gained carrier
Mar 02 19:20:05 myjail systemd-networkd[30]: Enumeration completed
Mar 02 19:20:05 myjail systemd[1]: Started systemd-networkd.service - Network Configuration.
Mar 02 19:20:05 myjail systemd[1]: Starting systemd-networkd-wait-online.service - Wait for Network to be Configured...
Mar 02 19:20:05 myjail systemd-networkd[30]: host0: Link UP
Mar 02 19:20:05 myjail systemd-networkd[30]: host0: Gained carrier
Mar 02 19:20:07 myjail systemd-networkd[30]: host0: Gained IPv6LL
Mar 02 19:20:36 myjail systemd-networkd[30]: host0: DHCPv4 address 192.168.1.246/24, gateway 192.168.1.1 acquired from 192.168.1.1

Jip-Hop avatar Mar 02 '24 18:03 Jip-Hop

Input the below in systemctl edit systemd-networkd:

[Service]
Environment=SYSTEMD_LOG_LEVEL=debug

Restart jail and check output from journalctl | grep systemd-networkd | grep host0:

Mar 02 19:31:08 myjail systemd-networkd[29]: host0: DHCPv4 client: DISCOVER
Mar 02 19:31:11 myjail systemd-networkd[29]: host0: NDISC: Sent Router Solicitation, next solicitation in 8s
Mar 02 19:31:12 myjail systemd-networkd[29]: host0: DHCPv4 client: DISCOVER
Mar 02 19:31:18 myjail systemd-networkd[29]: host0: NDISC: No RA received before link confirmation timeout
Mar 02 19:31:18 myjail systemd-networkd[29]: host0: NDISC: Invoking callback for 'timeout' event.
Mar 02 19:31:18 myjail systemd-networkd[29]: host0: NDisc handler get timeout event
Mar 02 19:31:18 myjail systemd-networkd[29]: host0: link_check_ready(): DHCPv4, DHCPv6, DHCP-PD or IPv4LL is enabled but no dynamic address is assigned yet.
Mar 02 19:31:19 myjail systemd-networkd[29]: host0: NDISC: Sent Router Solicitation, next solicitation in 16s
Mar 02 19:31:20 myjail systemd-networkd[29]: host0: DHCPv4 client: DISCOVER
Mar 02 19:31:36 myjail systemd-networkd[29]: host0: NDISC: Sent Router Solicitation, next solicitation in 32s
Mar 02 19:31:36 myjail systemd-networkd[29]: host0: DHCPv4 client: DISCOVER
Mar 02 19:31:36 myjail systemd-networkd[29]: host0: DHCPv4 client: Received message from RAW socket, processing.
Mar 02 19:31:36 myjail systemd-networkd[29]: host0: DHCPv4 client: OFFER
Mar 02 19:31:36 myjail systemd-networkd[29]: host0: DHCPv4 client: REQUEST (requesting)
Mar 02 19:31:36 myjail systemd-networkd[29]: host0: DHCPv4 client: Received message from RAW socket, processing.
Mar 02 19:31:36 myjail systemd-networkd[29]: host0: DHCPv4 client: ACK
Mar 02 19:31:36 myjail systemd-networkd[29]: host0: DHCPv4 client: lease expires in 11h 59min 59s
Mar 02 19:31:36 myjail systemd-networkd[29]: host0: DHCPv4 client: T2 expires in 10h 30min
Mar 02 19:31:36 myjail systemd-networkd[29]: host0: DHCPv4 client: T1 expires in 5h 59min 59s
Mar 02 19:31:36 myjail systemd-networkd[29]: host0: DHCPv4 address 192.168.1.246/24, gateway 192.168.1.1 acquired from 192.168.1.1

Jip-Hop avatar Mar 02 '24 18:03 Jip-Hop

Interesting, only the 3rd DISCOVER gets the OFFER response.

I'm not at my system this weekend, and I use static IPs for the services in my jail. So I have not encountered this, but I'll be more than happy to test this out.

It would be interesting to see what happens on the NIC at that time. If you can capture the network traffic with tcpdump, I'll be happy to take a look and try and figure out why this is happening.

Lockszmith-GH avatar Mar 02 '24 19:03 Lockszmith-GH

Thanks for the offer! I can indeed use some help here. 🙂 IP assignment via macvlan is instant, even when the parent interface is the bridge (br0).

systemd_nspawn_user_args=--network-macvlan=br1
        --resolv-conf=bind-host
Mar 02 20:12:05 myjail systemd-networkd[29]: mv-br1: DHCPv4 client: OFFER
Mar 02 20:12:05 myjail systemd-networkd[29]: mv-br1: DHCPv4 client: REQUEST (requesting)
Mar 02 20:12:05 myjail systemd-networkd[29]: mv-br1: DHCPv4 client: Received message from RAW socket, processing.
Mar 02 20:12:05 myjail systemd-networkd[29]: mv-br1: DHCPv4 client: ACK
Mar 02 20:12:05 myjail systemd-networkd[29]: mv-br1: DHCPv4 client: lease expires in 11h 59min 59s
Mar 02 20:12:05 myjail systemd-networkd[29]: mv-br1: DHCPv4 client: T2 expires in 10h 30min
Mar 02 20:12:05 myjail systemd-networkd[29]: mv-br1: DHCPv4 client: T1 expires in 6h
Mar 02 20:12:05 myjail systemd-networkd[29]: mv-br1: DHCPv4 address 192.168.1.149/24, gateway 192.168.1.1 acquired from 192.168.1.1

Jip-Hop avatar Mar 02 '24 19:03 Jip-Hop

64:8c:f7:89:14:56 = my router 8f:7a:f6:4c:78:49 = the jail

The config file:

systemd_nspawn_user_args=--network-bridge=br1
            --resolv-conf=bind-host

# Only used while creating the jail
distro=debian
release=bookworm

initial_setup=#!/usr/bin/bash
        set -euo pipefail
        apt-get update
        apt-get -y install tcpdump
        printf "\n%s\n%s\n" "[Link]" "Unmanaged=yes" >> /etc/systemd/network/80-container-host0.network

Create a jail with above config file jlmkr create --start --config ./config dhcpjail. The jail will have tcpdump installed and the host0 interface temporarily disabled due to Unmanaged=yes.

Then open a shell with jlmkr shell dhcpjail, undo the Unmanaged=yes setting, restart networking and start listening for dhcp traffic with tcpdump.

root@dhcpjail:~# sed -i 's/Unmanaged=yes/Unmanaged=no/g' /etc/systemd/network/80-container-host0.network; systemctl daemon-reload systemd-networkd; systemctl restart systemd-networkd; tcpdump --interface host0 port 67 or port 68 -e -n -vv
Too many arguments.
tcpdump: listening on host0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
21:16:42.174415 8f:7a:f6:4c:78:49 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 330: (tos 0xc0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 316)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 8f:7a:f6:4c:78:49, length 288, xid 0xbfbdcf2c, secs 4, Flags [none] (0x0000)
	  Client-Ethernet-Address 8f:7a:f6:4c:78:49
	  Vendor-rfc1048 Extensions
	    Magic Cookie 0x63825363
	    DHCP-Message (53), length 1: Discover
	    Client-ID (61), length 19: hardware-type 255, 5d:bb:54:d7:0e:24:1b:19:90:ca:4a:7d:f4:6f:d9:2c:f4:cb
	    Parameter-Request (55), length 10: 
	      Subnet-Mask (1), Default-Gateway (3), Domain-Name-Server (6), Hostname (12)
	      Domain-Name (15), Static-Route (33), NTP (42), TZ-Name (101)
	      Unknown (120), Classless-Static-Route (121)
	    MSZ (57), length 2: 1472
	    Hostname (12), length 5: "dhcpjail"
21:16:46.290797 8f:7a:f6:4c:78:49 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 330: (tos 0xc0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 316)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 8f:7a:f6:4c:78:49, length 288, xid 0xbfbdcf2c, secs 8, Flags [none] (0x0000)
	  Client-Ethernet-Address 8f:7a:f6:4c:78:49
	  Vendor-rfc1048 Extensions
	    Magic Cookie 0x63825363
	    DHCP-Message (53), length 1: Discover
	    Client-ID (61), length 19: hardware-type 255, 5d:bb:54:d7:0e:24:1b:19:90:ca:4a:7d:f4:6f:d9:2c:f4:cb
	    Parameter-Request (55), length 10: 
	      Subnet-Mask (1), Default-Gateway (3), Domain-Name-Server (6), Hostname (12)
	      Domain-Name (15), Static-Route (33), NTP (42), TZ-Name (101)
	      Unknown (120), Classless-Static-Route (121)
	    MSZ (57), length 2: 1472
	    Hostname (12), length 5: "dhcpjail"
21:16:55.047479 8f:7a:f6:4c:78:49 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 330: (tos 0xc0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 316)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 8f:7a:f6:4c:78:49, length 288, xid 0xbfbdcf2c, secs 17, Flags [none] (0x0000)
	  Client-Ethernet-Address 8f:7a:f6:4c:78:49
	  Vendor-rfc1048 Extensions
	    Magic Cookie 0x63825363
	    DHCP-Message (53), length 1: Discover
	    Client-ID (61), length 19: hardware-type 255, 5d:bb:54:d7:0e:24:1b:19:90:ca:4a:7d:f4:6f:d9:2c:f4:cb
	    Parameter-Request (55), length 10: 
	      Subnet-Mask (1), Default-Gateway (3), Domain-Name-Server (6), Hostname (12)
	      Domain-Name (15), Static-Route (33), NTP (42), TZ-Name (101)
	      Unknown (120), Classless-Static-Route (121)
	    MSZ (57), length 2: 1472
	    Hostname (12), length 5: "dhcpjail"
21:17:11.780372 8f:7a:f6:4c:78:49 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 330: (tos 0xc0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 316)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 8f:7a:f6:4c:78:49, length 288, xid 0xbfbdcf2c, secs 33, Flags [none] (0x0000)
	  Client-Ethernet-Address 8f:7a:f6:4c:78:49
	  Vendor-rfc1048 Extensions
	    Magic Cookie 0x63825363
	    DHCP-Message (53), length 1: Discover
	    Client-ID (61), length 19: hardware-type 255, 5d:bb:54:d7:0e:24:1b:19:90:ca:4a:7d:f4:6f:d9:2c:f4:cb
	    Parameter-Request (55), length 10: 
	      Subnet-Mask (1), Default-Gateway (3), Domain-Name-Server (6), Hostname (12)
	      Domain-Name (15), Static-Route (33), NTP (42), TZ-Name (101)
	      Unknown (120), Classless-Static-Route (121)
	    MSZ (57), length 2: 1472
	    Hostname (12), length 5: "dhcpjail"
21:17:14.786900 64:8c:f7:89:14:56 > 8f:7a:f6:4c:78:49, ethertype IPv4 (0x0800), length 354: (tos 0xc0, ttl 64, id 54699, offset 0, flags [none], proto UDP (17), length 340)
    192.168.1.1.67 > 192.168.1.171.68: [udp sum ok] BOOTP/DHCP, Reply, length 312, xid 0xbfbdcf2c, secs 33, Flags [none] (0x0000)
	  Your-IP 192.168.1.171
	  Server-IP 192.168.1.1
	  Client-Ethernet-Address 8f:7a:f6:4c:78:49
	  Vendor-rfc1048 Extensions
	    Magic Cookie 0x63825363
	    DHCP-Message (53), length 1: Offer
	    Server-ID (54), length 4: 192.168.1.1
	    Lease-Time (51), length 4: 43200
	    RN (58), length 4: 21600
	    RB (59), length 4: 37800
	    Subnet-Mask (1), length 4: 255.255.255.0
	    BR (28), length 4: 192.168.1.255
	    Default-Gateway (3), length 4: 192.168.1.1
	    Domain-Name (15), length 14: "example.com"
	    Domain-Name-Server (6), length 8: 192.168.1.18,192.168.1.1
21:17:14.787267 8f:7a:f6:4c:78:49 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 342: (tos 0xc0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 328)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 8f:7a:f6:4c:78:49, length 300, xid 0xbfbdcf2c, secs 36, Flags [none] (0x0000)
	  Client-Ethernet-Address 8f:7a:f6:4c:78:49
	  Vendor-rfc1048 Extensions
	    Magic Cookie 0x63825363
	    DHCP-Message (53), length 1: Request
	    Client-ID (61), length 19: hardware-type 255, 5d:bb:54:d7:0e:24:1b:19:90:ca:4a:7d:f4:6f:d9:2c:f4:cb
	    Parameter-Request (55), length 10: 
	      Subnet-Mask (1), Default-Gateway (3), Domain-Name-Server (6), Hostname (12)
	      Domain-Name (15), Static-Route (33), NTP (42), TZ-Name (101)
	      Unknown (120), Classless-Static-Route (121)
	    MSZ (57), length 2: 1472
	    Server-ID (54), length 4: 192.168.1.1
	    Requested-IP (50), length 4: 192.168.1.171
	    Hostname (12), length 5: "dhcpjail"
21:17:14.789413 64:8c:f7:89:14:56 > 8f:7a:f6:4c:78:49, ethertype IPv4 (0x0800), length 361: (tos 0xc0, ttl 64, id 54700, offset 0, flags [none], proto UDP (17), length 347)
    192.168.1.1.67 > 192.168.1.171.68: [udp sum ok] BOOTP/DHCP, Reply, length 319, xid 0xbfbdcf2c, secs 36, Flags [none] (0x0000)
	  Your-IP 192.168.1.171
	  Server-IP 192.168.1.1
	  Client-Ethernet-Address 8f:7a:f6:4c:78:49
	  Vendor-rfc1048 Extensions
	    Magic Cookie 0x63825363
	    DHCP-Message (53), length 1: ACK
	    Server-ID (54), length 4: 192.168.1.1
	    Lease-Time (51), length 4: 43200
	    RN (58), length 4: 21600
	    RB (59), length 4: 37800
	    Subnet-Mask (1), length 4: 255.255.255.0
	    BR (28), length 4: 192.168.1.255
	    Default-Gateway (3), length 4: 192.168.1.1
	    Domain-Name (15), length 14: "example.com"
	    Hostname (12), length 5: "dhcpjail"
	    Domain-Name-Server (6), length 8: 192.168.1.18,192.168.1.1
^C
7 packets captured
7 packets received by filter
0 packets dropped by kernel

Started at 21:16:42 and done at 21:17:14. So it took 32 seconds for IP address assignment. Looks like it does a few discovers. @Lockszmith-GH anything you can tell from this logging? Should I be logging something else instead?

Jip-Hop avatar Mar 02 '24 21:03 Jip-Hop

Sorry for the late response, it's harder than looking at it in wireshark (the way to do this, would be to have tcpdump into a pcap file)

What I CAN see right now matches what your previous 'journald' output shows:

021:16:42.174415jail-MACBROADCAST BOOTP/DHCP Discover Request for hostname dhcpjail 
+4s21:16:46.290797jail-MACBROADCAST BOOTP/DHCP Discover Request for hostname dhcpjailrepeat#1
+11s21:16:55.047479jail-MACBROADCAST BOOTP/DHCP Discover Request for hostname dhcpjailrepeat#2
+16s21:17:11.780372jail-MACBROADCAST BOOTP/DHCP Discover Request for hostname dhcpjailrepeat#3
+3s21:17:14.786900router-MACjail-MACBOOTP/DHCP, Reply (assigned IP) 

I think what you need to do here is to capture from outside the jail while capturing from the inside of the jail.
I would love to know whether those first 3 DHCP Requests actually get to the physical LAN.

I was looking up your Router's MAC address to try and figure out if I can give vendor specific tips, but it doesn't register, so I'm guessing it's a Virtual Router.

Can your router capture network traffic? If so, also capture network traffic on that end.

Hope all of this points you in the right direction.

Lockszmith-GH avatar Mar 05 '24 18:03 Lockszmith-GH

Thanks for your suggestions. Sorry I replaced the original MAC addresses with random ones. Do you think you could try to reproduce this setup? A bridge interface on TrueNAS SCALE and a DHCP enabled jail using this bridge. I'd be curious to know if you have it too.

Jip-Hop avatar Mar 05 '24 20:03 Jip-Hop

I'm planning to, but didn't want to commit, because not sure I'll have the time for more than a review - if I do, I'll update you. (Just been swamped with a lot of stuff, tracking GitHub is my escapasim)

Lockszmith-GH avatar Mar 06 '24 03:03 Lockszmith-GH

No worries! Any help is appreciated :) And it's not a major issue anyway. Just a bit annoying that it's slow to start. Let me know if you ever look into it more.

Jip-Hop avatar Mar 06 '24 07:03 Jip-Hop

This happens with Truenas Scale's Apps as well, especially when using Custom Apps and adding the br0 interface. It's slow with static or dhcp addresses. I'm thinking it might be a Scale thing.

dasunsrule32 avatar May 01 '24 00:05 dasunsrule32

Too late for me now, but wanted to visit this myself as well. These are all notes to myself for tomorrow to read some more: https://superuser.com/a/1676319/100730 https://www.golinuxcloud.com/configure-network-bridge-nmcli-static-dhcp/

Slow DHCP is usually attributed to STP (need to research whether STP enabled is slowing down, or whether it's disabled and slowing down)

Lockszmith-GH avatar May 01 '24 04:05 Lockszmith-GH

Linux Bridge certainly implements STP. I'd be surprised if it wasn't enabled by default.

And there we go:

Spanning tree protocol (STP) is enabled by default

mrstux avatar May 02 '24 06:05 mrstux

This happens with Truenas Scale's Apps as well, especially when using Custom Apps and adding the br0 interface

Would be interesting to know if it is slow as well with a VM connected to a bridge interface. But good information that it doesn't just happen with jailmaker!

Jip-Hop avatar May 02 '24 06:05 Jip-Hop

Oh, fascinating to read this. I ran into the same problem, and IIRC, the same issue happened to VMs too. It's just that they take a little longer to start than systemd-nspawn containers, and it's not quite as noticeable there. (Don't quote me on that, though, it's been some months.)

Turns out you can disable STP through TrueNAS CLI, and indeed, the delay goes away. I'm not recommending this (I'm sure there's a reason for the default), but my network topology is super simple, and it worked well enough for me.

bcat avatar Jun 12 '24 03:06 bcat

Hmm I tried it. Ran midclt call interface.update br1 '{"stp": false}' and created a new jail with systemd_nspawn_user_args=--network-bridge=br1 --resolv-conf=bind-host in the config. But the delay didn't go away for me unfortunately.

Reply from the midclt call:

{
    "id": "br1",
    "name": "br1",
    "fake": false,
    "type": "BRIDGE",
    "state": {
        "name": "br1",
        "orig_name": "br1",
        "description": "br1",
        "mtu": 1500,
        "cloned": true,
        "flags": [
            "RUNNING",
            "UP",
            "BROADCAST",
            "MULTICAST",
            "LOWER_UP"
        ],
        "nd6_flags": [],
        "capabilities": [
            "tx-scatter-gather",
            "tx-checksum-ip-generic",
            "highdma",
            "tx-scatter-gather-fraglist",
            "tx-vlan-hw-insert",
            "tx-generic-segmentation",
            "rx-gro",
            "tx-tcp-segmentation",
            "tx-tcp-ecn-segmentation",
            "tx-tcp-mangleid-segmentation",
            "tx-tcp6-segmentation",
            "tx-gre-segmentation",
            "tx-gre-csum-segmentation",
            "tx-ipxip4-segmentation",
            "tx-ipxip6-segmentation",
            "tx-udp_tnl-segmentation",
            "tx-udp_tnl-csum-segmentation",
            "tx-gso-partial",
            "tx-tunnel-remcsum-segmentation",
            "tx-sctp-segmentation",
            "tx-esp-segmentation",
            "tx-udp-segmentation",
            "tx-gso-list",
            "tx-vlan-stag-hw-insert"
        ],
        "link_state": "LINK_STATE_UP",
        "media_type": "Ethernet",
        "media_subtype": "10000Mb/s Other",
        "active_media_type": "Ethernet",
        "active_media_subtype": "10000Mb/s Other",
        "supported_media": [],
        "media_options": null,
        "link_address": "xx:xx:xx:xx:xx:xx",
        "permanent_link_address": null,
        "hardware_link_address": "xx:xx:xx:xx:xx:xx",
        "aliases": [
            {
                "type": "INET",
                "address": "192.168.1.3",
                "netmask": 24,
                "broadcast": "192.168.1.255"
            },
            {
                "type": "LINK",
                "address": "xx:xx:xx:xx:xx:xx"
            }
        ],
        "vrrp_config": null,
        "rx_queues": 1,
        "tx_queues": 1
    },
    "aliases": [
        {
            "type": "INET",
            "address": "192.168.1.3",
            "netmask": 24
        }
    ],
    "ipv4_dhcp": true,
    "ipv6_auto": false,
    "description": "Main bridge interface",
    "mtu": 1500,
    "bridge_members": [
        "eno1"
    ],
    "stp": true
}

Jip-Hop avatar Jun 14 '24 08:06 Jip-Hop

Hmm, I wonder if the update didn't stick for some reason. (Still says "stp": true in the response.)

The only other thing I can think of is that I may have needed to reboot TrueNAS.... I can't remember, honestly.

Edit: I double checked Bash history and the exact command I ran was sudo midclt call interface.update br0 '{"stp": false}'. Which seems to match what you tried except for the different interface name. So I really don't know why it wouldn't work, sorry!

bcat avatar Jun 14 '24 16:06 bcat

(Still says "stp": true in the response.)

Well spotted! Haha I copied the output from the wrong command >.< And you're right! After a reboot the IP address is assigned instantly. Now it's just a matter of updating the documentation why this happens. It's not a bug :)

Jip-Hop avatar Jun 14 '24 16:06 Jip-Hop

BTW @Jip-Hop no reboot necessary, this makes the change adhoc (still need the midclt command to make the change persistent):

sudo ip link set br0 type bridge stp_state 0

Lockszmith-GH avatar Jun 22 '24 03:06 Lockszmith-GH