
Support attaching instances to a "physical" network that's using a bridge as parent

Open mweinelt opened this issue 10 months ago • 32 comments

The host system provides a bridge interface that is connected to a physical interface.

When integrating that bridge via the physical nic type and passing the bridge interface as the parent, I get the following error when starting the VM.

Failed to start device "eth-1": Failed to get PCI device info for "br-lan": open /sys/class/net/br-lan/device/uevent: no such file or directory

This is because the bridge does not have a device directory in sysfs, but the code requires one to look up PCI information.

https://github.com/lxc/incus/blob/36701b95b6c1d83ce6d71798670988d6cf580cdf/internal/server/device/nic_physical.go#L211

The documentation claims I can pass a bridge:

The physical network type connects to an existing physical network, which can be a network interface or a bridge, and serves as an uplink network for OVN.

https://linuxcontainers.org/incus/docs/main/reference/network_physical/

This issue is reproducible on incus 6.10.1.

mweinelt avatar Mar 07 '25 14:03 mweinelt

and serves as an uplink network for OVN

You're not trying to use it as an uplink for an OVN network but directly as a physical NIC on an instance, so that's probably why this is failing. I very commonly use a "physical" network with a bridge as parent for OVN uplinks and that works just fine.

stgraber avatar Mar 07 '25 17:03 stgraber

That said, I think it'd make sense to support attaching instances to such a "physical" type network, just using the normal bridge handling logic if we see it's a bridge.

I updated the issue title accordingly.

stgraber avatar Mar 07 '25 17:03 stgraber

and serves as an uplink network for OVN

You're not trying to use it as an uplink for an OVN network but directly as a physical NIC on an instance, so that's probably why this is failing. I very commonly use a "physical" network with a bridge as parent for OVN uplinks and that works just fine.

I've been battling with this very issue recently. Placing the OVN network behind a managed Incus bridge essentially double-NATs everything in the OVN network. Perhaps I don't fully understand the reasoning for why macvlan and physical networks cannot be parents to the OVN bridge.
Why must I create a managed bridge with NAT enabled to be the parent of the OVN network?

snoby avatar Mar 28 '25 01:03 snoby

You don't. Most production deployments use a physical managed network as the uplink for OVN.

stgraber avatar Mar 28 '25 02:03 stgraber

That said, OVN cannot use macvlan for that, but it can use a physical network interface or a VLAN. The main restriction is that this interface or VLAN must be unconfigured on the host, so no IP configuration on it at all, as once OVN consumes it, it will no longer be usable by the host system.

stgraber avatar Mar 28 '25 02:03 stgraber

Anyway, that's unrelated to this issue.

stgraber avatar Mar 28 '25 02:03 stgraber

You don't. Most production deployments use a physical managed network as the uplink for OVN.

I understand, and I won't add any more to this issue as I do not want to hijack the thread. However, even SR-IOV interfaces cannot be used. Thanks for your answers and great project!

snoby avatar Mar 28 '25 02:03 snoby

Hey, I am a UT student and I would like to work on this issue. I am working with a partner who will also comment on this.

oronila avatar Apr 03 '25 21:04 oronila

Assigned it to you!

stgraber avatar Apr 03 '25 21:04 stgraber

Hey, it's the partner here.

NathanChase22 avatar Apr 03 '25 22:04 NathanChase22

@stgraber I would like to ask you for a recommendation on how best to approach understanding this issue. Also, how can I reproduce it?

NathanChase22 avatar Apr 12 '25 19:04 NathanChase22

It should be quite easy to reproduce the issue, something like this:

  • sudo ip link add dev br-test type bridge
  • sudo ip link set dev br-test up
  • incus network create br-test --type=physical parent=br-test
  • incus launch images:debian/13 c1 --network br-test

The logic to be modified should all be in internal/server/device/nic_physical.go. Basically, the code needs to detect that the parent property points to a bridge, this can be checked with a call to util.PathExists(fmt.Sprintf("/sys/class/net/%s/bridge", name)). If it is a bridge, then bridge attach logic from nic_bridged.go should be followed.

stgraber avatar Apr 12 '25 23:04 stgraber

The aim would be to attach an instance to a bridge created by the host system. Unfortunately, this doesn't work via the CLI or the web UI.

If you edit the instance's YAML configuration directly, this is already possible.

devices:
  lan:
    name: lan
    nictype: bridged
    parent: br-test
    type: nic

Perhaps this information will help to narrow down the relevant code.

tomy42 avatar Apr 15 '25 10:04 tomy42

The aim would be to attach a host in a bridge created by the system. Unfortunately, this doesn't work via the CLI or the WebGUI.

If you edit the Host-YAML configuration directly, this is already possible.

devices:
  lan:
    name: lan
    nictype: bridged
    parent: br-test
    type: nic

Perhaps this information will help to narrow down the relevant position

@tomy42 unfortunately I have been unable to recreate your solution by configuring the instance YAML; I still run into issues when I try editing the configuration of the instance.

Config parsing error: Invalid devices: Device validation failed for "lan": Specified network must be of type bridge

Could you elaborate on the specific steps you took to get it to work?

NathanChase22 avatar Apr 30 '25 22:04 NathanChase22

I'd recommend focusing on what I mentioned in my earlier comment: https://github.com/lxc/incus/issues/1735#issuecomment-2799170436

It shows an easy way to reproduce the issue without having to mess with system-wide OS configuration, and it also mentions exactly which files need to be modified to make this behave.

stgraber avatar Apr 30 '25 23:04 stgraber

@stgraber

We ended up getting this in our launch log, is this the expected behavior?

Log: lxc c1 20250501010053.348 ERROR network - ../src/lxc/network.c:lxc_network_move_created_netdev_priv:3549 - Invalid argument - Failed to move network device "br-test" with ifindex 5 to network namespace 2653 and rename to physkMrBSW

lxc c1 20250501010053.348 ERROR start - ../src/lxc/start.c:lxc_spawn:1840 - Failed to create the network

lxc c1 20250501010053.353 ERROR lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:878 - Received container state "ABORTING" instead of "RUNNING"

lxc c1 20250501010053.353 ERROR start - ../src/lxc/start.c:__lxc_start:2107 - Failed to spawn container "c1"

lxc c1 20250501010053.353 WARN start - ../src/lxc/start.c:lxc_abort:1036 - No such process - Failed to send SIGKILL via pidfd 17 for process 2653

lxc 20250501010053.414 ERROR af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response

lxc 20250501010053.414 ERROR commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"

NathanChase22 avatar May 01 '25 01:05 NathanChase22

Yep, that's the current expected failure.

Basically Incus tries to move the entire bridge into the container rather than attach the container to the bridge.
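The difference can be sketched in terms of the ip(8) commands each behaviour boils down to. The functions below only build the command lines as data; they are illustrative, not Incus code, and all interface names are hypothetical:

```go
package main

import "fmt"

// moveIntoNamespace is what currently happens: the bridge device itself is
// moved into the container's network namespace, which fails for a bridge
// the host is using.
func moveIntoNamespace(dev string, pid int) []string {
	return []string{"ip", "link", "set", "dev", dev, "netns", fmt.Sprint(pid)}
}

// attachToBridge is the desired behaviour: create a veth pair, enslave the
// host end to the bridge, and move only the peer end into the container.
func attachToBridge(bridge, hostVeth, peerVeth string, pid int) [][]string {
	return [][]string{
		{"ip", "link", "add", hostVeth, "type", "veth", "peer", "name", peerVeth},
		{"ip", "link", "set", hostVeth, "master", bridge},
		{"ip", "link", "set", hostVeth, "up"},
		{"ip", "link", "set", peerVeth, "netns", fmt.Sprint(pid)},
	}
}

func main() {
	fmt.Println(moveIntoNamespace("br-test", 2653))
	for _, cmd := range attachToBridge("br-test", "veth-host", "veth-peer", 2653) {
		fmt.Println(cmd)
	}
}
```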

stgraber avatar May 01 '25 01:05 stgraber

Here is our proposed approach for bridge attaching in Start() after reviewing nic_bridged.go :

  1. check the parent property is a bridge using util.PathExists(fmt.Sprintf("/sys/class/net/%s/bridge", name))
  2. call network.AttachInterface()
  3. add a function to the reverter that will call network.DetachInterface()
  ~~4. Check the bridge type and set up VLAN settings on the port using logic similar to setupNativeBridgePortVLANs() and setupOVSBridgePortVLANs()~~
  ~~5. check if hairpin mode needs to be enabled, following the logic found in nic_bridged.go~~

Then in postStop() we will do something akin to nic_bridged.go and:

~~1. Check if the parent was a bridge using util.PathExists(fmt.Sprintf("/sys/class/net/%s/bridge", name))~~
~~2. Check if the host device interface still exists and the host device configuration is not null~~
3. detach the interface from the bridge
4. remove the host interface

This assumes a few things, namely:

  1. We offer both native bridge and OVS support
  2. We allow for veth pairs

EDIT: There are flaws with this approach given the nature of physical NICs, which don't use veth connections and don't manage VLAN configurations (that's the bridge's responsibility).

NathanChase22 avatar May 01 '25 02:05 NathanChase22

So far I have a tentative implementation with changes only to Start() and Stop() in nic_physical.go. I was unsure whether I should modify validateEnvironment() or validateConfig().

However, after building and running my modified version of Incus, I still run into the same error when trying to create the container instance.

I am currently trying to add debug print statements using logger.Debug(), but they aren't showing up when I rebuild with make debug and set the environment variable INCUS_DEBUG to 1 as the documentation says. Is there anything more I should be doing so that my debug statements get printed?

NathanChase22 avatar May 01 '25 18:05 NathanChase22

Try running incus monitor --pretty; this will show you all the log messages coming out of Incus.

stgraber avatar May 01 '25 19:05 stgraber

Try running incus monitor --pretty this will show you all the log messages coming out of Incus.

I am trying to debug nic_physical.go by adding debug statements using the logger (e.g., d.logger.Debug()), but I am not seeing any output, either through incus monitor or with the --debug flag when calling launch. How can I determine whether the logger is not properly initialized or whether methods like Start() are not being called at all? What steps can I take to verify the execution flow and ensure my debug statements are working?

EDIT: I have found out that the development build that gets compiled doesn't reflect changes I make to the device file.

NathanChase22 avatar May 03 '25 01:05 NathanChase22

Greetings, may I contribute here? I'm not familiar with the Go language or Incus, but I'm willing to approach this as a network engineer.

To connect an L2 bridge to an Incus bridge, we normally use a profile. In this case, that approach can't be used.

  • incus version: 6.0.0

I tried incus monitor --pretty together with incus launch images:debian/13 c1 --network br-test. I think this is the output you are waiting for.

DEBUG  [2025-05-04T14:15:49Z] Handling API request                          ip=@ method=GET protocol=unix url=/1.0 username=hooni
DEBUG  [2025-05-04T14:15:49Z] Handling API request                          ip=@ method=GET protocol=unix url=/1.0/networks/br-test username=hooni
DEBUG  [2025-05-04T14:15:49Z] Handling API request                          ip=@ method=GET protocol=unix url=/1.0/events username=hooni
DEBUG  [2025-05-04T14:15:49Z] Event listener server handler started         id=1cf8e9fa-a313-4dc6-bbe3-bdde525ed233 local=/var/lib/incus/unix.socket remote=@
DEBUG  [2025-05-04T14:15:49Z] Handling API request                          ip=@ method=POST protocol=unix url=/1.0/instances username=hooni
DEBUG  [2025-05-04T14:15:49Z] Responding to instance create                
DEBUG  [2025-05-04T14:15:49Z] New operation                                 class=task description="Creating instance" operation=a10c227e-95af-44e3-b6b1-759cb25146cf project=default
DEBUG  [2025-05-04T14:15:49Z] Started operation                             class=task description="Creating instance" operation=a10c227e-95af-44e3-b6b1-759cb25146cf project=default
INFO   [2025-05-04T14:15:49Z] ID: a10c227e-95af-44e3-b6b1-759cb25146cf, Class: task, Description: Creating instance  CreatedAt="2025-05-04 14:15:49.288508632 +0000 UTC" Err= Location=none MayCancel=false Metadata="map[]" Resources="map[containers:[/1.0/instances/c1] instances:[/1.0/instances/c1]]" Status=Pending StatusCode=Pending UpdatedAt="2025-05-04 14:15:49.288508632 +0000 UTC"
INFO   [2025-05-04T14:15:49Z] ID: a10c227e-95af-44e3-b6b1-759cb25146cf, Class: task, Description: Creating instance  CreatedAt="2025-05-04 14:15:49.288508632 +0000 UTC" Err= Location=none MayCancel=false Metadata="map[]" Resources="map[containers:[/1.0/instances/c1] instances:[/1.0/instances/c1]]" Status=Running StatusCode=Running UpdatedAt="2025-05-04 14:15:49.288508632 +0000 UTC"
DEBUG  [2025-05-04T14:15:49Z] Handling API request                          ip=@ method=GET protocol=unix url=/1.0/operations/a10c227e-95af-44e3-b6b1-759cb25146cf username=hooni
DEBUG  [2025-05-04T14:15:49Z] Connecting to a remote simplestreams server   URL="https://images.linuxcontainers.org"
DEBUG  [2025-05-04T14:15:49Z] Acquiring lock for image                      fingerprint=c4f17b293ea6413a120b169de518c2a75c72c311281cc45ddabbcc8500be4c2e
DEBUG  [2025-05-04T14:15:49Z] Lock acquired for image                       fingerprint=c4f17b293ea6413a120b169de518c2a75c72c311281cc45ddabbcc8500be4c2e
DEBUG  [2025-05-04T14:15:49Z] Image already exists in the DB                fingerprint=c4f17b293ea6413a120b169de518c2a75c72c311281cc45ddabbcc8500be4c2e
DEBUG  [2025-05-04T14:15:49Z] Instance operation lock created               action=create instance=c1 project=default reusable=false
INFO   [2025-05-04T14:15:49Z] Creating instance                             ephemeral=false instance=c1 instanceType=container project=default
DEBUG  [2025-05-04T14:15:49Z] Adding device                                 device=eth0 instance=c1 instanceType=container project=default type=nic
INFO   [2025-05-04T14:15:49Z] Action: instance-created, Source: /1.0/instances/c1  location=none storage-pool=default type=container
DEBUG  [2025-05-04T14:15:49Z] Adding device                                 device=root instance=c1 instanceType=container project=default type=disk
INFO   [2025-05-04T14:15:49Z] Created instance                              ephemeral=false instance=c1 instanceType=container project=default
DEBUG  [2025-05-04T14:15:49Z] CreateInstanceFromImage started               driver=btrfs instance=c1 pool=default project=default
DEBUG  [2025-05-04T14:15:49Z] EnsureImage started                           driver=btrfs fingerprint=c4f17b293ea6413a120b169de518c2a75c72c311281cc45ddabbcc8500be4c2e pool=default
DEBUG  [2025-05-04T14:15:49Z] Setting image volume size                     driver=btrfs fingerprint=c4f17b293ea6413a120b169de518c2a75c72c311281cc45ddabbcc8500be4c2e pool=default size=
DEBUG  [2025-05-04T14:15:49Z] Checking image volume size                    driver=btrfs fingerprint=c4f17b293ea6413a120b169de518c2a75c72c311281cc45ddabbcc8500be4c2e pool=default
DEBUG  [2025-05-04T14:15:49Z] EnsureImage finished                          driver=btrfs fingerprint=c4f17b293ea6413a120b169de518c2a75c72c311281cc45ddabbcc8500be4c2e pool=default
DEBUG  [2025-05-04T14:15:49Z] Set new volume size                           driver=btrfs instance=c1 pool=default project=default size=
DEBUG  [2025-05-04T14:15:49Z] Checking volume size                          driver=btrfs instance=c1 pool=default project=default
DEBUG  [2025-05-04T14:15:49Z] CreateInstanceFromImage finished              driver=btrfs instance=c1 pool=default project=default
DEBUG  [2025-05-04T14:15:49Z] UpdateInstanceBackupFile started              driver=btrfs instance=c1 pool=default project=default
DEBUG  [2025-05-04T14:15:49Z] Instance operation lock finished              action=create err="<nil>" instance=c1 project=default reusable=false
DEBUG  [2025-05-04T14:15:49Z] UpdateInstanceBackupFile finished             driver=btrfs instance=c1 pool=default project=default
DEBUG  [2025-05-04T14:15:49Z] Start started                                 instance=c1 instanceType=container project=default stateful=false
DEBUG  [2025-05-04T14:15:49Z] Instance operation lock created               action=start instance=c1 project=default reusable=false
INFO   [2025-05-04T14:15:49Z] Starting instance                             action=start created="2025-05-04 14:15:49.362772915 +0000 UTC" ephemeral=false instance=c1 instanceType=container project=default stateful=false used="1970-01-01 00:00:00 +0000 UTC"
DEBUG  [2025-05-04T14:15:49Z] MountInstance started                         driver=btrfs instance=c1 pool=default project=default
DEBUG  [2025-05-04T14:15:49Z] MountInstance finished                        driver=btrfs instance=c1 pool=default project=default
DEBUG  [2025-05-04T14:15:49Z] Starting device                               device=eth0 instance=c1 instanceType=container project=default type=nic
DEBUG  [2025-05-04T14:15:49Z] Starting device                               device=root instance=c1 instanceType=container project=default type=disk
DEBUG  [2025-05-04T14:15:49Z] UpdateInstanceBackupFile started              driver=btrfs instance=c1 pool=default project=default
DEBUG  [2025-05-04T14:15:49Z] UpdateInstanceBackupFile finished             driver=btrfs instance=c1 pool=default project=default
DEBUG  [2025-05-04T14:15:49Z] Skipping unmount as in use                    driver=btrfs pool=default refCount=1 volName=c1
DEBUG  [2025-05-04T14:15:49Z] Handling API request                          ip=@ method=GET protocol=unix url="/internal/containers/c1/onstart?project=default" username=root
DEBUG  [2025-05-04T14:15:49Z] Scheduler: container c1 started: re-balancing 
ERROR  [2025-05-04T14:15:49Z] Failed starting instance                      action=start created="2025-05-04 14:15:49.362772915 +0000 UTC" ephemeral=false instance=c1 instanceType=container project=default stateful=false used="1970-01-01 00:00:00 +0000 UTC"
DEBUG  [2025-05-04T14:15:49Z] Start finished                                instance=c1 instanceType=container project=default stateful=false
INFO   [2025-05-04T14:15:49Z] ID: a10c227e-95af-44e3-b6b1-759cb25146cf, Class: task, Description: Creating instance  CreatedAt="2025-05-04 14:15:49.288508632 +0000 UTC" Err="Failed to run: /usr/libexec/incus/incusd forkstart c1 /var/lib/incus/containers /run/incus/c1/lxc.conf: exit status 1" Location=none MayCancel=false Metadata="map[]" Resources="map[containers:[/1.0/instances/c1] instances:[/1.0/instances/c1]]" Status=Failure StatusCode=Failure UpdatedAt="2025-05-04 14:15:49.288508632 +0000 UTC"
DEBUG  [2025-05-04T14:15:49Z] Failure for operation                         class=task description="Creating instance" err="Failed to run: /usr/libexec/incus/incusd forkstart c1 /var/lib/incus/containers /run/incus/c1/lxc.conf: exit status 1" operation=a10c227e-95af-44e3-b6b1-759cb25146cf project=default
DEBUG  [2025-05-04T14:15:49Z] Instance operation lock finished              action=start err="Failed to run: /usr/libexec/incus/incusd forkstart c1 /var/lib/incus/containers /run/incus/c1/lxc.conf: exit status 1" instance=c1 project=default reusable=false
DEBUG  [2025-05-04T14:15:49Z] Event listener server handler stopped         listener=1cf8e9fa-a313-4dc6-bbe3-bdde525ed233 local=/var/lib/incus/unix.socket remote=@
DEBUG  [2025-05-04T14:15:49Z] Handling API request                          ip=@ method=GET protocol=unix url="/internal/containers/c1/onstopns?netns=%2Fproc%2F8485%2Ffd%2F4&project=default&target=stop" username=root
DEBUG  [2025-05-04T14:15:49Z] Instance operation lock created               action=stop instance=c1 project=default reusable=false
DEBUG  [2025-05-04T14:15:49Z] Instance initiated stop                       action=stop instance=c1 instanceType=container project=default
DEBUG  [2025-05-04T14:15:49Z] Stopping device                               device=eth0 instance=c1 instanceType=container project=default type=nic
DEBUG  [2025-05-04T14:15:50Z] Handling API request                          ip=@ method=GET protocol=unix url="/internal/containers/c1/onstop?project=default&target=stop" username=root
DEBUG  [2025-05-04T14:15:50Z] Instance operation lock inherited for stop    action=stop instance=c1 instanceType=container project=default
DEBUG  [2025-05-04T14:15:50Z] Instance stopped, cleaning up                 instance=c1 instanceType=container project=default
DEBUG  [2025-05-04T14:15:50Z] Stopping device                               device=root instance=c1 instanceType=container project=default type=disk
DEBUG  [2025-05-04T14:15:50Z] UnmountInstance started                       driver=btrfs instance=c1 pool=default project=default
DEBUG  [2025-05-04T14:15:50Z] UnmountInstance finished                      driver=btrfs instance=c1 pool=default project=default
INFO   [2025-05-04T14:15:50Z] Shut down instance                            action=stop created="2025-05-04 14:15:49.362772915 +0000 UTC" ephemeral=false instance=c1 instanceType=container project=default stateful=false used="2025-05-04 14:15:49.623328974 +0000 UTC"
DEBUG  [2025-05-04T14:15:50Z] Instance operation lock finished              action=stop err="<nil>" instance=c1 project=default reusable=false
DEBUG  [2025-05-04T14:15:50Z] Scheduler: container c1 stopped: re-balancing 
INFO   [2025-05-04T14:15:50Z] Action: instance-shutdown, Source: /1.0/instances/c1

neeks76 avatar May 04 '25 15:05 neeks76

The aim would be to attach a host in a bridge created by the system. Unfortunately, this doesn't work via the CLI or the WebGUI. If you edit the Host-YAML configuration directly, this is already possible.

devices:
  lan:
    name: lan
    nictype: bridged
    parent: br-test
    type: nic

Perhaps this information will help to narrow down the relevant position

@tomy42 unfortunately I have been unable to recreate your solution by configuring the instance YAML , I still run into issues when I try editing the configuration of the instance.

Config parsing error: Invalid devices: Device validation failed for "lan": Specified network must be of type bridge

Could elaborate on the specific steps you did to get it to work?

My System Config:

Systemd Network NetDev Config:

# 12-br-lan.netdev

[NetDev]
Name=br-lan
Kind=bridge

and

Systemd Network Network Config:

# 12-br-lan.network

[Match]
Name=br-lan

[Network]
Description="LAN Network"
Address=192.168.1.10/24
Gateway=192.168.1.1
DNS=192.168.1.1
IPv6AcceptRA=yes
IPForward=no

And the Incus Profile what is assigned to the instance

name: LAN-Intern
description: Locals LAN
devices:
  lan:
    name: lan
    nictype: bridged
    parent: br-lan
    type: nic
config: {}
project: default

tomy42 avatar May 04 '25 15:05 tomy42

Versions 6.0 and 6.12 are both working.

  • To create the bridge: sudo ip link add dev br-test type bridge
  • To bring the interface up: sudo ip link set dev br-test up
  • To show bridges: brctl show (this requires sudo apt install bridge-utils)


To create the Incus bridge: incus network create br-test --type=bridge

  • If "--type" is physical, an error comes up when editing the profile, so I changed it to bridge.
  • Do not create a br-test to connect the L2 bridge directly.

To edit the default profile: incus profile edit default, with the following value:

devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br-test
    type: nic

To create the instance: incus launch images:debian/13 c1

That is what I have done.

neeks76 avatar May 05 '25 06:05 neeks76

Try running incus monitor --pretty this will show you all the log messages coming out of Incus.

I am trying to debug the nic_physical.go file by adding debug statements using the logger (e.g., d.logger.Debug(), but I am not seeing any output through either using incus monitor nor using the --debug flag when calling launch. How can I determine whether this is an issue with the logger not being properly initialized or if methods like Start() are not being called at all? What steps can I take to verify the execution flow and ensure my debug statements are working?

EDIT: I have found out that the development build that gets compiled doesn't reflect changes I make to the device file.

Hey there,

So you ran make and got a new incusd in ~/go/bin/incusd. How are you then running it? The most common approach is either to completely stop the system-wide daemon and start yours manually, or to stop the system-wide daemon, replace the system binary with yours, and start it back up.

stgraber avatar May 05 '25 19:05 stgraber

So you ran make and got a new incusd in ~/go/bin/incusd. How are you then running that? The most common way is either to completely stop the system-wide daemon and start yours manually, or to stop the system wide daemon and replace the system binary with yours, then start it back up.

Right now I invoke incus admin shutdown to turn off the daemon, then call systemctl restart incus to turn it back on, and then proceed with the rest of the commands. Is this the right approach?

Related to the issue itself: so far I have modified nic_physical.go in validateConfig(), Start(), Stop(), and PostStop() by porting over code from nic_bridged.go and guarding it with checks that the parent is a bridge. I wasn't completely sure whether I needed to modify the configuration checks in validateConfig(), since that could be considered part of the "bridge attachment logic".

NathanChase22 avatar May 05 '25 19:05 NathanChase22

Right now I invoke incus admin shutdown to turn off the daemon, then call systemctl restart incus to turn it back on, and then proceed with the rest of the commands. Is this the right approach?

That's fine so long as you also substitute the system binary for the one you just built. If using the Zabbly package, it's at /opt/incus/bin/incusd; if using another distribution, it may be directly under /usr/bin or under /usr/lib/incus or something like that.

stgraber avatar May 05 '25 20:05 stgraber

Related to the issue itself, so far I have modified nic_physical in validateConfig(), Start() , Stop() , PostStop() by porting over code from nic_bridged and having that code be guarded by if checks making sure the parent is a bridge. I wasn't completely sure if I needed to modify the configuration checks or not in validateConfig(), since that could be considered part of "bridge attachment logic".

So one thing worth noting here, the only case where we should attempt bridge attachment within nic_physical is if d.network != nil, that is, we are dealing with an Incus managed network.

We do not want someone to start messing with bridges by doing incus config device add NAME eth0 nic nictype=physical parent=br0. (Which would be the case where d.network == nil).

To get back to your validateConfig question: I don't think any change should be needed there, as we're only going to hit nic_physical through the managed code path, and that typically doesn't provide you with much in the way of direct configuration on the NIC.

stgraber avatar May 05 '25 20:05 stgraber

So one thing worth noting here, the only case where we should attempt bridge attachment within nic_physical is if d.network != nil, that is, we are dealing with an Incus managed network.

We do not want someone to start messing with bridges by doing incus config device add NAME eth0 nic nictype=physical parent=br0. (Which would be the case where d.network == nil).

To get back to your validateConfig question. I don't think that any change should be needed there as we're only going to hit nic_physical through the managed code path and that typically doesn't provide you with much in the way of direct configuration on the nic.

Okay, so I'll leave validateConfig alone and add an extra check for d.network != nil.

Presently my start does the following things:

  1. Configure a veth pair (container) or a TAP device (VM)
  2. Rebuild the dnsmasq config if it's a managed bridge
  3. Apply host-side routes and limits
  4. Disable IPv6 on the veth interface
  5. Set up network filters
  6. Disable router advertisement/acceptance, enable port isolation
  7. Set up VLAN settings on the bridge
  8. Check and enable hairpin mode

But given what you've said about limiting how much direct configuration we provide, maybe some of these steps are unnecessary. For example, the documentation doesn't show any "security" options, so perhaps we shouldn't set up filters and should assume that was configured externally beforehand?

NathanChase22 avatar May 05 '25 21:05 NathanChase22

Right now I invoke incus admin shutdown to turn off the daemon, then call systemctl restart incus to turn it back on, and then proceed with the rest of the commands. Is this the right approach?

That's fine so long as you also substitute the system binary for the one you just built. If using the Zabbly package, it's at /opt/incus/bin/incusd; if using another distribution, it may be directly under /usr/bin or under /usr/lib/incus or something like that.

By the way, this did the trick, and my implementation no longer reproduces the issue.

NathanChase22 avatar May 05 '25 21:05 NathanChase22