NIC acceleration configuration method prevents the use of bonds and VF-LAG
LXD supports configuring NIC acceleration for cards that support switchdev mode and OVS hardware offload.
However, the current method of discovering which PF to allocate resources from prevents putting the PFs in a bond and making use of the VF-LAG feature: https://github.com/canonical/lxd/blob/227bc5cd75ebf2ba7b6c881f209ed5c2e6640f9b/lxd/network/network_utils_sriov.go#L385-L387
The current functionality is documented here: https://github.com/canonical/lxd/blob/227bc5cd75ebf2ba7b6c881f209ed5c2e6640f9b/doc/reference/devices_nic.md?plain=1#L209-L216
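Roughly, the flow that section describes looks like this; a condensed sketch where the PCI address, instance name and OVN network name are only placeholders:

```
# Create VFs on the PF and put its embedded switch into switchdev mode.
echo 4 > /sys/class/net/enp9s0f0np0/device/sriov_numvfs
devlink dev eswitch set pci/0000:09:00.0 mode switchdev

# Enable OVS hardware offload.
ovs-vsctl set open_vswitch . other_config:hw-offload=true
systemctl restart openvswitch-switch

# The step at the heart of this issue: the PF is expected to be added to the
# integration bridge, and that is how LXD later discovers which PF to allocate VFs from.
ovs-vsctl add-port br-int enp9s0f0np0

# Accelerated NIC device on an instance.
lxc config device add v1 eth0 nic network=ovn1 acceleration=sriov
```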
We need to replace this with some sort of configuration option that can be set per target when in a cluster.
The expected way to use this would, at a high level, be:
- Let's say `enp9s0f0np0` and `enp9s0f1np1` are PFs of a switchdev-capable card.
- Create `bond0` with `enp9s0f0np0` and `enp9s0f1np1`.
- Create OVS bridge `br-bond0` that will be used for host connectivity, external networking for LXD instances, and VF resources.

The netplan configuration could be expressed like this:
network:
  bonds:
    bond0:
      interfaces:
        - enp9s0f0np0
        - enp9s0f1np1
      macaddress: 08:c0:eb:81:6b:78
      parameters:
        mode: 802.3ad
        ...
  bridges:
    br-bond0:
      addresses:
        - 192.0.2.10/24
      interfaces:
        - bond0
      macaddress: 08:c0:eb:81:6b:78
      openvswitch: {}
  ethernets:
    enp9s0f0np0:
      match:
        macaddress: 08:c0:eb:81:6b:78
      set-name: enp9s0f0np0
      virtual-function-count: 32
      embedded-switch-mode: switchdev
      delay-virtual-functions-rebind: true
    enp9s0f1np1:
      match:
        macaddress: 08:c0:eb:81:6b:79
      set-name: enp9s0f1np1
      virtual-function-count: 32
      embedded-switch-mode: switchdev
      delay-virtual-functions-rebind: true
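With the above applied, a rough way to sanity-check the host side (the PCI address is again a placeholder, and the exact devlink output varies by driver):

```
# The embedded switch of each PF should report switchdev mode.
devlink dev eswitch show pci/0000:09:00.0

# Both PFs should show up as active 802.3ad members of the bond.
cat /proc/net/bonding/bond0
```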
> We need to replace this with some sort of configuration option that can be set per target when in a cluster.
Could you elaborate a bit on how you envisage this working? Instance NICs only ever start on a single cluster member/host at a time, although they can be migrated between hosts.
I'm not quite following what is changing in the circumstance you describe.
Could you give an example of a current accelerated LXD NIC device and highlight which parts are introducing the issue?
Thanks
> We need to replace this with some sort of configuration option that can be set per target when in a cluster.
> Could you elaborate a bit on how you envisage this working? Instance NICs only ever start on a single cluster member/host at a time, although they can be migrated between hosts.
> I'm not quite following what is changing in the circumstance you describe.
> Could you give an example of a current accelerated LXD NIC device and highlight which parts are introducing the issue?
This is the part that causes the issue: https://github.com/canonical/lxd/blob/227bc5cd75ebf2ba7b6c881f209ed5c2e6640f9b/doc/reference/devices_nic.md?plain=1#L214
`enp129s0f0np0` can't both be part of `bond0` and added to `br-int` at the same time.
So the root of the problem is that LXD expects the user to put the PF into `br-int`, and uses that to identify which PF to allocate VFs from.
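In other words, the two memberships are mutually exclusive; a rough illustration with the interface names from above:

```
# Once the PF is enslaved to the bond it reports the bond as its master...
ip link show enp129s0f0np0        # "... master bond0 ..."

# ...while the documented flow wants that same PF attached to the integration
# bridge so LXD can find it there.
ovs-vsctl add-port br-int enp129s0f0np0
```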
To exemplify further, what if you wanted to use resources from both PFs? Would you put both `enp129s0f0np0` and `enp129s0f1np1` into the same bridge?
Incidentally, because the default configuration for `br-int` is to have `fail_mode: secure`, you may avoid a network loop, but had you done that with any other bridge it would probably not have gone well.
PF selection needs to move somewhere else.
> To exemplify further, what if you wanted to use resources from both PFs? Would you put both `enp129s0f0np0` and `enp129s0f1np1` into the same bridge?
Yes, I vaguely recall that was the original thinking.
What do you think should change in LXD? Are you thinking of a NIC device config setting that specifies the `acceleration.parent`, or something like that?
> What do you think should change in LXD? Are you thinking of a NIC device config setting that specifies the `acceleration.parent`, or something like that?
I assume you are referring to the profile/instance configuration now, and something like `acceleration.parent` makes sense.
I wonder what it should refer to though. Individual nodes of a cluster may not have the exact same physical configuration, so the parent interface name may differ from host to host.
For the bond case, I guess one could call the bond whatever, so we could mandate that the operator use a uniform name.
For the non-bond case though it might be more difficult.
I see that many parts of the LXD documentation refer to machine-specific commands: https://github.com/canonical/lxd/blob/227bc5cd75ebf2ba7b6c881f209ed5c2e6640f9b/doc/howto/network_ovn_setup.md?plain=1#L131
Would a way forward be to create some type of network that can be used to map per-machine interface names to a "physical network", and then refer to that in `acceleration.parent`?
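For reference, the physical network type already lets the parent interface be specified per cluster member, which is the kind of per-machine mapping hinted at here (member and interface names are illustrative):

```
# Pending, member-specific definitions first...
lxc network create UPLINK --type=physical parent=bond0 --target=server1
lxc network create UPLINK --type=physical parent=bond0 --target=server2

# ...then the final create instantiates the network cluster-wide.
lxc network create UPLINK --type=physical
```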
Ah, in that case it would likely need to be part of a member-specific config on the ovn network itself, or perhaps the uplink network's configuration:
https://documentation.ubuntu.com/lxd/en/latest/reference/network_ovn/#configuration-options
https://documentation.ubuntu.com/lxd/en/latest/reference/network_physical/#configuration-options
This could then be used by the ovn NIC device when starting up.
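Purely as a sketch of that idea, assuming a hypothetical `acceleration.parent` key on the uplink network (this key does not exist in LXD today, it is only the name floated above):

```
# Hypothetical member-specific key; not an existing LXD configuration option.
lxc network set UPLINK acceleration.parent=bond0 --target=server1
lxc network set UPLINK acceleration.parent=bond0 --target=server2

# The ovn NIC device would then resolve which PF/bond to allocate VFs from via
# its network's uplink, instead of looking at the ports of br-int.
lxc config device add v1 eth0 nic network=ovn1 acceleration=sriov
```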
@fnordahl I've marked this as blocked as we don't have access to hardware anymore to develop/test a fix for this.
If this is something you could help us with that would be appreciated.
@fnordahl as discussed in meeting, we'll get a partner cloud setup to work on this issue.