plugins icon indicating copy to clipboard operation
plugins copied to clipboard

host-device: fix cmdDel if device was renamed

Open olivier-matz-6wind opened this issue 3 years ago • 7 comments

When a device previously passed to the container has to be moved back to the host, netlink.LinkByName() is used to find it in the container. This does not work if the device was renamed in the container.

Fix this by storing the initial interface name in the alias, in addition to the host interface name. To find the device, we now iterate the link list until we find one whose alias corresponds.

Signed-off-by: Olivier Matz [email protected]

olivier-matz-6wind avatar Aug 25 '22 13:08 olivier-matz-6wind

Hi,

A more reliable way would probably be to store the ifindex of the interface instead of storing data in the ifalias field. Indeed, like the name, the ifalias can still be modified in the container, even it it seems uncommon. The ifindex, however, never changes. I'd be happy to implement this, but I don't know how/if we can store information between cmdAdd and cmdDel. Any advice?

An alternative to ifalias is a kernel feature called "alternative name" (https://lwn.net/Articles/800927/), but it is only available in kernels >= 5.5, so it seems too recent.

One more question is about what to do when we cannot find the device. In my use-case, it blocks the destruction of the pod (when I'm using kubectl scale deployment my-pod --replicas=0) with the following error:

error killing pod: failed to "KillPodSandbox" for "90accdea-b2d9-4f1f-a9fe-8c838defd726" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"3ae636bf65be936512620499bca54789767f874d4cfd16497fc91a9fba02a6a2\": delegateDel: error invoking DelegateDel - \"host-device\": error in getting result from DelNetwork: failed to find \"net1\""

Would it make sense to ignore the error and just issue a warning? (how?) Indeed, in Linux, when a netns is destroyed, all physical devices that were present in the netns will respawn in init_net.

Thanks, Olivier

olivier-matz-6wind avatar Sep 01 '22 12:09 olivier-matz-6wind

@olivier-matz-6wind what is renaming the network device inside the container, away from the name that was specified in the CNI configuration (eg, net0 or eth0)?

eg, we'd expect that if the host-device plugin renamed the link to net0 inside the container as requested by the caller, the link would still have that name at DEL time. If it doesn't, there's not much this plugin can or should do; something reconfigured the link after the plugin in fundamental ways.

dcbw avatar Sep 19 '22 16:09 dcbw

Hi @dcbw,

what is renaming the network device inside the container, away from the name that was specified in the CNI configuration (eg, net0 or eth0)?

We are running a virtual router inside the pod. This software is able to configure the network interfaces passed by the host: add an IP address, name the interface, create virtual interfaces like vlans on top of it, ...

eg, we'd expect that if the host-device plugin renamed the link to net0 inside the container as requested by the caller, the link would still have that name at DEL time. If it doesn't, there's not much this plugin can or should do; something reconfigured the link after the plugin in fundamental ways.

Yes, I agree there are situations where the host won't be able to do anything. So maybe this pull request does not go in the right direction.

My problem is that when the host-device plugin fails to find the device (either because it was renamed, because the alias changed, or because it was moved in another netns), CmdDel() returns an error, and this error prevents the pod to be removed when I'm scaling down my deployment.

The following draft patch fixes the issue:

--- a/plugins/main/host-device/host-device.go
+++ b/plugins/main/host-device/host-device.go
@@ -257,7 +257,7 @@ func moveLinkOut(containerNs ns.NetNS, ifName string) error {
        return containerNs.Do(func(_ ns.NetNS) error {
                dev, err := netlink.LinkByName(ifName)
                if err != nil {
-                       return fmt.Errorf("failed to find %q: %v", ifName, err)
+                       return nil
                }
 
                // Devices can be renamed only when down

The question is: is this patch (or similar) acceptable? Would it make sense to simply log something in this case? In fact if the netdevice is not moved back to the host by the plugin, it will respawn in init_net when the pod netns will be deleted.

olivier-matz-6wind avatar Sep 19 '22 19:09 olivier-matz-6wind

@olivier-matz-6wind Discussed this in the maintainers meeting; a couple points:

  1. We think that whatever is doing the interface changes after CNI should also be a good citizen and un-do those changes on container teardown. CNI plugins can only go to so many heroics to account for what other things do; they need cooperation from those other things as well.
  2. We'd accept a checkpoint/state-storage system (similar to what the host-local IPAM plugin does) that stores the state on-disk. I don't think we want to depend on the interface alias mechanism, since that's not really owned by the CNI plugin and could also be used/overwritten by other things inside the container.

Essentially, for each ADD host-device would create a network-named directory in the host's /var/run/cni/host-device (eg /var/run/cni/host-device/awesome-network/) in which would go files named with the sandbox ID and original network interface name (eg, CNI_IFNAME). Inside this file could be put the original host interface name and the container ifindex.

On DEL we get the same CNI_IFNAME and sandbox ID, and we can read that file and find the original ifindex, and use that to rename the interface back into the host with the original interface name.

Most of this could be a copy of plugins/ipam/host-local/backend/ and strip out the stuff about IP addressing, etc.

Would that be something you could work on and submit a patch for?

dcbw avatar Sep 26 '22 16:09 dcbw

Hi @dcbw,

Thank you for bringing this issue in the maintainers meeting.

We think that whatever is doing the interface changes after CNI should also be a good citizen and un-do those changes on container teardown. CNI plugins can only go to so many heroics to account for what other things do; they need cooperation from those other things as well.

I agree. However, there are still situations where the containerized application is not able to revert the change that was made: bug, kill -KILL

We'd accept a checkpoint/state-storage system (similar to what the host-local IPAM plugin does) that stores the state on-disk. I don't think we want to depend on the interface alias mechanism, since that's not really owned by the CNI plugin and could also be used/overwritten by other things inside the container.

[...]

Would that be something you could work on and submit a patch for?

Sure, I can work on this.

But as we said previously, there are still corner cases where the CNI cannot do anything to restore the interface, and this would still block the downscaling of the container (and this prevents resources from being reused). Is it a problem to let the kernel respawn the netdevice in init_net when the container netns is deleted?

Thanks

olivier-matz-6wind avatar Sep 29 '22 16:09 olivier-matz-6wind

But as we said previously, there are still corner cases where the CNI cannot do anything to restore the interface, and this would still block the downscaling of the container (and this prevents resources from being reused). Is it a problem to let the kernel respawn the netdevice in init_net when the container netns is deleted?

I know the kernel will do that for "physical" devices, but I didn't think it did it for virtual devices like macvlan, veth, etc. I think we were trying to keep the plugin consistent in it's device treatment so it would handle both.

There's always going to be corner-cases with the current architecture; we're trying to plan out what "CNI 2.0" would look like and it's likely going to include some kind of reconciliation capability that would allow more comprehensive cleanup.

Also, even if the kernel did return the device to the main netns, it wouldn't have the same name it did originally, right?

dcbw avatar Oct 03 '22 15:10 dcbw

But as we said previously, there are still corner cases where the CNI cannot do anything to restore the interface, and this would still block the downscaling of the container (and this prevents resources from being reused). Is it a problem to let the kernel respawn the netdevice in init_net when the container netns is deleted?

I know the kernel will do that for "physical" devices, but I didn't think it did it for virtual devices like macvlan, veth, etc. I think we were trying to keep the plugin consistent in it's device treatment so it would handle both.

Correct, in case of virtual device, they will be deleted with the netns.

Also, even if the kernel did return the device to the main netns, it wouldn't have the same name it did originally, right?

Correct, but it is something that could be handled in the plugin. I was thinking that it was still better than blocking.

I'll try to think again to see if I can find a better solution. Else, I'll go with the /var/run/cni/ solution.

Many thanks for the feedback.

olivier-matz-6wind avatar Oct 03 '22 15:10 olivier-matz-6wind

This PR has been untouched for too long without an update. It will be closed in 7 days.

github-actions[bot] avatar Dec 03 '22 01:12 github-actions[bot]

This PR has been untouched for too long without an update. It will be closed in 7 days.

github-actions[bot] avatar Mar 18 '23 01:03 github-actions[bot]