crun should never checkpoint the netns

Open Luap99 opened this issue 1 year ago • 5 comments

This is basically the same as 2a0947e601c354f0c63a3802f8084db1b23e1851, but that commit made an exception: crun still checkpoints the netns when the netns path is empty in the runtime spec. That only works when podman creates the netns in advance, which is not always the case, e.g. when a custom userns is used (which also doesn't work in crun right now, but that is a separate issue, #1207).
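To make the cases above concrete, here is a minimal sketch of how the network namespace entry in an OCI runtime spec (`linux.namespaces` in config.json) can be classified. It is not crun's code; the function name `netns_mode` and the returned labels are made up for illustration. It relies only on the OCI rules that a namespace entry without a path means the runtime creates a new namespace, and that a missing entry means the namespace is inherited.

```python
def netns_mode(spec: dict) -> str:
    """Classify how an OCI runtime spec requests the network namespace.

    Hypothetical helper for illustration only, not part of crun or podman.
    """
    linux = spec.get("linux") or {}
    for ns in linux.get("namespaces") or []:
        if ns.get("type") != "network":
            continue
        if ns.get("path"):
            # Non-empty path: join a netns that was created in advance.
            return "join-existing"
        # Missing or empty path: the runtime creates a new netns.
        return "runtime-created"
    # No network entry at all: the container inherits the runtime's netns.
    return "inherit"


# Empty path: the case for which crun currently still checkpoints the netns.
print(netns_mode({"linux": {"namespaces": [{"type": "network"}]}}))
# Path set: the netns was created in advance (the path here is an example).
print(netns_mode({"linux": {"namespaces": [
    {"type": "network", "path": "/run/netns/example"}]}}))
```

The request in this issue is that checkpointing should not depend on this distinction: the netns would be left out of the checkpoint in every case.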

The problem now is that I want to consolidate the network setup code in podman (https://github.com/containers/podman/pull/18468) so that there is only one setup path instead of two, one with and one without a userns. So going forward I always want to let the runtime create the netns (empty netns path in the config) and then configure the netns in podman after the create call. This works just fine except for the checkpoint/restore case. On restore, criu tries to restore the netns, which fails:

(00.048662)      1: Try to restore a link 10:2:eth0
(00.048676)      1: Restoring link eth0 type 2
(00.048691)      1: Restoring netdev eth0 idx 2
(00.048705)      1: Restore ll addr (8e:../6) for device
(00.048712)      1: Error (criu/net.c:1462): Unknown peer net namespace
(00.063166)      1: Error (criu/libnetlink.c:54): -16 reported by netlink: Device or resource busy
(00.063205)      1: Error (criu/net.c:1816): Can't restore link: -16
(00.063270)      1: Error (criu/util.c:1411): Can't wait or bad status: errno=0, status=65280
(00.064717) Error (criu/cr-restore.c:2536): Restoring FAILED.

The same commands work just fine with runc because runc always ignores the netns: https://github.com/opencontainers/runc/pull/1840/commits/8187fb740c202f5d29f1717bb933143c10dae8a1

So my ask is to always ignore the netns to match runc behavior and allow podman to work correctly.

Luap99 avatar May 09 '23 11:05 Luap99

I had a look at it and can provide a fix. Different than in runc, but similar.

adrianreber avatar May 10 '23 08:05 adrianreber

This first needs changes in libcriu. The needed interface has not been exported yet. I will open a PR in CRIU first.

adrianreber avatar May 11 '23 15:05 adrianreber

See https://github.com/checkpoint-restore/criu/pull/2175 for the CRIU changes.

adrianreber avatar May 11 '23 16:05 adrianreber

@adrianreber Is there a way we can move forward here or do we need to wait on a new criu release?

Luap99 avatar Jun 21 '23 12:06 Luap99

I see CI runs based on Ubuntu. I think we can update CRIU in Fedora and the Ubuntu PPA to include the necessary patch in the current release without waiting for a new release.

@rst0git Can you update the CRIU PPA to 3.18 with the patch from https://github.com/checkpoint-restore/criu/pull/2175 (maybe also the Sapphire Rapids patch)?

adrianreber avatar Jun 23 '23 10:06 adrianreber