
`not ok 40 checkpoint and restore with netdevice (with --debug)` on Fedora 43

Open kolyshkin opened this issue 1 month ago • 5 comments

Getting the following failure on Fedora 43:

not ok 40 checkpoint and restore with netdevice (with --debug)
# (from function `simple_cr_with_netdevice' in file tests/integration/checkpoint.bats, line 162,
#  in test file tests/integration/checkpoint.bats, line 228)
#   `simple_cr_with_netdevice --debug' failed
# runc spec (status=0)
#
# runc run -d --console-socket /tmp/bats-run-ewdfn2/runc.U3rjBK/tty/sock test_busybox_netdevice (status=0)
#
# runc state test_busybox_netdevice (status=0)
# {
#   "ociVersion": "1.3.0",
#   "id": "test_busybox_netdevice",
#   "pid": 25383,
#   "status": "running",
#   "bundle": "/tmp/bats-run-ewdfn2/runc.U3rjBK/bundle",
#   "rootfs": "/tmp/bats-run-ewdfn2/runc.U3rjBK/bundle/rootfs",
#   "created": "2025-11-12T20:45:29.528895853Z",
#   "owner": ""
# }
# runc exec test_busybox_netdevice ip address show dev dummy0 (status=0)
# 9: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1789 qdisc noqueue qlen 1000
#     link/ether 42:e1:ca:04:28:69 brd ff:ff:ff:ff:ff:ff
#     inet 169.254.169.77/32 scope global dummy0
#        valid_lft forever preferred_lft forever
#     inet6 fe80::40e1:caff:fe04:2869/64 scope link
#        valid_lft forever preferred_lft forever
# Cannot find device "dummy0"
# --- teardown ---

The failure seems to occur because the dummy0 MAC address is expected to be 00:11:22:33:44:55, but it is actually 42:e1:ca:04:28:69.
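For anyone who wants to check this by hand, here is a minimal sketch of the comparison. It is not the code from checkpoint.bats; `extract_mac` and `check_mac` are hypothetical helpers written for illustration, and they assume the `ip address show` output format seen in the log above.

```sh
# Hypothetical helper: read `ip address show` output on stdin and
# print the MAC address from the first "link/ether" line.
extract_mac() {
    awk '$1 == "link/ether" { print $2; exit }'
}

# Hypothetical check: compare the MAC address found on stdin with the
# expected value passed as $1. Returns non-zero on a mismatch.
check_mac() {
    expected="$1"
    actual="$(extract_mac)"
    if [ "$actual" != "$expected" ]; then
        echo "MAC mismatch: expected $expected, got $actual" >&2
        return 1
    fi
}
```

Assuming the container from the log is still running, it could be used as `runc exec test_busybox_netdevice ip address show dev dummy0 | check_mac 00:11:22:33:44:55`.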

kolyshkin Nov 12 '25 23:11

Test case was added recently by @aojea (commit 8d180e96, PR #4538).

kolyshkin Nov 12 '25 23:11

I'm at KubeCon this week. Is this a flake or a recurring failure? Is it failing in CI? Can I get a link to the job?

aojea Nov 13 '25 04:11

It's a flake that started recently; here is a recent job that failed.

cyphar Nov 13 '25 04:11

I took a quick look. One surprising thing is that 5 test cases call the same function, simple_cr_with_netdevice:

https://github.com/opencontainers/runc/blob/59a5ff14a2c1f6beb74982a9c03e31c5fb49859d/tests/integration/checkpoint.bats#L227-L229

I wonder whether the --debug flag is the trigger:

https://github.com/opencontainers/runc/blob/59a5ff14a2c1f6beb74982a9c03e31c5fb49859d/tests/integration/checkpoint.bats#L141-L163

aojea Nov 17 '25 10:11

Interesting: it seems to fail only on Fedora, and in the same test case, `not ok 40 checkpoint and restore with netdevice (with --debug)`:

https://github.com/opencontainers/runc/actions?query=is%3Afailure

I need to spin up a local environment with the same config to try to reproduce it.
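Since this is a flake, a single local run may well pass. A rough sketch of a rerun loop follows; `run_until_failure` is a hypothetical helper, not part of the runc test suite, and it simply repeats a command until it fails or a maximum number of attempts is reached.

```sh
# Hypothetical helper: rerun a command up to $1 times and stop at the
# first failure. Returns 1 if a failure was observed, 0 if every
# attempt passed.
run_until_failure() {
    max="$1"
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@"; then
            echo "failed on attempt $i of $max" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max attempts" >&2
    return 0
}
```

Assuming bats is installed and the integration test environment is already set up, something like `run_until_failure 50 bats -f 'netdevice' tests/integration/checkpoint.bats` would rerun only the netdevice test cases; the attempt count of 50 is arbitrary.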

aojea Nov 25 '25 10:11