cri-o `Ping pod from the host / another pod` integration test fails on `arm64`

Which jobs are failing?

Integration tests in GitHub Actions, for example: https://github.com/cri-o/cri-o/actions/runs/9967870706/job/27542531319

Which tests are failing?

218 Ping pod from the host / another pod:

https://github.com/cri-o/cri-o/blob/043cae7b7dcd44240fc0bfc76348f5ff0058a47a/test/network_ping.bats#L14-L28

Since when has it been failing?

Since July 16, 2024

Testgrid link

None

Reason for failure (if possible)

# time="2024-07-16T13:45:12Z" level=fatal msg="execing command in container synchronously: command 'ping6 -W 1 -c 2 1100:200::79' exited with 1: "

From:

https://github.com/cri-o/cri-o/blob/043cae7b7dcd44240fc0bfc76348f5ff0058a47a/test/network_ping.bats#L26

https://github.com/cri-o/cri-o/blob/043cae7b7dcd44240fc0bfc76348f5ff0058a47a/test/helpers.bash#L468

Anything else we need to know?

I assume it's related to the GitHub Actions runner, which may have changed its configuration.

Jul 17 '24 08:07 saschagrunert

Checking with actuated support on Slack: https://self-actuated.slack.com/archives/C043BB2NCUW/p1721206076124419

Jul 17 '24 08:07 saschagrunert

Hi @saschagrunert, I missed your message (over the weekend) but have replied now.

Jul 22 '24 10:07 alexellis

I can reproduce the issue on the ping-arm64 branch, but the configuration looks fine: IPv6 is enabled and the sysctl settings seem right. Pinging over IPv6 still fails, while IPv4 works as intended. No further error logs are available.

Jul 31 '24 09:07 saschagrunert

@saschagrunert, are you able to strace/ltrace the ping6 command? The output might show us what is failing there. Also, is there any way to get access to dmesg or the system logs? Perhaps a firewall is dropping the IPv6 connections?

Jul 31 '24 10:07 kwilczynski

Yes, here is the output from ping6 -c1 1100:200::3 (from pod 1 to pod 2): https://gist.github.com/saschagrunert/e5661341df7857bee67c11a4ddce7394

dmesg (last interesting lines):

[  +0.001280] systemd[1]: Started Journal Service.
[  +0.013014] systemd-journald[831]: Received client request to flush runtime journal.
[  +0.199841] EXT4-fs (vdb): mounted filesystem without journal. Quota mode: none.
[  +1.183351] bpfilter: Loaded bpfilter_umh pid 1813
[  +0.000475] Started bpfilter
[Jul31 10:33] Adding 1048572k swap on /swapfile.  Priority:-2 extents:2 across:1441788k
[  +0.132057] systemd-journald[831]: Received SIGTERM from PID 1 (systemd).
[  +0.000140] systemd[1]: Stopping Journal Service...
[  +0.003594] systemd[1]: systemd-journald.service: Deactivated successfully.
[  +0.000409] systemd[1]: Stopped Journal Service.
[  +0.028233] systemd[1]: Starting Journal Service...
[  +0.016669] systemd-journald[3272]: /etc/systemd/journald.conf:1: Assignment outside of section. Ignoring.
[  +0.000008] systemd-journald[3272]: /etc/systemd/journald.conf:2: Assignment outside of section. Ignoring.
[  +0.002242] systemd[1]: Started Journal Service.
[Jul31 10:35] cni0: port 1(veth6cbc2a75) entered blocking state
[  +0.000006] cni0: port 1(veth6cbc2a75) entered disabled state
[  +0.000065] device veth6cbc2a75 entered promiscuous mode
[  +0.006254] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[  +0.000022] IPv6: ADDRCONF(NETDEV_CHANGE): veth6cbc2a75: link becomes ready
[  +0.000027] cni0: port 1(veth6cbc2a75) entered blocking state
[  +0.000004] cni0: port 1(veth6cbc2a75) entered forwarding state
[  +2.548643] cni0: port 2(vethd7b389cc) entered blocking state
[  +0.000008] cni0: port 2(vethd7b389cc) entered disabled state
[  +0.000071] device vethd7b389cc entered promiscuous mode
[  +0.005755] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[  +0.000031] IPv6: ADDRCONF(NETDEV_CHANGE): vethd7b389cc: link becomes ready
[  +0.000031] cni0: port 2(vethd7b389cc) entered blocking state
[  +0.000004] cni0: port 2(vethd7b389cc) entered forwarding state

Jul 31 '24 10:07 saschagrunert

The containers already run privileged, so I assume it's not a permission problem.

Jul 31 '24 10:07 saschagrunert

@saschagrunert, we have this:

recvmsg(3, {msg_namelen=128}, 0)        = -1 EHOSTUNREACH (No route to host)
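
To pull failing calls like this one out of a longer trace, a small filter can help. This is only a minimal sketch: `failed_syscalls` is a hypothetical helper name, and it assumes plain strace output without the `[pid N]` prefixes that `-f` adds.

```shell
# Hypothetical helper: read strace output on stdin and print
# "<syscall> <errno>" for every call that returned -1.
# Assumes each line starts with the syscall name (no "[pid N]" prefix).
failed_syscalls() {
    sed -n 's/^\([a-z_0-9]*\)(.*= -1 \([A-Z0-9]*\) .*/\1 \2/p'
}
```

Something like `strace -e trace=network ping6 -c1 1100:200::3 2>&1 | failed_syscalls` would then reduce the trace to the network calls that failed.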

Do any of these work?

ping6 ip6-localhost
ping6 localhost6
ping6 0:0:0:0:0:0:0:1
ping6 ::1

This is all "localhost" for IPv6. Also, can you grab:

(not sure which tool is installed)

ifconfig -a
route -n
ip addr show
ip route show

Might need the -6 switch for some of these...

I wonder if we are missing some routes there, perhaps.
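
Since it is unclear which of these tools exists in the container, the commands above could be wrapped so that missing ones are skipped instead of aborting the run. A rough sketch, where `collect_net_info` is a hypothetical name and the `-6` variants are added on the assumption that `ip` is the tool that needs the switch:

```shell
# Hypothetical helper: run each diagnostic command if its binary is installed.
collect_net_info() {
    local cmd
    for cmd in "ifconfig -a" "route -n" "ip addr show" "ip route show" \
               "ip -6 addr show" "ip -6 route show"; do
        # ${cmd%% *} is the binary name (everything before the first space).
        if command -v "${cmd%% *}" >/dev/null 2>&1; then
            echo "### $cmd"
            # Unquoted on purpose so the string splits into command + arguments.
            $cmd 2>&1 || echo "(exit status $?)"
        else
            echo "### $cmd: not installed, skipped"
        fi
    done
}
```

Running `collect_net_info` inside the pod prints one `###` header per command, followed by its output where the tool is available.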

Jul 31 '24 11:07 kwilczynski

@kwilczynski yes, pinging loopback and the pod's own address works:

bash-5.2# ping6 ip6-localhost
ping6: ip6-localhost: Name or service not known

bash-5.2# ping6 localhost6
PING localhost6(localhost (::1)) 56 data bytes
64 bytes from localhost (::1): icmp_seq=1 ttl=64 time=0.042 ms
64 bytes from localhost (::1): icmp_seq=2 ttl=64 time=0.052 ms
^C
--- localhost6 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1018ms
rtt min/avg/max/mdev = 0.042/0.047/0.052/0.005 ms

bash-5.2# ping6 0:0:0:0:0:0:0:1
PING 0:0:0:0:0:0:0:1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.051 ms
64 bytes from ::1: icmp_seq=2 ttl=64 time=0.070 ms
^C
--- 0:0:0:0:0:0:0:1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1019ms
rtt min/avg/max/mdev = 0.051/0.060/0.070/0.009 ms

bash-5.2# ping6 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.033 ms
64 bytes from ::1: icmp_seq=2 ttl=64 time=0.038 ms
^C
--- ::1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1015ms
rtt min/avg/max/mdev = 0.033/0.035/0.038/0.002 ms

bash-5.2# ip -6 addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1000
    inet6 ::1/128 scope host proto kernel_lo
       valid_lft forever preferred_lft forever
3: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP
    inet6 1100:200::3/24 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::38fe:91ff:fe80:32fc/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever

bash-5.2# ping6 1100:200::3
PING 1100:200::3(1100:200::3) 56 data bytes
64 bytes from 1100:200::3: icmp_seq=1 ttl=64 time=0.069 ms
64 bytes from 1100:200::3: icmp_seq=2 ttl=64 time=0.034 ms
^C
--- 1100:200::3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1022ms
rtt min/avg/max/mdev = 0.034/0.051/0.069/0.017 ms

More context:

bash-5.2# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host proto kernel_lo
       valid_lft forever preferred_lft forever
2: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0
3: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 3a:fe:91:80:32:fc brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.88.0.3/16 brd 10.88.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 1100:200::3/24 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::38fe:91ff:fe80:32fc/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever

bash-5.2# ip route show
default via 10.88.0.1 dev eth0
10.88.0.0/16 dev eth0 proto kernel scope link src 10.88.0.3

bash-5.2# ip -6 route show
1100:200::/24 dev eth0 proto kernel metric 256 pref medium
1100:200::/24 via 1100:200::1 dev eth0 metric 1024 pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium

I'm not sure about the routes; pinging the v6 gateway also works:

bash-5.2# ping6 1100:200::1
PING 1100:200::1(1100:200::1) 56 data bytes
64 bytes from 1100:200::1: icmp_seq=1 ttl=64 time=0.081 ms
^C
--- 1100:200::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.081/0.081/0.081/0.000 ms
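
On the routes: the IPv6 table above lists 1100:200::/24 twice, once as an on-link kernel route (metric 256) and once via the gateway 1100:200::1 (metric 1024). Whether that has anything to do with the failure is not established here. A minimal sketch for spotting such repeated prefixes, with `duplicate_prefixes` as a hypothetical helper name:

```shell
# Hypothetical helper: read `ip -6 route show` output on stdin and print
# every destination prefix that has more than one route entry.
duplicate_prefixes() {
    awk '{ print $1 }' | sort | uniq -d
}
```

`ip -6 route show | duplicate_prefixes` prints `1100:200::/24` for the table above. `ip -6 route get 1100:200::79` would additionally show which entry the kernel selects for the address the test pings.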

Jul 31 '24 11:07 saschagrunert

A friendly reminder that this issue had no activity for 30 days.

Aug 31 '24 00:08 github-actions[bot]

Closing this issue now, since it's related to a test setup we don't have anymore.

Dec 11 '24 04:12 saschagrunert