`Ping pod from the host / another pod` integration test fails on `arm64`
### Which jobs are failing?
Integration tests in GitHub actions, for example: https://github.com/cri-o/cri-o/actions/runs/9967870706/job/27542531319
### Which tests are failing?
218 Ping pod from the host / another pod:
https://github.com/cri-o/cri-o/blob/043cae7b7dcd44240fc0bfc76348f5ff0058a47a/test/network_ping.bats#L14-L28
### Since when has it been failing?
Since July 16, 2024
### Testgrid link
None
### Reason for failure (if possible)
```
# time="2024-07-16T13:45:12Z" level=fatal msg="execing command in container synchronously: command 'ping6 -W 1 -c 2 1100:200::79' exited with 1: "
```
From:
https://github.com/cri-o/cri-o/blob/043cae7b7dcd44240fc0bfc76348f5ff0058a47a/test/network_ping.bats#L26
https://github.com/cri-o/cri-o/blob/043cae7b7dcd44240fc0bfc76348f5ff0058a47a/test/helpers.bash#L468
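For local triage, the failing assertion boils down to an exec'd ping inside the pod. A rough, hedged reproduction outside of bats (the address is the one from the CI log above and will differ in other environments; `ping -6` is the modern spelling of `ping6`):

```shell
#!/bin/sh
# Hypothetical manual reproduction of the failing step; 1100:200::79 is the
# address from the CI failure above, not a value you should hardcode.
POD_IP6="1100:200::79"

# Same probe the test performs inside the container: 2 echo requests with a
# 1-second per-packet timeout.
if ping -6 -W 1 -c 2 "$POD_IP6" >/dev/null 2>&1; then
  status=reachable
else
  status=unreachable
fi
echo "$POD_IP6 is $status"
```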
### Anything else we need to know?
I assume it's related to the GitHub Actions runner, whose configuration may have changed.
Checking with actuated support on Slack: https://self-actuated.slack.com/archives/C043BB2NCUW/p1721206076124419
Hi @saschagrunert I missed your message (over the weekend) but have replied now.
So, I can reproduce the issue using the `ping-arm64` branch, but everything seems to be fine: IPv6 is enabled and the sysctl settings look right. I still cannot ping over IPv6, while IPv4 works as intended. No further error logs are available.
@saschagrunert, are you able to strace/ltrace the `ping6` command? Perhaps the output would show us what is failing there. Also, is there any way to get access to dmesg or the system logs? Perhaps a firewall is dropping the IPv6 connections?
Yes, here is the output from `ping6 -c1 1100:200::3` (from pod 1 to pod 2): https://gist.github.com/saschagrunert/e5661341df7857bee67c11a4ddce7394
dmesg (last interesting lines):
```
[ +0.001280] systemd[1]: Started Journal Service.
[ +0.013014] systemd-journald[831]: Received client request to flush runtime journal.
[ +0.199841] EXT4-fs (vdb): mounted filesystem without journal. Quota mode: none.
[ +1.183351] bpfilter: Loaded bpfilter_umh pid 1813
[ +0.000475] Started bpfilter
[Jul31 10:33] Adding 1048572k swap on /swapfile. Priority:-2 extents:2 across:1441788k
[ +0.132057] systemd-journald[831]: Received SIGTERM from PID 1 (systemd).
[ +0.000140] systemd[1]: Stopping Journal Service...
[ +0.003594] systemd[1]: systemd-journald.service: Deactivated successfully.
[ +0.000409] systemd[1]: Stopped Journal Service.
[ +0.028233] systemd[1]: Starting Journal Service...
[ +0.016669] systemd-journald[3272]: /etc/systemd/journald.conf:1: Assignment outside of section. Ignoring.
[ +0.000008] systemd-journald[3272]: /etc/systemd/journald.conf:2: Assignment outside of section. Ignoring.
[ +0.002242] systemd[1]: Started Journal Service.
[Jul31 10:35] cni0: port 1(veth6cbc2a75) entered blocking state
[ +0.000006] cni0: port 1(veth6cbc2a75) entered disabled state
[ +0.000065] device veth6cbc2a75 entered promiscuous mode
[ +0.006254] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ +0.000022] IPv6: ADDRCONF(NETDEV_CHANGE): veth6cbc2a75: link becomes ready
[ +0.000027] cni0: port 1(veth6cbc2a75) entered blocking state
[ +0.000004] cni0: port 1(veth6cbc2a75) entered forwarding state
[ +2.548643] cni0: port 2(vethd7b389cc) entered blocking state
[ +0.000008] cni0: port 2(vethd7b389cc) entered disabled state
[ +0.000071] device vethd7b389cc entered promiscuous mode
[ +0.005755] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ +0.000031] IPv6: ADDRCONF(NETDEV_CHANGE): vethd7b389cc: link becomes ready
[ +0.000031] cni0: port 2(vethd7b389cc) entered blocking state
[ +0.000004] cni0: port 2(vethd7b389cc) entered forwarding state
```
The containers already run privileged, so I assume it's not a permission problem.
@saschagrunert, we have this:
```
recvmsg(3, {msg_namelen=128}, 0) = -1 EHOSTUNREACH (No route to host)
```
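`EHOSTUNREACH` from `recvmsg()` usually means the kernel either has no route to the destination or neighbour discovery failed. A quick, hedged way to separate the two (the address is the one from the failing test):

```shell
#!/bin/sh
# Ask the kernel to resolve a route for the failing destination. Diagnostic
# sketch only; 1100:200::79 comes from the test log above.
dst="1100:200::79"
if route_info=$(ip -6 route get "$dst" 2>&1); then
  echo "kernel route lookup succeeded: $route_info"
else
  # "Network is unreachable" here would explain the EHOSTUNREACH from ping6.
  echo "kernel route lookup failed: $route_info"
fi
```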
Do any of these work?
- `ping6 ip6-localhost`
- `ping6 localhost6`
- `ping6 0:0:0:0:0:0:0:1`
- `ping6 ::1`
This is all "localhost" for IPv6. Also, can you grab:
(not sure which tool is installed)
- `ifconfig -a`
- `route -n`
- `ip addr show`
- `ip route show`
Might need the -6 switch for some of these...
I wonder if we are missing some routes there, perhaps.
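The suggested commands above could also be run as a single sweep. A sketch, assuming the iproute2 and sysctl tools mentioned in this thread; each step tolerates a missing tool so the sweep completes on minimal images:

```shell
#!/bin/sh
# Collect IPv6 state in one pass; every command is allowed to fail so that
# one missing binary does not abort the rest of the sweep.
collect_ipv6_diagnostics() {
  for cmd in \
      "ip -6 addr show" \
      "ip -6 route show" \
      "ip -6 neigh show" \
      "sysctl net.ipv6.conf.all.disable_ipv6" \
      "sysctl net.ipv6.conf.all.forwarding"; do
    echo "=== $cmd ==="
    # Intentionally unquoted: split the string into command + arguments.
    $cmd 2>&1 || true
  done
}

diag=$(collect_ipv6_diagnostics)
echo "$diag"
```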
@kwilczynski yes, pinging the pod's own address works:
```
bash-5.2# ping6 ip6-localhost
ping6: ip6-localhost: Name or service not known
bash-5.2# ping6 localhost6
PING localhost6(localhost (::1)) 56 data bytes
64 bytes from localhost (::1): icmp_seq=1 ttl=64 time=0.042 ms
64 bytes from localhost (::1): icmp_seq=2 ttl=64 time=0.052 ms
^C
--- localhost6 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1018ms
rtt min/avg/max/mdev = 0.042/0.047/0.052/0.005 ms
bash-5.2# ping6 0:0:0:0:0:0:0:1
PING 0:0:0:0:0:0:0:1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.051 ms
64 bytes from ::1: icmp_seq=2 ttl=64 time=0.070 ms
^C
--- 0:0:0:0:0:0:0:1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1019ms
rtt min/avg/max/mdev = 0.051/0.060/0.070/0.009 ms
bash-5.2# ping6 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.033 ms
64 bytes from ::1: icmp_seq=2 ttl=64 time=0.038 ms
^C
--- ::1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1015ms
rtt min/avg/max/mdev = 0.033/0.035/0.038/0.002 ms
bash-5.2# ip -6 addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1000
    inet6 ::1/128 scope host proto kernel_lo
       valid_lft forever preferred_lft forever
3: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP
    inet6 1100:200::3/24 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::38fe:91ff:fe80:32fc/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever
bash-5.2# ping6 1100:200::3
PING 1100:200::3(1100:200::3) 56 data bytes
64 bytes from 1100:200::3: icmp_seq=1 ttl=64 time=0.069 ms
64 bytes from 1100:200::3: icmp_seq=2 ttl=64 time=0.034 ms
^C
--- 1100:200::3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1022ms
rtt min/avg/max/mdev = 0.034/0.051/0.069/0.017 ms
```
More context:
```
bash-5.2# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host proto kernel_lo
       valid_lft forever preferred_lft forever
2: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0
3: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 3a:fe:91:80:32:fc brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.88.0.3/16 brd 10.88.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 1100:200::3/24 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::38fe:91ff:fe80:32fc/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever
bash-5.2# ip route show
default via 10.88.0.1 dev eth0
10.88.0.0/16 dev eth0 proto kernel scope link src 10.88.0.3
bash-5.2# ip -6 route show
1100:200::/24 dev eth0 proto kernel metric 256 pref medium
1100:200::/24 via 1100:200::1 dev eth0 metric 1024 pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
```
I'm not sure about the routes; pinging the v6 gateway also works:
```
bash-5.2# ping6 1100:200::1
PING 1100:200::1(1100:200::1) 56 data bytes
64 bytes from 1100:200::1: icmp_seq=1 ttl=64 time=0.081 ms
^C
--- 1100:200::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.081/0.081/0.081/0.000 ms
```
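One more speculative angle, given the `bpfilter: Loaded bpfilter_umh` line in dmesg: traffic between pods crosses the `cni0` bridge, so a bridge-netfilter or ip6tables rule could drop IPv6 even though addresses and routes look correct. A hedged check (root may be needed for the ip6tables part; the commands degrade gracefully when a tool is missing):

```shell
#!/bin/sh
# Bridge/firewall sanity check; each command falls back to a placeholder when
# the tool is missing or we lack privileges.
for key in net.bridge.bridge-nf-call-ip6tables net.ipv6.conf.all.forwarding; do
  val=$(sysctl -n "$key" 2>/dev/null || echo "unavailable")
  echo "$key = $val"
done

# Dump the FORWARD chain; a default DROP policy here would match the symptom.
fw_rules=$(ip6tables -S FORWARD 2>&1 || true)
echo "ip6tables FORWARD: ${fw_rules:-<not readable>}"
```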
A friendly reminder that this issue had no activity for 30 days.
Closing this issue now, since it's related to a test setup we don't have anymore.