libnetwork-plugin
[libnetwork] Calico does not work properly on systems with kernel version 4.x+ unless ipv6 network is disabled
When I run:
docker run --privileged -tid --rm --network net2 --name k530-net2 harbor.hpc.com/images/busybox
docker reported a problem:
15ba23b49172c9dc4f0643f3f11984ce02c878a60bafccb268becec600330a8f
docker: Error response from daemon: OCI runtime create failed: container_linux.go:348:
starting container process caused "process_linux.go:402: container init caused
\"process_linux.go:385: running prestart hook 0 caused \\\"error running hook: exit status 1,
stdout: , stderr: time=\\\\\\\"2018-09-16T22:25:13-04:00\\\\\\\" level=fatal msg=
\\\\\\\"failed to add interface temp31556e7d316 to sandbox: error setting interface
\\\\\\\\\\\\\\\"temp31556e7d316\\\\\\\\\\\\\\\" routes to [\\\\\\\\\\\\\\\"169.254.1.1/32\\\\\\\\\\\\\\\"
\\\\\\\\\\\\\\\"fe80::b448:31ff:fee4:de7d/128\\\\\\\\\\\\\\\"]: permission denied\\\\\\\"\\\\n\\\"\"": unknown.
This command works on standard CentOS 7.x with kernel 3.x, but it also fails on Ubuntu 18.04, which has kernel 4.x. I found these messages in dmesg:
[ 2111.674564] IPv6: ADDRCONF(NETDEV_UP): temp66aa9bddf71: link is not ready
[ 2111.674700] IPv6: ADDRCONF(NETDEV_UP): cali66aa9bddf71: link is not ready
[ 2111.674710] IPv6: ADDRCONF(NETDEV_CHANGE): cali66aa9bddf71: link becomes ready
[ 2111.674760] IPv6: ADDRCONF(NETDEV_CHANGE): temp66aa9bddf71: link becomes ready
[ 2111.926941] cali0: renamed from temp66aa9bddf71
[ 2113.110629] IPv6: ADDRCONF(NETDEV_UP): tempf1169b462ad: link is not ready
[ 2113.111066] IPv6: ADDRCONF(NETDEV_CHANGE): tempf1169b462ad: link becomes ready
[ 2113.325654] cali0: renamed from tempf1169b462ad
[ 2114.395699] IPv6: ADDRCONF(NETDEV_UP): tempc99fe2a39dc: link is not ready
[ 2114.400374] IPv6: ADDRCONF(NETDEV_CHANGE): tempc99fe2a39dc: link becomes ready
[ 2114.571455] cali0: renamed from tempc99fe2a39dc
[ 2115.557923] IPv6: ADDRCONF(NETDEV_UP): tempa2528b66f07: link is not ready
[ 2115.563399] IPv6: ADDRCONF(NETDEV_CHANGE): tempa2528b66f07: link becomes ready
[ 2115.744184] cali0: renamed from tempa2528b66f07
So I tried disabling IPv6 with these commands:
echo 1 > /proc/sys/net/ipv6/conf/default/disable_ipv6
echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6
After that, it works fine.
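To confirm that the workaround actually took effect, the two flags can be read back. This is only a minimal sketch: the helper name `ipv6_disabled` and its optional root-directory argument are made up here (the argument exists so the function can be tried against a fake directory tree), and it reads the same two files that the echo commands write.

```shell
#!/bin/sh
# ipv6_disabled [ROOT]
# Succeeds (exit 0) only if both the "all" and "default" disable_ipv6 flags
# read 1. ROOT defaults to /proc/sys/net/ipv6/conf; the argument is a
# hypothetical convenience for testing, not part of any Calico tooling.
ipv6_disabled() {
    root="${1:-/proc/sys/net/ipv6/conf}"
    for scope in all default; do
        file="$root/$scope/disable_ipv6"
        # A missing or unreadable file means the flag cannot be confirmed.
        [ -r "$file" ] || return 1
        [ "$(cat "$file")" = "1" ] || return 1
    done
    return 0
}

# Example use:
#   ipv6_disabled && echo "IPv6 disabled" || echo "IPv6 still enabled"
```

Checking both scopes matters because `default` only affects interfaces created later, such as the temporary veth that appears in the error message.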
Expected Behavior
I hope Calico 2.6 can work properly on systems with kernel version 4.x without ipv6 disabled.
Possible Solution
Disable ipv6
echo 1 > /proc/sys/net/ipv6/conf/default/disable_ipv6
echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6
Steps to Reproduce (for bugs)
- Install Calico 2.6 on a system with kernel 4.x+
- Try to create a container on a Calico network
Context
Your Environment
- Calicoctl version v1.6.4, build ae98f46f
- Docker without orchestration
- Operating System and version: CentOS Linux release 7.5.1804 (Core) Kernel: Linux 4.18.7
Hi, thanks for the report! We're experiencing the exact same issue, but the behaviour seems very flakey. Given many retries/re-schedules, chances are most containers will be successfully started eventually. This started appearing for us when we went from 4.15.15 to 4.16.x (and now 4.18.10).
We are using libnetwork, which is likely the case for OP as well. Calico tries to set an IPv6 address on a container interface that should not be v6-enabled. Logging into the container does not show a (stateless) link-local fe80 or anything auto-assigned by the kernel. The Docker network's EnableIPv6 is set to False, and none of the containers we run have anything set in their IPv6Address fields.
# docker network inspect <net>
...
"IPv4Address": "10.123.121.83/32",
"IPv6Address": ""
Could this be a fallback mechanism for when IPv6Address is empty? Newer kernel versions likely reject the unwanted addresses instead of silently dropping the Netlink messages, or reject them with a different errno.
@caseydavenport @fasaxc Any ideas?
@ti-mo
We're experiencing the exact same issue, but the behaviour seems very flakey. Given many retries/re-schedules, chances are most containers will be successfully started eventually.
Yes, the container starts successfully after many retries, but its network cannot communicate even once the container is running.
The same happens with the command docker network connect: the network cannot communicate even though an IP has been allocated to the container.
This sounds to me like the libnetwork-plugin is trying to assign an IPv6 address when it shouldn't.
It seems to decide how to do that here, based on whether or not an IPv6 LL address is available on the host: https://github.com/projectcalico/libnetwork-plugin/blob/e9d4f6cb286beee23503a0aae8963bef5c0a84ea/driver/network_driver.go#L498-L509
Maybe we want to make that configurable, or smarter in some way?
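For debugging on an affected host, the condition described here (does the host have an IPv6 link-local address?) can be approximated from the shell. This is a rough sketch using `ip` from iproute2, not the plugin's Go code, and the function name `host_has_ipv6_link_local` is made up for illustration.

```shell
#!/bin/sh
# host_has_ipv6_link_local
# Succeeds (exit 0) if any host interface currently has a link-scope IPv6
# address. This only approximates the condition discussed in the thread.
host_has_ipv6_link_local() {
    # -o prints one address per line; "scope link" keeps only link-local ones.
    ip -6 -o addr show scope link 2>/dev/null | grep -q 'inet6 '
}

# Example use:
#   host_has_ipv6_link_local && echo "host has a link-local IPv6 address"
```

If this succeeds on a host where the Docker network has IPv6 turned off, that would be consistent with the mismatch suspected in this thread.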
@caseydavenport That's indeed what I initially thought. This can only really work properly when libnetwork-plugin can query whether or not IPv6 is enabled on the target network. The Docker network in question has "EnableIPv6": false set when running inspect on it, because we don't explicitly enable this when creating our networks (as intended).
There's also the case of IPv6 being enabled on the Docker network, but sysctl disabled on the system, though this shouldn't cause problems because it will still cause linkLocalAddr to be nil.
Any ideas how we can query EnableIPv6 in the target network?
Looks like we have some logic already to inspect the network, might be as simple as using something like this? https://github.com/projectcalico/libnetwork-plugin/blob/e9d4f6cb286beee23503a0aae8963bef5c0a84ea/driver/network_driver.go#L583-L590
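Outside the plugin, the same EnableIPv6 flag can be read with the Docker CLI, which is handy when checking an affected network by hand. A minimal sketch, assuming the `docker` CLI is on the PATH; the function name `network_ipv6_enabled` is hypothetical, and this says nothing about how the plugin itself would do the lookup.

```shell
#!/bin/sh
# network_ipv6_enabled NETWORK
# Succeeds (exit 0) if `docker network inspect` reports EnableIPv6 as true.
# Fails if the flag is false or the network cannot be inspected.
network_ipv6_enabled() {
    value="$(docker network inspect --format '{{.EnableIPv6}}' "$1" 2>/dev/null)" || return 1
    [ "$value" = "true" ]
}

# Example use (net2 is the network name from the original report):
#   network_ipv6_enabled net2 || echo "net2 has IPv6 disabled"
```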
I believe I am having the exact same problem on CentOS 7 with kernel 3.10.0-957.12.1.el7.x86_64. I upgraded from 3.10.0-862.14.4.el7.x86_64 and immediately started to get the same problems. Running the following (as described above) fixed it immediately:
echo 1 > /proc/sys/net/ipv6/conf/default/disable_ipv6
echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6
I didn't think this bug applied to me based on the title since I was still using kernel 3.x and my docker network has "EnableIPv6": false.
So is this solved?
We hit this issue recently on some nodes after rebooting, and it cost us a whole day to track down. The affected nodes returned to normal after setting the kernel parameter disable_ipv6.
Most nodes don't need it.
I have the same problem, but disabling IPv6 did not fix it.
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: time=\\\\\\\"2020-03-26T14:30:51+08:00\\\\\\\" level=fatal msg=\\\\\\\"failed to add interface temp1181c31de18 to sandbox: error setting interface \\\\\\\\\\\\\\\"temp1181c31de18\\\\\\\\\\\\\\\" routes to [\\\\\\\\\\\\\\\"169.254.1.1/32\\\\\\\\\\\\\\\" \\\\\\\\\\\\\\\"fe80::b4fc:d8ff:fe11:f2bd/128\\\\\\\\\\\\\\\"]: permission denied\\\\\\\"\\\\n\\\"\"": unknown.
How did you disable it? Maybe you need to disable it and then restart the Docker daemon.
echo 1 > /proc/sys/net/ipv6/conf/default/disable_ipv6
echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6
I disabled IPv6 as above. Then I rebooted my server.
Oh, you rebooted your server?
Did you also check the setting after rebooting, using sysctl net.ipv6.conf.all.disable_ipv6?
As far as I know, the settings are reverted on reboot if you only use the echo approach.
If you want them to persist, you can edit the server's /etc/rc.local or /etc/sysctl.conf. Taking sysctl.conf as an example:
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
Then run sysctl -p to make the configuration take effect immediately; it will also be applied automatically on the next reboot.
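The same two settings can also be kept in a separate drop-in file rather than in /etc/sysctl.conf itself. This is a sketch under assumptions: the helper name `write_ipv6_disable_conf` and the default file name under /etc/sysctl.d are made up for illustration, and `sysctl --system` is assumed to be available (it is part of procps-ng).

```shell
#!/bin/sh
# write_ipv6_disable_conf [FILE]
# Writes the two disable_ipv6 settings to FILE, replacing its contents.
# The default path is an assumption; any file that `sysctl --system`
# reads would work equally well.
write_ipv6_disable_conf() {
    conf="${1:-/etc/sysctl.d/99-disable-ipv6.conf}"
    cat > "$conf" <<'EOF'
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
EOF
}

# Example use (as root): write the file, then reload all sysctl config files.
#   write_ipv6_disable_conf && sysctl --system
```

After a reboot, `sysctl net.ipv6.conf.all.disable_ipv6` should then still report 1.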
It may be worthwhile mentioning this in the getting started docs (I don't think I saw it there) - this was a difficult one to track down.
Hi, is there any workaround for this? Is there a Calico version that works, or would using CentOS 8 help? Thanks.
Disable IPv6 through the sysctl configuration:
Step 1: Add this rule to /etc/sysctl.conf: net.ipv6.conf.all.disable_ipv6=1
Step 2: Add this rule to /etc/sysconfig/network: NETWORKING_IPV6=no
Step 4: Disable the ip6tables service: systemctl disable ip6tables (or chkconfig ip6tables off)
Step 5: Reload the sysctl configuration: sysctl -p