
[libnetwork] Calico does not work properly on systems with kernel version 4.x+ unless ipv6 network is disabled

Open ansiz opened this issue 6 years ago • 14 comments

When I run:

docker run --privileged -tid --rm --network net2 --name k530-net2 harbor.hpc.com/images/busybox

docker reported a problem:

15ba23b49172c9dc4f0643f3f11984ce02c878a60bafccb268becec600330a8f
docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: 
starting container process caused "process_linux.go:402: container init caused 
\"process_linux.go:385: running prestart hook 0 caused \\\"error running hook: exit status 1,
 stdout: , stderr: time=\\\\\\\"2018-09-16T22:25:13-04:00\\\\\\\" level=fatal msg=
\\\\\\\"failed to add interface temp31556e7d316 to sandbox: error setting interface 
\\\\\\\\\\\\\\\"temp31556e7d316\\\\\\\\\\\\\\\" routes to [\\\\\\\\\\\\\\\"169.254.1.1/32\\\\\\\\\\\\\\\" 
\\\\\\\\\\\\\\\"fe80::b448:31ff:fee4:de7d/128\\\\\\\\\\\\\\\"]: permission denied\\\\\\\"\\\\n\\\"\"": unknown.

The command works on a standard CentOS 7.x with kernel 3.x, but it also fails on Ubuntu 18.04, which has kernel 4.x. I found these messages in dmesg:

[ 2111.674564] IPv6: ADDRCONF(NETDEV_UP): temp66aa9bddf71: link is not ready
[ 2111.674700] IPv6: ADDRCONF(NETDEV_UP): cali66aa9bddf71: link is not ready
[ 2111.674710] IPv6: ADDRCONF(NETDEV_CHANGE): cali66aa9bddf71: link becomes ready
[ 2111.674760] IPv6: ADDRCONF(NETDEV_CHANGE): temp66aa9bddf71: link becomes ready
[ 2111.926941] cali0: renamed from temp66aa9bddf71
[ 2113.110629] IPv6: ADDRCONF(NETDEV_UP): tempf1169b462ad: link is not ready
[ 2113.111066] IPv6: ADDRCONF(NETDEV_CHANGE): tempf1169b462ad: link becomes ready
[ 2113.325654] cali0: renamed from tempf1169b462ad
[ 2114.395699] IPv6: ADDRCONF(NETDEV_UP): tempc99fe2a39dc: link is not ready
[ 2114.400374] IPv6: ADDRCONF(NETDEV_CHANGE): tempc99fe2a39dc: link becomes ready
[ 2114.571455] cali0: renamed from tempc99fe2a39dc
[ 2115.557923] IPv6: ADDRCONF(NETDEV_UP): tempa2528b66f07: link is not ready
[ 2115.563399] IPv6: ADDRCONF(NETDEV_CHANGE): tempa2528b66f07: link becomes ready
[ 2115.744184] cali0: renamed from tempa2528b66f07

So I tried disabling IPv6 with these commands:

echo 1 > /proc/sys/net/ipv6/conf/default/disable_ipv6
echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6

After that, it works fine.

Expected Behavior

I hope Calico 2.6 can work properly on systems with kernel version 4.x without ipv6 disabled.

Possible Solution

Disable IPv6:

echo 1 > /proc/sys/net/ipv6/conf/default/disable_ipv6
echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6
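
To confirm that the workaround took effect, the current values can be read back from procfs. This is a minimal sketch: the CONF_DIR variable and the show_disable_ipv6 function name are made up for illustration, and only the /proc paths come from the commands above.

```shell
#!/bin/sh
# Report the disable_ipv6 value for the "all" and "default" scopes.
# CONF_DIR is a hypothetical override; it defaults to the real procfs path.
CONF_DIR="${CONF_DIR:-/proc/sys/net/ipv6/conf}"

show_disable_ipv6() {
    for scope in all default; do
        f="$CONF_DIR/$scope/disable_ipv6"
        if [ -r "$f" ]; then
            # 1 means IPv6 is disabled for that scope, 0 means enabled.
            printf '%s=%s\n' "$scope" "$(cat "$f")"
        else
            printf '%s=unknown\n' "$scope"
        fi
    done
}

show_disable_ipv6
```

If both lines report 1, the echo commands were applied; a value of unknown suggests the kernel was built or booted without IPv6 support.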

Steps to Reproduce (for bugs)

  1. Install Calico 2.6 on a system with kernel 4.x+
  2. Try to create a container on a Calico network

Context

Your Environment

  • Calicoctl version v1.6.4, build ae98f46f
  • Docker without orchestration
  • Operating System and version: CentOS Linux release 7.5.1804 (Core) Kernel: Linux 4.18.7

ansiz avatar Sep 17 '18 02:09 ansiz

Hi, thanks for the report! We're experiencing the exact same issue, but the behaviour seems very flakey. Given many retries/re-schedules, chances are most containers will be successfully started eventually. This started appearing for us when we went from 4.15.15 to 4.16.x (and now 4.18.10).

We are using libnetwork, which is likely the case for OP as well. Calico tries to set an IPv6 address on a container interface that should not be v6-enabled. Logging into the container does not show a (stateless) link-local fe80 or anything auto-assigned by the kernel. The Docker network's EnableIPv6 is set to False, and none of the containers we run have anything set in their IPv6Address fields.

# docker network inspect <net>
...
                "IPv4Address": "10.123.121.83/32",
                "IPv6Address": ""

Could this be a fallback mechanism in case IPv6Address is empty? Newer kernel versions likely reject unwanted addresses instead of silently dropping the Netlink messages, or reject them with a different errno.

@caseydavenport @fasaxc Any ideas?

ti-mo avatar Sep 28 '18 13:09 ti-mo

@ti-mo

We're experiencing the exact same issue, but the behaviour seems very flakey. Given many retries/re-schedules, chances are most containers will be successfully started eventually.

Yes, the container starts successfully after many retries, but it cannot communicate over the network even once it has started.

The same happens with docker network connect: the container cannot communicate over the network even though an IP has been allocated to it.

ansiz avatar Sep 29 '18 02:09 ansiz

This sounds to me like the libnetwork-plugin is trying to assign an IPv6 address when it shouldn't.

It seems to decide how to do that here: https://github.com/projectcalico/libnetwork-plugin/blob/e9d4f6cb286beee23503a0aae8963bef5c0a84ea/driver/network_driver.go#L498-L509

The decision is based on whether or not an IPv6 link-local (LL) address is available on the host. Maybe we want to make that configurable, or smarter in some way?
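
For anyone who wants to check that host-side condition by hand, the following is a rough sketch that lists the interfaces currently carrying an IPv6 link-local address. It only approximates the check described above; which interface the plugin actually inspects is defined in the linked source, and the ll_interfaces function name is made up for illustration.

```shell
#!/bin/sh
# Print the names of interfaces that have an IPv6 address, one per line.
# Expects `ip -6 -o addr show scope link` style output on stdin, where
# field 2 is the interface name and field 3 is the address family.
ll_interfaces() {
    awk '$3 == "inet6" { print $2 }' | sort -u
}

# Only query the host if the iproute2 `ip` tool is installed.
if command -v ip >/dev/null 2>&1; then
    ip -6 -o addr show scope link | ll_interfaces
fi
```

An empty result suggests that no link-local address is available, which matches the situation after IPv6 has been disabled through sysctl.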

caseydavenport avatar Oct 15 '18 20:10 caseydavenport

@caseydavenport That's indeed what I initially thought. This can only really work properly when libnetwork-plugin can query whether or not IPv6 is enabled on the target network. The Docker network in question shows "EnableIPv6": false when running inspect on it, because we don't explicitly enable this when creating our networks (as intended).

There's also the case of IPv6 being enabled on the Docker network but disabled via sysctl on the system, though this shouldn't cause problems because it will still cause linkLocalAddr to be nil.

Any ideas how we can query EnableIPv6 in the target network?

ti-mo avatar Oct 22 '18 10:10 ti-mo

Any ideas how we can query EnableIPv6 in the target network?

Looks like we have some logic already to inspect the network, might be as simple as using something like this? https://github.com/projectcalico/libnetwork-plugin/blob/e9d4f6cb286beee23503a0aae8963bef5c0a84ea/driver/network_driver.go#L583-L590
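
From the operator side, the same EnableIPv6 flag can be read with the Docker CLI. This is a sketch of that manual check only, not of the plugin's internal logic; the network_ipv6_enabled function name is made up, and net2 is the network name from the original report.

```shell
#!/bin/sh
# Succeed if the named Docker network reports EnableIPv6 = true.
# Fails if the flag is false, the network does not exist, or docker is unavailable.
network_ipv6_enabled() {
    [ "$(docker network inspect --format '{{.EnableIPv6}}' "$1" 2>/dev/null)" = "true" ]
}

# Only run the check if the docker CLI is installed.
if command -v docker >/dev/null 2>&1; then
    if network_ipv6_enabled net2; then
        echo "net2: IPv6 enabled"
    else
        echo "net2: IPv6 disabled or network not found"
    fi
fi
```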

caseydavenport avatar Jan 11 '19 19:01 caseydavenport

I believe I am having the exact same problem on CentOS 7 with kernel 3.10.0-957.12.1.el7.x86_64. I upgraded from 3.10.0-862.14.4.el7.x86_64 and immediately started to get the same problems. Running the following (as described above) fixed it immediately:

echo 1 > /proc/sys/net/ipv6/conf/default/disable_ipv6
echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6

I didn't think this bug applied to me based on the title since I was still using kernel 3.x and my docker network has "EnableIPv6": false.

merickso avatar May 08 '19 19:05 merickso

So is this solved?

We ran into this issue recently on some nodes after rebooting, and it took us a whole day to track down. The affected nodes returned to normal after setting the kernel parameter disable_ipv6. Most nodes don't need it.

jasonjoo2010 avatar Dec 22 '19 09:12 jasonjoo2010

I have the same problem, but disabling IPv6 did not fix it.

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: time=\\\\\\\"2020-03-26T14:30:51+08:00\\\\\\\" level=fatal msg=\\\\\\\"failed to add interface temp1181c31de18 to sandbox: error setting interface \\\\\\\\\\\\\\\"temp1181c31de18\\\\\\\\\\\\\\\" routes to [\\\\\\\\\\\\\\\"169.254.1.1/32\\\\\\\\\\\\\\\" \\\\\\\\\\\\\\\"fe80::b4fc:d8ff:fe11:f2bd/128\\\\\\\\\\\\\\\"]: permission denied\\\\\\\"\\\\n\\\"\"": unknown.

rico-qian avatar Mar 26 '20 06:03 rico-qian

I have the same problem, but disabling IPv6 did not fix it.

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: time=\\\\\\\"2020-03-26T14:30:51+08:00\\\\\\\" level=fatal msg=\\\\\\\"failed to add interface temp1181c31de18 to sandbox: error setting interface \\\\\\\\\\\\\\\"temp1181c31de18\\\\\\\\\\\\\\\" routes to [\\\\\\\\\\\\\\\"169.254.1.1/32\\\\\\\\\\\\\\\" \\\\\\\\\\\\\\\"fe80::b4fc:d8ff:fe11:f2bd/128\\\\\\\\\\\\\\\"]: permission denied\\\\\\\"\\\\n\\\"\"": unknown.

How did you disable it? Maybe you need to disable it and then restart the Docker daemon.

jasonjoo2010 avatar Mar 26 '20 06:03 jasonjoo2010

echo 1 > /proc/sys/net/ipv6/conf/default/disable_ipv6
echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6

I disabled IPv6 as above. Then I rebooted my server.

rico-qian avatar Mar 26 '20 06:03 rico-qian

echo 1 > /proc/sys/net/ipv6/conf/default/disable_ipv6
echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6

I disabled IPv6 as above. Then I rebooted my server.

Oh, you rebooted your server? Did you also check the setting after the reboot using sysctl net.ipv6.conf.all.disable_ipv6?

As far as I know, the settings are lost on reboot if you only use the echo approach. To make them persistent, you can edit the server's /etc/rc.local or /etc/sysctl.conf. Taking sysctl.conf as an example:

net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1

Then run sysctl -p to apply the configuration immediately; it will also be applied automatically on the next reboot.
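
The persistent variant can be scripted roughly as follows. This is a sketch under assumptions: the write_disable_ipv6_conf function name and the drop-in file name 99-disable-ipv6.conf are made up for illustration, and writing the same two lines to /etc/sysctl.conf works equally well.

```shell
#!/bin/sh
# Write the two disable_ipv6 settings to a sysctl configuration file.
# The default target is a hypothetical drop-in name under /etc/sysctl.d.
# Note: the target file is overwritten, so do not point this at a shared file.
write_disable_ipv6_conf() {
    target="${1:-/etc/sysctl.d/99-disable-ipv6.conf}"
    printf '%s\n' \
        'net.ipv6.conf.all.disable_ipv6=1' \
        'net.ipv6.conf.default.disable_ipv6=1' > "$target"
}

# As root:
#   write_disable_ipv6_conf
#   sysctl --system    # reload all sysctl configuration files
#   sysctl net.ipv6.conf.all.disable_ipv6 net.ipv6.conf.default.disable_ipv6
```

Running the last command again after a reboot shows whether the setting actually persisted, which is the check suggested above.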

jasonjoo2010 avatar Mar 26 '20 07:03 jasonjoo2010

It may be worthwhile mentioning this in the getting started docs (I don't think I saw it there) - this was a difficult one to track down.

darrena092 avatar Apr 16 '20 14:04 darrena092

Hi, is there any workaround for this? Is there a Calico version that works, or would using CentOS 8 help? Thanks.

oshoval avatar Apr 23 '20 14:04 oshoval

Disable IPv6 through the sysctl configuration:

Step 1: Add this rule to /etc/sysctl.conf: net.ipv6.conf.all.disable_ipv6=1

Step 2: Add this rule to /etc/sysconfig/network: NETWORKING_IPV6=no

Step 4: Disable the ip6tables service: systemctl disable ip6tables (or chkconfig ip6tables off)

Step 5: Reload the sysctl configuration: sysctl -p

cucker0 avatar Oct 25 '21 02:10 cucker0