Container Attachment To Custom Bridge Network Causes Host Network Interruption (IPv6?)

Open Alex-Richman opened this issue 5 years ago • 13 comments

We've been seeing an issue with host network interruptions when starting/stopping our development environment, which uses Docker heavily. This manifests as ERR_NETWORK_CHANGED in Chrome and WiFi connections flapping down/back up.

After some debugging I think I've narrowed it down to:

  • Only happens when using a custom bridge network
  • Interruption occurs once per container network attachment/detachment from that bridge network (via the new veth device)
  • Only happens if IPv6 is enabled on the host (docker daemon.json IPv6 config irrelevant). Completely disabling IPv6 resolves this.
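The conditions above could be exercised with something like the following. This is only a sketch: the network name `repro-net`, the `alpine` image, and the `DOCKER` override variable are assumptions for illustration, not details from the report.

```shell
#!/bin/bash
# Hypothetical reproduction helper for the conditions listed above.
# DOCKER can be overridden (e.g. DOCKER=echo) to print the commands
# instead of running them.
repro_bridge_attach() {
  local docker="${DOCKER:-docker}"
  local net="${1:-repro-net}"   # placeholder network name (assumption)
  # Create a user-defined bridge network.
  "$docker" network create "$net"
  # Starting the container attaches a new veth to the bridge;
  # the container exits immediately, which detaches it again.
  "$docker" run --rm --network "$net" alpine true
  # Clean up the test network.
  "$docker" network rm "$net"
}
```

Calling `repro_bridge_attach` while watching the browser or `journalctl -kf` in another terminal should show whether a single attach/detach is enough to trigger the interruption.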

Potentially relevant: the journal shows IPv6 ADDRCONF entries for the new veth devices. These are the only log entries that are missing when IPv6 is disabled (and the issue is not present):

Jan 28 01:01:45 MinyArch kernel: IPv6: ADDRCONF(NETDEV_UP): veth7698f4b: link is not ready
Jan 28 01:01:45 MinyArch kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth7698f4b: link becomes ready
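
To pull entries like these out of a busy kernel log, a small filter along these lines may help. It is a sketch: the function name is made up, and the pattern assumes the message format shown in the two lines above.

```shell
#!/bin/bash
# Keep only kernel IPv6 ADDRCONF messages that mention a veth device.
# Reads log text on stdin and writes matching lines to stdout.
filter_addrconf_veth() {
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -E 'IPv6: ADDRCONF\([A-Z_]+\): veth[[:alnum:]]+:' || true
}

# Example (not run here): journalctl -k --no-pager | filter_addrconf_veth
```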

Prior art (some only tangentially related):

  • https://askubuntu.com/questions/886107/google-chrome-error-21-neterr-network-changed/909631#909631
  • https://askubuntu.com/questions/958902/err-network-changed-on-chrome-with-ubuntu-17-04
  • https://superuser.com/questions/747735/regularly-getting-err-network-changed-errors-in-chrome
  • https://bugs.chromium.org/p/chromium/issues/detail?id=974711

We've seen this on Ubuntu LTS (18.04) and Debian Stretch (9.x).

I can't find any conclusions in the prior art (or in any docker issues) as to what is actually causing this, just "disable IPv6 / stop docker lol". Thought it might be useful to raise here to see if there were any further thoughts.

Output of docker version:

Client:
 Version:           18.09.5
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        e8ff056dbc
 Built:             Thu Apr 11 04:44:28 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.1
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.5
  Git commit:       74b1e89
  Built:            Thu Jul 25 21:20:35 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.5
  GitCommit:        bb71b10fd8f58240ca47fbb579b9d1028eea7c84
 runc:
  Version:          1.0.0-rc6+dev
  GitCommit:        2b18fe1d885ee5083ef9f0838fee39b62d653e30
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

Containers: 14
 Running: 0
 Paused: 0
 Stopped: 14
Images: 1536
Server Version: 19.03.1
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84
runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.0-7-amd64
Operating System: Debian GNU/Linux 9 (stretch)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.4GiB
Name: MinyArch
ID: XU54:FOVV:H2YM:JJSY:TVQ5:JGS2:67BG:TNDF:545U:WCTM:VPVU:T4OR
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Alex-Richman avatar Jan 28 '20 01:01 Alex-Richman

I think I've got the same problem, but even without starting a container: as soon as I add ipv6 to docker daemon.json and start the docker service, my host becomes barely reachable over IPv6 (I tested the SSH and HTTPS ports).

sveyret avatar Apr 07 '20 06:04 sveyret

This happens for me as well on Ubuntu 20.04. It may be related to a container crashing and constantly restarting.

It manifests in Chrome as connection flapping:

[image]

EDIT: after receiving a number of downvotes I removed the full log. Check the comment history if you would like to access it.

gardner avatar Jun 08 '20 03:06 gardner

Running Ubuntu 20.04, Docker version 19.03.14, build 5eb3275d40 and Google Chrome 87.0.4280.88. Same issue.

I have some containers running via docker-compose with "restart: unless-stopped". When they restart because their process has finished, Chrome throws ERR_NETWORK_CHANGED while the interface is tentative.

VimS avatar Dec 04 '20 07:12 VimS

I'm having the same issue on Ubuntu 21.04 and Ubuntu 20.10 with Docker 20.10.6. Connections fail randomly for the containers, and it also impacts web browsing: I get ERR_NETWORK_CHANGED in Chrome quite often. Judging by the logs, there seems to be a conflict with NetworkManager trying to handle the Docker interfaces. I tried to set the Docker interfaces to unmanaged in /etc/NetworkManager/NetworkManager.conf, but it didn't change anything.
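
For reference, marking Docker interfaces as unmanaged in NetworkManager usually looks something like the sketch below. The exact lines used in the attempt above are not known; the interface patterns (`docker0`, `br-*`, `veth*`) and the helper name are assumptions.

```shell
#!/bin/bash
# Write an example NetworkManager snippet to the given path.
# The snippet is illustrative only; review it before merging it into
# /etc/NetworkManager/NetworkManager.conf or a conf.d file.
write_example_nm_unmanaged() {
  local target="$1"
  cat > "$target" <<'EOF'
[keyfile]
unmanaged-devices=interface-name:docker0;interface-name:br-*;interface-name:veth*
EOF
}
```

As noted above, this did not change the behaviour in that setup, so it is shown only to make the attempted configuration concrete.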

Disabling ipv6 for the whole system fixes the issue, but it can't be disabled anymore in the latest version due to https://github.com/moby/moby/issues/42288.

I don't have any custom bridge, and simply running docker produces the error. Logs look pretty much identical with ipv6 enabled and disabled, but it seems like with ipv6 disabled it reaches a stable state in a few seconds, whereas it does not with ipv6 enabled and the issues do not stop showing up in the logs.

My current workaround is to use the older 20.10.5 version, and disable ipv6 in the kernel.

jixbo avatar Apr 16 '21 12:04 jixbo

We use a lot of docker containers with ipv6 and Chrome inside. Disabling ipv6 in the kernel is not a solution for us. Are there any other workarounds?

mzhirnov1 avatar Apr 20 '21 07:04 mzhirnov1

@mzhirnov1 Setting the "fixed-cidr-v6" flag in daemon.json as described here worked for us.
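
A daemon.json with that flag might look like the sketch below. The linked description is not reproduced in this thread, so the `"ipv6": true` line and the `2001:db8:1::/64` subnet (a documentation prefix) are assumptions, as is the helper name.

```shell
#!/bin/bash
# Write an example daemon.json fragment to the given path.
# Merge it by hand into /etc/docker/daemon.json and restart the
# docker service; the subnet is a placeholder, not a recommendation.
write_example_daemon_json() {
  local target="$1"
  cat > "$target" <<'EOF'
{
  "ipv6": true,
  "fixed-cidr-v6": "2001:db8:1::/64"
}
EOF
}
```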

steersbob avatar Apr 20 '21 11:04 steersbob

@mzhirnov1 Setting the "fixed-cidr-v6" flag in daemon.json as described here worked for us.

Do you have problems like: [1372074.839350] IPv6: ADDRCONF(NETDEV_CHANGE): vethecee105: link becomes ready

mzhirnov1 avatar Apr 21 '21 19:04 mzhirnov1

Symptoms are as described by OP:

  • Any container starting or restarting interrupts all network interfaces (containers and host).
  • We see a lot of ADDRCONF / NETDEV spam in logs. Both the "link becomes ready", and "entered blocking mode / non blocking mode". Hard to say which of them is related to the actual network interrupt.
  • Symptoms only appear if IPv6 is enabled on the host and the router (presumably the DHCP portion of the router).

Effective workarounds are to either disable IPv6 on the host through sysctl, or the above described changes to daemon.json.
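
The sysctl workaround could be sketched roughly as follows. The `SYSCTL` override variable and the function name are assumptions for illustration; the settings are not persistent across reboots and need root privileges.

```shell
#!/bin/bash
# Disable IPv6 host-wide via sysctl (run as root).
# SYSCTL can be overridden (e.g. SYSCTL=echo) to print the commands
# instead of applying them.
disable_host_ipv6() {
  local sysctl_cmd="${SYSCTL:-sysctl}"
  # Existing interfaces.
  "$sysctl_cmd" -w net.ipv6.conf.all.disable_ipv6=1
  # Interfaces created later (such as new veth devices).
  "$sysctl_cmd" -w net.ipv6.conf.default.disable_ipv6=1
}
```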

We haven't done any in-depth digging through docker / IPv6 network handling to find the root cause. We kind of threw workarounds at the wall until one stuck, and went back to the features and bugs in our own software.

steersbob avatar Apr 21 '21 20:04 steersbob

Hi, any status update? We experience the exact same situation (PopOS 21.10, docker: 20.10.12) as described by @steersbob. This issue needs a lot more attention.

Client: Docker Engine - Community
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.16.12
 Git commit:        e91ed57
 Built:             Mon Dec 13 11:45:33 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.12
  Git commit:       459d0df
  Built:            Mon Dec 13 11:43:41 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.12
  GitCommit:        7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

StarpTech avatar Dec 16 '21 11:12 StarpTech

Been bothering me for years now. Frustrating. Any workarounds?

DanielJoyce avatar Mar 01 '23 22:03 DanielJoyce

Been bothering me for years now. Frustrating. Any workarounds?

I switched to Podman for this and other reasons

universam1 avatar Mar 02 '23 12:03 universam1

Modifying the didn't help me. But disabling ipv6 for my wlan interface resolved the issue. If I disable it in the kernel, a lot of stuff crashes.

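#!/bin/bash
# Disable IPv6 on the wireless interface only (wlp8s0 here); not persistent across reboots.
sudo sysctl -w net.ipv6.conf.wlp8s0.disable_ipv6=1
# Load the values from /etc/sysctl.conf.
sudo sysctl -p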

wlp8s0 is my wireless interface. Check the interfaces listed by ifconfig -a one by one to find which one is causing this.
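
To see the current per-interface setting before toggling anything, something like this sketch could be used. The function name is made up, and the optional directory argument exists only so the function can be pointed at a different tree.

```shell
#!/bin/bash
# Print the disable_ipv6 value for every interface entry under the
# given directory (defaults to the kernel's /proc/sys tree).
list_ipv6_state() {
  local base="${1:-/proc/sys/net/ipv6/conf}"
  local f
  for f in "$base"/*/disable_ipv6; do
    [ -r "$f" ] || continue
    # The parent directory name is the interface name.
    printf '%s disable_ipv6=%s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
  done
}
```

Running `list_ipv6_state` with no argument lists each interface (plus the `all` and `default` entries) with a value of 0 or 1.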

StarpTech avatar May 19 '23 19:05 StarpTech

Still having this extremely annoying issue. Considering moving to podman too.

phito avatar Sep 04 '24 13:09 phito