Adding pointopoint addresses to tun0 interface too quickly after opening intermittently fails quietly (openvpn)
Issue Report
Bug
Container Linux Version
$ cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1688.5.3
VERSION_ID=1688.5.3
BUILD_ID=2018-04-03-0547
PRETTY_NAME="Container Linux by CoreOS 1688.5.3 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
Environment
What hardware/cloud provider/hypervisor is being used to run Container Linux?
CoreOS is running on an AWS EC2 instance. This bug was observed running inside Ubuntu and Alpine containers with host networking enabled. The failing command was issued by openvpn.
Expected Behavior
After the following commands run:
May 08 18:31:51 ip-10-0-1-98 docker[1293]: Tue May 8 18:31:51 2018 TUN/TAP device tun0 opened
May 08 18:31:51 ip-10-0-1-98 docker[1293]: Tue May 8 18:31:51 2018 TUN/TAP TX queue length set to 100
May 08 18:31:51 ip-10-0-1-98 docker[1293]: Tue May 8 18:31:51 2018 do_ifconfig, tt->did_ifconfig_ipv6_setup=0
May 08 18:31:52 ip-10-0-1-98 docker[1293]: Tue May 8 18:31:52 2018 /sbin/ifconfig tun0 172.16.1.18 pointopoint 172.16.1.17 mtu 1500
The pointopoint addresses should be added to the tun0 interface and be visible in the ifconfig output.
Actual Behavior
When the pointopoint command runs successfully, the interface looks like this:
tun0: flags=4305<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST> mtu 1500
inet 172.16.1.18 netmask 255.255.255.255 destination 172.16.1.17
inet6 fe80::eba2:2516:2600:acb5 prefixlen 64 scopeid 0x20<link>
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 100 (UNSPEC)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 9 bytes 1438 (1.4 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
When the pointopoint command fails (quietly), the interface looks like this:
tun0: flags=4305<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST> mtu 1500
inet6 fe80::5ff2:ef48:6bbc:e9a8 prefixlen 64 scopeid 0x20<link>
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 100 (UNSPEC)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 14 bytes 2276 (2.2 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
When I modified the openvpn source to add a 1 second sleep before the pointopoint command, the pointopoint command went from intermittently failing to succeeding every time.
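As an alternative to a fixed sleep, the address could be added and then verified, with a retry if it did not stick. The following is only a sketch under assumptions: the function names, the retry count, and the delay are hypothetical and not tuned against this bug, and the check relies on `ip -o -4 addr show dev <dev>` printing an `inet <address>` pair for an assigned address. The runner is injectable so the logic can be exercised without root or a real tun device.

```python
import subprocess
import time


def run_cmd(args):
    """Run a command and return (exit_code, stdout); used as the default runner."""
    proc = subprocess.run(args, capture_output=True, text=True)
    return proc.returncode, proc.stdout


def addr_present(show_output, local):
    """True if `inet <local>` appears in `ip -o -4 addr show` output."""
    tokens = show_output.split()
    for i, tok in enumerate(tokens[:-1]):
        if tok == "inet" and tokens[i + 1].split("/")[0] == local:
            return True
    return False


def add_ptp_addr(dev, local, peer, run=run_cmd, attempts=5, delay=0.2,
                 sleep=time.sleep):
    """Add a point-to-point address and verify it stuck, retrying if not.

    attempts and delay are illustrative values (assumptions).
    Returns the attempt number that succeeded, or 0 if none did.
    """
    for attempt in range(1, attempts + 1):
        # Same form as the `ip addr add` command openvpn logs when it uses ip.
        # The exit code is ignored on purpose: the failure reported here is
        # quiet, so only the follow-up check is trusted.
        run(["ip", "addr", "add", "dev", dev, "local", local, "peer", peer])
        _, out = run(["ip", "-o", "-4", "addr", "show", "dev", dev])
        if addr_present(out, local):
            return attempt
        sleep(delay)
    return 0
```

With the default runner this would be called as add_ptp_addr("tun0", "172.16.1.18", "172.16.1.17") from a process with network admin privileges. It works around the symptom only and says nothing about where the race is.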
Reproduction Steps
- Run an openvpn client in an Ubuntu container
- Bring up a tun0 interface, then immediately add pointopoint addresses
Other Information
I haven't yet been able to pinpoint which part of the stack is responsible for the bug. It's possibly an openvpn or docker issue, but I'm led to believe that it is a coreos issue, because:
- the docker container is using host networking and has network admin privileges
- examining the openvpn source, these commands appear to be run in a sequential, straightforward manner
- adding a 1 second sleep before the pointopoint command resolved the issue, suggesting some sort of race condition (like the tun0 device hadn't fully come up)
We have been running this implementation across a number of production servers and didn't notice any issue until recently (about a month ago).
Is it possible to do so using the ip command instead of ifconfig? ip generally issues a single netlink request (e.g. create device, set up, add address), so it's easier to see which step is failing.
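To make "which step is failing" concrete, the steps could be run one at a time with the exit status and stderr of each recorded. This is a minimal sketch: the helper names are hypothetical, and the two commands are the ones openvpn logs in this thread when it uses ip. Since the failure in this report is quiet, a zero exit status from every step would not by itself prove the address was assigned.

```python
import subprocess


def run_cmd(args):
    """Run one command; return (exit_code, stderr)."""
    proc = subprocess.run(args, capture_output=True, text=True)
    return proc.returncode, proc.stderr.strip()


def first_failing_step(steps, run=run_cmd):
    """Run named steps in order.

    Returns (name, exit_code, stderr) for the first step that exits
    non-zero, or None if every step exits with 0.
    """
    for name, args in steps:
        code, err = run(args)
        if code != 0:
            return name, code, err
    return None


# Device name and addresses are the ones from this report.
STEPS = [
    ("link up", ["ip", "link", "set", "dev", "tun0", "up", "mtu", "1500"]),
    ("addr add", ["ip", "addr", "add", "dev", "tun0",
                  "local", "172.16.1.18", "peer", "172.16.1.17"]),
]
```

Calling first_failing_step(STEPS) right after the tun device is opened would show whether either step reports an error synchronously, instead of relying on interleaved log lines.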
May 08 19:54:00 ip-10-0-1-98 docker[4475]: Tue May 8 19:54:00 2018 TUN/TAP device tun0 opened
May 08 19:54:00 ip-10-0-1-98 docker[4475]: Tue May 8 19:54:00 2018 TUN/TAP TX queue length set to 100
May 08 19:54:00 ip-10-0-1-98 docker[4475]: Tue May 8 19:54:00 2018 do_ifconfig, tt->did_ifconfig_ipv6_setup=0
May 08 19:54:00 ip-10-0-1-98 docker[4475]: Tue May 8 19:54:00 2018 /sbin/ip link set dev tun0 up mtu 1500
May 08 19:54:00 ip-10-0-1-98 docker[4475]: Tue May 8 19:54:00 2018 /sbin/ip addr add dev tun0 local 172.16.1.18 peer 172.16.1.17
This output is from openvpn running in an Alpine container. It defaulted to using ip here (and defaulted to ifconfig in the Ubuntu container).
I observed the same intermittent failures and successes with this implementation. The logging here is unreliable (async). There are some errors below:
May 08 19:55:00 ip-10-0-1-98 docker[4475]: RTNETLINK answers: Network unreachable
But I don't know if they correspond to the ip addr add command. I do know that the command was unsuccessful, judging by the state of the tun0 interface in the ifconfig output.
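The check against the ifconfig output can be automated. The sketch below assumes the ifconfig format shown earlier in this report (an `inet <addr> netmask <mask> destination <peer>` line); the function name is hypothetical, and the sample strings are trimmed copies of the two tun0 outputs above.

```python
def ipv4_addresses(ifconfig_output):
    """Return (address, destination) pairs from `inet` lines of ifconfig output."""
    found = []
    for line in ifconfig_output.splitlines():
        fields = line.split()
        # `inet6` lines are skipped: the first field must be exactly `inet`.
        if len(fields) >= 2 and fields[0] == "inet":
            dest = None
            if "destination" in fields:
                idx = fields.index("destination")
                if idx + 1 < len(fields):
                    dest = fields[idx + 1]
            found.append((fields[1], dest))
    return found


GOOD = """tun0: flags=4305<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST> mtu 1500
inet 172.16.1.18 netmask 255.255.255.255 destination 172.16.1.17
inet6 fe80::eba2:2516:2600:acb5 prefixlen 64 scopeid 0x20<link>"""

BAD = """tun0: flags=4305<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST> mtu 1500
inet6 fe80::5ff2:ef48:6bbc:e9a8 prefixlen 64 scopeid 0x20<link>"""

print(ipv4_addresses(GOOD))  # [('172.16.1.18', '172.16.1.17')]
print(ipv4_addresses(BAD))   # []
```

An empty list for an interface that is UP and RUNNING matches the quiet failure described in this issue.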
@collin-bachi-sp did this use to work with a previous Container Linux release? If so, what was the last working kernel? Can you check whether the same issue happens on the latest alpha release (kernel 4.16)?
I had the same problem and for me that workaround works: https://github.com/kylemanna/docker-openvpn/issues/370
@lucab I have the same issue with a different container (zerotier). I tried the latest alpha (1786.2.0) but still see the same problem.
The initial assignment works on joining a network (because the interface comes up and gets the IP when the client is authorized "later"),
but on a restart of the service it is not able to assign the IP any more.
Containers I tried: zerotier/zerotier-containerized and zyclonite/zerotier
Are you still seeing this on current releases of Container Linux?
Yes, it still happened. The only difference was that directly after the update "boot" the interface came up, but after restarting only the process it failed (stable branch).
I also still have that problem
ID=coreos
VERSION=1855.4.0
Same here..
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1855.4.0
VERSION_ID=1855.4.0
BUILD_ID=2018-09-11-0003
PRETTY_NAME="Container Linux by CoreOS 1855.4.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"
This is still with us.
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=2135.4.0
VERSION_ID=2135.4.0
BUILD_ID=2019-06-24-2257
PRETTY_NAME="Container Linux by CoreOS 2135.4.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"