weave
Weave DNS is broken after updating Ubuntu 20.04
What you expected to happen?
Weave shouldn't crash when a container tries to resolve a hostname.
What happened?
Containers connected by weave are unable to communicate after applying Ubuntu OS updates. When a container using weave tries to access the network (probably only DNS), the weave container crashes.
How to reproduce it?
- install Ubuntu Server 20.04 using the default options in the installer
- update all packages:
apt-get update && apt-get -y upgrade && apt-get install docker.io
- install Weave per docs:
curl -L git.io/weave -o /usr/local/bin/weave && sudo chmod a+x /usr/local/bin/weave
weave launch
eval $(weave env)
- launch a container, and do something that requires dns:
$ docker run --rm -it weaveworks/ubuntu bash
root@622ec248cbf2:/# ping google.com
ERRO[0003] error waiting for container: unexpected EOF
$
And I'm kicked out of my container. This worked fine before the apt-get upgrade, and it still works ok if weave isn't involved (if I omit eval $(weave env)).
Anything else we need to know?
The problem seems to be tied to an update to the Ubuntu systemd package 245.4-4ubuntu3.3 that was published on 2020-11-04. I've experienced it on Ubuntu 20.04 and 20.10.
This version of systemd generates a line in /etc/resolv.conf which reads options edns0 trust-ad. The previous version (245.4-4ubuntu3) only generated the line options edns0, without the trust-ad. The new option triggers a bug in miekg/dns that was fixed a few years back: https://github.com/miekg/dns/commit/906238edc6eb0ddface4a1923f6d41ef2a5ca59b
I've tried removing trust-ad from resolv.conf and it does fix the crash on a simple test vm. On my "real" vms where I was using weave, containers were still unable to ping each other after getting rid of the crash, but that may be an unrelated problem.
Versions:
$ weave version
weave script 2.7.0
weave 2.7.0
$ docker version
Client:
Version: 19.03.8
API version: 1.40
Go version: go1.13.8
Git commit: afacb8b7f0
Built: Wed Oct 14 19:43:43 2020
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 19.03.8
API version: 1.40 (minimum version 1.12)
Go version: go1.13.8
Git commit: afacb8b7f0
Built: Wed Oct 14 16:41:21 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.3.3-0ubuntu2
GitCommit:
runc:
Version: spec: 1.0.1-dev
GitCommit:
docker-init:
Version: 0.18.0
GitCommit:
$ uname -a
Linux ubuntu-weave-test 5.4.0-53-generic #59-Ubuntu SMP Wed Oct 21 09:38:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Logs:
$ docker logs weave
[......]
panic: runtime error: slice bounds out of range [:9] with length 8
goroutine 604 [running]:
github.com/miekg/dns.ClientConfigFromReader(0x1fae140, 0xc0004bc228, 0x0, 0x0, 0xc0004bc228)
/go/src/github.com/weaveworks/weave/vendor/github.com/miekg/dns/clientconfig.go:94 +0x823
github.com/miekg/dns.ClientConfigFromFile(0x7ffd25fa7efe, 0x23, 0x0, 0x0, 0x0)
/go/src/github.com/weaveworks/weave/vendor/github.com/miekg/dns/clientconfig.go:29 +0xc8
github.com/weaveworks/weave/nameserver.(*upstream).Config(0xc000556780, 0x0, 0x0, 0x0)
/go/src/github.com/weaveworks/weave/nameserver/dns.go:66 +0x197
github.com/weaveworks/weave/nameserver.(*handler).handleRecursive(0xc0005c3100, 0x20041e0, 0xc000790680, 0xc0004c5710)
/go/src/github.com/weaveworks/weave/nameserver/dns.go:268 +0xf5
github.com/miekg/dns.HandlerFunc.ServeDNS(0xc000118e00, 0x20041e0, 0xc000790680, 0xc0004c5710)
/go/src/github.com/weaveworks/weave/vendor/github.com/miekg/dns/server.go:84 +0x44
github.com/miekg/dns.(*ServeMux).ServeDNS(0xc000118d90, 0x20041e0, 0xc000790680, 0xc0004c5710)
/go/src/github.com/weaveworks/weave/vendor/github.com/miekg/dns/server.go:210 +0x62
github.com/miekg/dns.(*Server).serve(0xc0004340c0, 0x1fc3d60, 0xc00210a090, 0x1fad080, 0xc000118d90, 0xc0024e6e00, 0x1c, 0x200, 0xc0000108d0, 0xc00047bba0, ...)
/go/src/github.com/weaveworks/weave/vendor/github.com/miekg/dns/server.go:567 +0x271
created by github.com/miekg/dns.(*Server).serveUDP
/go/src/github.com/weaveworks/weave/vendor/github.com/miekg/dns/server.go:523 +0x2a0
@emfrias
I ran into the same bug today. Thank you for the report, it gave me a shortcut to fixing the problem.
The underlying issue is that weave still vendors the old version of miekg/dns, 1.0.4. I updated to v1.0.5 of miekg/dns (you can edit it in the go.mod file), which already has that bug fixed, built new weave images, and then used the newly built images on my hosts. Hope that helps you.
Thanks @Cybernisk, that helped. I just built new images as you described and they've been working well so far. I'm new to this build system so I did it wrong a few times before getting a version that actually had the new version of the dns module. I wound up with:
git clone https://github.com/weaveworks/weave.git
cd weave
go get github.com/miekg/dns@v1.0.5
go mod vendor
make
Is it likely that this simple dependency update will find its way into the next release of the binary?
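One way to confirm that the vendored tree really picked up the newer dns module before running make is to look at vendor/modules.txt, which go mod vendor rewrites. This is a minimal sketch; the helper name vendored_dns_version is made up for illustration and is not part of weave:

```shell
# Print the version of github.com/miekg/dns recorded in vendor/modules.txt.
# Module header lines in that file look like: "# <module path> <version>".
# vendored_dns_version is a hypothetical helper name, not part of weave.
vendored_dns_version() {
    awk '$1 == "#" && $2 == "github.com/miekg/dns" { print $3 }' \
        "${1:-vendor/modules.txt}"
}
```

Run from the weave checkout, vendored_dns_version should print v1.0.5 after the steps above. If it still prints the older version, the vendor directory was probably not refreshed.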
Unfortunately the described workaround does not work here as expected:
Step 9/15 : RUN go get github.com/weaveworks/build-tools/cover github.com/mattn/goveralls golang.org/x/lint/golint github.com/fzipp/gocyclo github.com/fatih/hclfmt github.com/client9/misspell/cmd/misspell
---> Running in 8700989776c1
cannot find package "github.com/hashicorp/hcl/hcl/printer" in any of:
/usr/local/go/src/github.com/hashicorp/hcl/hcl/printer (from $GOROOT)
/go/src/github.com/hashicorp/hcl/hcl/printer (from $GOPATH)
The command '/bin/sh -c go get github.com/weaveworks/build-tools/cover github.com/mattn/goveralls golang.org/x/lint/golint github.com/fzipp/gocyclo github.com/fatih/hclfmt github.com/client9/misspell/cmd/misspell' returned a non-zero code: 1
make: *** [Makefile:255: .build.uptodate] Error 1
Building succeeded with the following patch:
diff --git a/build/Dockerfile b/build/Dockerfile
index ae6a677..e47913e 100644
--- a/build/Dockerfile
+++ b/build/Dockerfile
@@ -49,6 +49,9 @@ RUN curl -fsSLo shfmt https://github.com/mvdan/sh/releases/download/v1.3.0/shfmt
mv shfmt /usr/bin
# Install common Go tools
+RUN GO111MODULE=on go get github.com/hashicorp/[email protected]; \
+ mkdir -p /go/src/github.com/hashicorp; \
+ ln -s $PWD/pkg/mod/github.com/hashicorp/[email protected] $PWD/src/github.com/hashicorp/hcl
RUN go get \
github.com/weaveworks/build-tools/cover \
github.com/mattn/goveralls \
After that, the golang version shipped with Ubuntu 20.04 turned out not to be up to date, and the build reported another error about modules not being in sync. My workaround to get make to run to completion was to update Go to the latest version:
apt remove golang --purge --autoremove
curl -LO https://get.golang.org/$(uname)/go_installer && chmod +x go_installer && ./go_installer && rm go_installer
Unfortunately, this still didn't produce a weave executable that survives weave status without crashing.
It's a pity to see this break on a very common platform.
You're right. I can't explain why, but my steps no longer work. The changes @almereyda mentions get it building again for me, though.
From a clean ubuntu:20.04 machine:
sudo apt -y install build-essential git docker.io
curl -LO https://get.golang.org/$(uname)/go_installer && chmod +x go_installer && ./go_installer && rm go_installer
. ~/.bash_profile
git clone https://github.com/weaveworks/weave.git
cd weave
# patch build/Dockerfile using almereyda's patch above
go get github.com/miekg/dns@v1.0.5
go mod vendor
make
I didn't mention these steps earlier because I figured they'd be a bit different depending on your setup.
We've just built images for weaveworks/weave:latest and its helpers on the local system. I run weave
using the script you get from sudo curl -L git.io/weave -o /usr/local/bin/weave. That script will
try to run weaveworks/weave:2.8.1 by default, and since we didn't build that, it will download it from
docker hub and ignore the custom version we built. The simplest change is to edit the weave script:
--- weave.orig 2021-02-11 17:40:59.835349520 +0000
+++ /usr/local/bin/weave 2021-02-11 17:43:42.022209305 +0000
@@ -3,7 +3,7 @@
[ -n "$WEAVE_DEBUG" ] && set -x
-SCRIPT_VERSION="2.8.1"
+SCRIPT_VERSION="unreleased"
IMAGE_VERSION=latest
[ "$SCRIPT_VERSION" = "unreleased" ] || IMAGE_VERSION=$SCRIPT_VERSION
IMAGE_VERSION=${WEAVE_VERSION:-$IMAGE_VERSION}
and it will stick to using the latest tag we built.
This should give you a version that works on this one machine.
I went a step further and pushed the new images to my private docker registry:
MY_DOCKER_REGISTRY=docker-registry.me.com
for image in weave weaveexec weave-kube weave-npc weavedb network-tester; do
sudo docker tag weaveworks/$image:latest $MY_DOCKER_REGISTRY/weaveworks/$image:latest
sudo docker push $MY_DOCKER_REGISTRY/weaveworks/$image:latest
done
and then make one more edit to /usr/local/bin/weave:
--- weave.new 2021-02-11 18:08:12.349300226 +0000
+++ /usr/local/bin/weave 2021-02-11 18:09:46.034718622 +0000
@@ -12,7 +12,7 @@
MIN_DOCKER_VERSION=1.10.0
# These are needed for remote execs, hence we introduce them here
-DOCKERHUB_USER=${DOCKERHUB_USER:-weaveworks}
+DOCKERHUB_USER=${DOCKERHUB_USER:-docker-registry.me.com/weaveworks}
BASE_EXEC_IMAGE=$DOCKERHUB_USER/weaveexec
EXEC_IMAGE=$BASE_EXEC_IMAGE:$IMAGE_VERSION
WEAVEDB_IMAGE=$DOCKERHUB_USER/weavedb:latest
Now I can distribute this patched version of /usr/local/bin/weave to all my servers and they'll get the patched version of weave. If you don't have a private registry set up, you could manually load your patched binaries on each of your other systems (I guess using something like docker load < weave.tar.gz), and also copy over the patched /usr/local/bin/weave.
It looks like you could just set environment variables (WEAVE_VERSION and DOCKERHUB_USER, judging by the script lines above) rather than patching the weave script, if that's easier for you.
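To see what the environment variables would do without launching anything, the defaulting lines from the script excerpts above can be replayed in isolation. This is only a sketch: resolve_exec_image is a hypothetical helper, it copies just the lines shown in the diffs, and the real script may do more.

```shell
# Replay the image-name defaulting logic quoted from the weave script above.
# resolve_exec_image is a hypothetical name used for illustration only.
# The ( ... ) body runs in a subshell so no variables leak into the caller.
resolve_exec_image() (
    SCRIPT_VERSION="2.8.1"
    IMAGE_VERSION=latest
    [ "$SCRIPT_VERSION" = "unreleased" ] || IMAGE_VERSION=$SCRIPT_VERSION
    IMAGE_VERSION=${WEAVE_VERSION:-$IMAGE_VERSION}
    DOCKERHUB_USER=${DOCKERHUB_USER:-weaveworks}
    echo "$DOCKERHUB_USER/weaveexec:$IMAGE_VERSION"
)
```

With neither variable set this prints weaveworks/weaveexec:2.8.1. With WEAVE_VERSION=latest and DOCKERHUB_USER=docker-registry.me.com/weaveworks exported, it prints docker-registry.me.com/weaveworks/weaveexec:latest, which matches the result of the two script patches.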
This product is becoming increasingly difficult to justify when it doesn't run on Ubuntu 20.04. This issue has been open for six months - does Weaveworks actively monitor this forum?
Hi, I'm the Weaveworks CEO. We do keep an eye on these forums. At present we work on Weave Net for paying customers or as part of other commercial work.
Any progress on this issue? It's a really disappointing situation. Almost a year has passed since the issue was opened.
"At present we work on Weave Net for paying customers or as part of other commercial work."
Why would someone pay for something that is broken?
It looks like the actual bug is in the vendored miekg/dns code. I'm not too experienced in Go, but it seems to check for a string length of 8 characters and then tries to slice it to 9.
Ubuntu's default resolv.conf now includes the string "trust-ad", which is 8 characters long, and (I guess) line 94 breaks on it: https://github.com/weaveworks/weave/blob/master/vendor/github.com/miekg/dns/clientconfig.go
The upstream code seems to be fixed, so I think upgrading the vendored copy would be enough to solve this problem: https://github.com/miekg/dns/blob/master/clientconfig.go
I checked my hosts: weave works on those that have no 8-character entry in the options of resolv.conf, but breaks on those that do. Weave can be launched with the --no-dns option, which gives me a working "weave status", but it wouldn't really be usable that way.
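A quick check for whether a given host is likely to be affected, based on the observation above that only 8-character option entries break things. This is a sketch; list_8char_options is a hypothetical helper name:

```shell
# List every entry on an "options" line of resolv.conf that is exactly
# 8 characters long (for example "trust-ad").
# list_8char_options is a hypothetical name used for illustration only.
list_8char_options() {
    awk '$1 == "options" {
             for (i = 2; i <= NF; i++)
                 if (length($i) == 8) print $i
         }' "${1:-/etc/resolv.conf}"
}
```

Called without arguments it reads /etc/resolv.conf. Empty output suggests the host has no option of the problematic length.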
Any idea for a workaround without messing up the automatic resolv.conf?
I figured out a workaround that only needs an edit to the weave script: remove the options that weave already ignores from a copy of resolv.conf, and mount that copy instead.
+++ b/weave
@@ -136,6 +136,7 @@ exec_remote() {
$(docker_run_options) \
--pid host \
$(exec_options "$@") \
+ -v /usr/local/bin/weave:/home/weave/weave \
-e DOCKERHUB_USER="$DOCKERHUB_USER" \
-e WEAVE_VERSION \
-e WEAVE_DEBUG \
@@ -1167,14 +1168,7 @@ launch() {
# Figure out the location of the actual resolv.conf file because
# we want to bind mount its directory into the container.
- if [ -L ${HOST_ROOT:-/}/etc/resolv.conf ]; then # symlink
- # This assumes a host with readlink in FHS directories...
- # Ideally, this would resolve the symlink manually, without
- # using host commands.
- RESOLV_CONF=$(chroot ${HOST_ROOT:-/} readlink -f /etc/resolv.conf)
- else
- RESOLV_CONF=/etc/resolv.conf
- fi
+ RESOLV_CONF=/etc/resolv.weave.conf
RESOLV_CONF_DIR=$(dirname "$RESOLV_CONF")
RESOLV_CONF_BASE=$(basename "$RESOLV_CONF")
It uses the file /etc/resolv.weave.conf. In my case, I just run the original resolv.conf through sed to remove the trust-ad option, and generate the file at boot time with systemd.
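The exact sed command was not posted, so the following is only a guess at what such a step could look like. make_weave_resolv_conf is a hypothetical helper, and the expression removes nothing but the trust-ad entry from options lines:

```shell
# Write a copy of resolv.conf with the trust-ad option stripped.
# make_weave_resolv_conf is a hypothetical name; the sed expression is an
# assumption, not the command used in the comment above.
# $1 = source file (default /etc/resolv.conf)
# $2 = destination file (default /etc/resolv.weave.conf)
make_weave_resolv_conf() {
    src=${1:-/etc/resolv.conf}
    dst=${2:-/etc/resolv.weave.conf}
    # Only touch "options" lines; keep whatever separator follows trust-ad.
    sed -E '/^options[[:space:]]/ s/[[:space:]]+trust-ad([[:space:]]|$)/\1/g' \
        "$src" > "$dst"
}
```

Writing to the default destination needs root. The function would have to run again whenever the original resolv.conf changes, which is presumably why the file is regenerated at boot.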
Encountered this issue on Ubuntu 22.04 because of options edns0 trust-ad in my /etc/resolv.conf.
What's weird is that on Ubuntu 20.04 it was options edns0 and weave was working well.
We're facing same issue with weave on Ubuntu Server 20.04. Any workaround without modifying resolv.conf?
Switch to Calico? You get all the same features and more.
@withinboredom We need this for a Nomad cluster. We're currently using weave for our jobs, and we couldn't find any supporting docs on using Calico with a Nomad cluster.
Hi, are there any updates to this?