docker-consul
DNS recursor issue with --net="host"
This is a somewhat convoluted issue; please bear with me. It is related to these issues: https://github.com/hashicorp/consul/issues/602 https://github.com/hashicorp/consul/pull/724
When running in an environment such as AWS, you may need (as I do) to resolve both the Amazon servers (for example) and the services registered in the Consul cluster, from inside other containers.
There is an issue with the Consul Docker image where you must run the container using --net="host"; otherwise, communication is unstable.
Your local Consul agent is used as DNS but recurses to 8.8.8.8, an issue that was addressed in the links above. So now you can use -recursor=[internal-network-DNS], and you can resolve both ec2.internal (for example) and service.consul. Great, right?
But wait! You are using --net="host", so your container gets its resolv.conf file from the host! And in that file, the consul search domain and the localhost nameserver are not configured. (Isolation!) AND you can NOT use the --dns and --dns-search flags! (And port 53 is now occupied.)
So, for this feature to actually work, you will need to either get the Consul Docker container to work without using --net="host", OR allow the --dns-search flag so you can modify the /etc/resolv.conf file using the "docker run" command (or hack it in the Dockerfile using env vars and bash).
Otherwise, this requires you to modify resolv.conf on each and every host you run the Consul agent on, to something like: "search ec2.internal service.consul nameserver 127.0.0.1 nameserver [internal-network-DNS]"
which, of course, goes against the entire Docker concept of containment / isolation / run anywhere etc.
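Spelled out as an actual resolv.conf, that host-side workaround would look something like the following (10.0.0.2 is just a stand-in for the [internal-network-DNS] address):

```
search ec2.internal service.consul
nameserver 127.0.0.1
nameserver 10.0.0.2
```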
Interesting. Good find. We've learned a lot about running Consul and it's not well reflected in this container yet. It's overdue for some love. Anyway, I think we've learned it is best to run in --net=host mode so we want to make sure that works.
Can you tell me what you mean by "allow the --dns-search flag"?
Hi Progrium, thank you for your response.
Using the --dns and --dns-search flags with the run command allows you to basically set the resolv.conf file from the command line when you run your container. However, these flags are ignored when running with --net="host".
If you've got a host running a container with --net="host", you can't set its /etc/resolv.conf file using these flags, so you can't tell it to use the local Consul instance as a DNS.
If you could do that, it would remove the need to reconfigure the /etc/resolv.conf file on the host itself.
(Some people are using dnsmasq on each host, which is an inelegant solution, since Consul itself should serve as the DNS; and if you are running the Consul agent with --net="host" you still need to resolve that pesky port 53 collision.)
(sorry, closed by mistake... pressed wrong button...)
Is there still a performance issue with docker-consul when not using --net=host?
As a workaround, I run dnsmasq in a container and specify it as the recursor: https://github.com/NoumanSaleem/docker-consul/blob/master/config/consul.json
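For reference, the config-file equivalent of the -recursor flag looks something like this (the address below is a placeholder for wherever your dnsmasq container listens, e.g. the docker0 bridge):

```json
{
  "recursors": ["172.17.42.1"]
}
```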
New version of docker-consul coming soon that resolves this. Also, this is still experimental but worth a try (doesn't work with systemd): https://github.com/mgood/resolvable
Our question is the same. When you run a container with --net=host, Registrator cannot find the service and does not add it to Consul.
Well that would be an issue with registrator then, right?
New version of docker-consul coming soon that resolves this
Two things:
- This container (and others made by you) is awesome, thank you for all the hard work!
- Really looking forward to the new version with this problem solved.
If I recall, Registrator will register containers even when using "--net=host", as long as the ports are still specified on the run command line.
While this is not exactly pertinent to the original issue, if you could run Consul without using "--net=host", this issue would never have come up (because you would be able to control resolv.conf inside the container by passing parameters to the run command).
Docker --net=host overwrites any resolv.conf in the container. What I've had to do is write resolv.conf in my entrypoint script as a heredoc; then all the containers inherit this. It works, but I wish Docker would still write the resolv file when you use --net=host and allow a --dns flag to override it.
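A sketch of that heredoc approach (the target path and addresses here are illustrative; a real entrypoint would write /etc/resolv.conf directly):

```shell
#!/bin/sh
# Write a resolv.conf via heredoc so every container built on this
# entrypoint inherits the Consul search domain and nameservers.
# Demo target path; a real entrypoint would use /etc/resolv.conf.
target=/tmp/resolv.conf.demo

cat > "$target" <<EOF
search ec2.internal service.consul
nameserver 127.0.0.1
nameserver 10.0.0.2
EOF

# In an actual entrypoint, hand off to the container's real command:
# exec "$@"
cat "$target"
```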
From the original issue text, I would say that this is resolved by specifying the -recursor command-line option on Consul. Unfortunately, this is documented but not included in Consul 0.5 and is awaiting release.
@mvanholsteijn, that is a prerequisite, yes. But you still need to be able to tell containers to use Consul as a DNS, and the Docker command-line option is ineffective when using --net=host.
Bump. A few questions about this:
- Is it still recommended/necessary to run dockerized Consul with --net=host?
- If so, is it as simple as that + consul agent -recursor=<INTERNAL_DNS> <other_settings> (Consul >= 0.5.2) + containers run with --dns 172.17.42.1 --dns-search service.consul?
@progrium What are the lessons learned re: running Consul which aren't reflected in this container?
Yes, the new version of this image will be documented to run by default in net=host mode.
I believe your second bullet point would work, but we typically deploy with resolvable or some kind of dnsmasq on the local host to solve this and other issues (it lets you avoid configuring any recursing in Consul, and it also takes Consul out of the critical path for DNS, which is preferred).
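The dnsmasq variant mentioned here boils down to a one-line config that forwards only the .consul domain to the local agent's DNS port (8600 assumed), leaving everything else to the normal upstream resolvers:

```
# Forward *.consul queries to the local Consul agent (sketch only).
server=/consul/127.0.0.1#8600
```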
@progrium Thanks for the reply. What are your plans for updating docs and/or releasing an updated image?
Currently in master. Docs coming soon.
Bump, any ETA for docs?
The docs page needs some updates; here's an incomplete list of changes:
update image name
replace all progrium/consul with gliderlabs/consul
update run command
$ docker run -p 8400:8400 -p 8500:8500 -p 8600:53/udp -h node1 progrium/consul -server -bootstrap
replace with:
$ docker run -p 8400:8400 -p 8500:8500 -p 8600:53/udp -h node1 gliderlabs/consul agent -server -bootstrap -client=0.0.0.0
It took a while to figure out that -client=0.0.0.0 is required, since the agent binds to localhost by default (it used to be 0.0.0.0 in progrium/consul), so otherwise you can't connect from outside the container.
The Web UI is not available with gliderlabs/consul, only with gliderlabs/consul-server.
dns lookups don't work
$ dig @0.0.0.0 -p 8600 node1.node.consul
this doesn't work, times out
netstat -tuln | grep 8600 returns:
udp6 0 0 :::8600 :::*
this also times out:
dig -6 @:: -p 8600 node1.node.consul
digging at the container's IP works fine:
dig @172.17.0.15 -p 8600 node1.node.consul
; <<>> DiG 9.9.5-3ubuntu0.3-Ubuntu <<>> @172.17.0.15 -p 8600 node1.node.consul
; (1 server found)
...
-data-dir is required
$ docker run -d --name node1 -h node1 progrium/consul -server -bootstrap-expect 3
replace with:
$ docker run -d --name node1 -h node1 gliderlabs/consul agent -server -bootstrap-expect 3 -data-dir /tmp/consul
$ docker run -d --name node2 -h node2 progrium/consul -server -join $JOIN_IP
replace with:
$ docker run --name node2 -h node2 gliderlabs/consul agent -server -data-dir /tmp/consul -join $JOIN_IP
$ docker run -d --name node3 -h node3 progrium/consul -server -join $JOIN_IP
replace with:
$ docker run --name node3 -h node3 gliderlabs/consul agent -server -data-dir /tmp/consul -join $JOIN_IP
$ docker run -d -p 8400:8400 -p 8500:8500 -p 8600:53/udp --name node4 -h node4 progrium/consul -join $JOIN_IP
replace with:
$ docker run -p 8500:8500 -p 8400:8400 -p 8600:53/udp --name node4 -h node4 gliderlabs/consul agent -client=0.0.0.0 -data-dir /tmp/consul -join $JOIN_IP
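As an aside, the $JOIN_IP used in the commands above can be captured with docker inspect (assuming the nodes sit on the default bridge network):

```shell
# Grab node1's IP on the default bridge to use as the join target.
JOIN_IP="$(docker inspect -f '{{.NetworkSettings.IPAddress}}' node1)"
echo "$JOIN_IP"
```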
@mbsimonovic perhaps you've already seen this, but try running gliderlabs/consul-agent or gliderlabs/consul-server directly. They 'fix' a few of your gotchas, such as -client=0.0.0.0, -data-dir, etc. The run command is a bit different in that all you now need is gliderlabs/consul-server -bootstrap.
Yeah, still working some things out but it's about there. As soon as the images are done I'm starting on docs.
@mbsimonovic DNS can work:
- add --dns 172.17.42.1 to your docker daemon or docker runtime (see below for some examples)

docker run --rm \
  --name consul \
  --net host \
  --volume /mnt/consul:/data \
  skippy/consul-server.dev -advertise EXTERNAL_IP -bootstrap

docker run --rm aanand/docker-dnsutils dig +short consul.service.consul

Extra notes:
- I haven't figured out yet how to expose Consul DNS directly on 127.0.0.1; i.e. I would like to be able to run dnsmasq -S /consul/127.0.0.1#8600
- my custom version is an extension of gliderlabs/consul-server, but I change the DNS ports (and the Dockerfile EXPOSE ports) back to 8600. I run dnsmasq so that not all DNS goes through Consul.
- I also change, in the Dockerfile, ENV DNS_PORT 8600, which is for gliderlabs/alpine
- you need to have Docker look to the bridge for DNS; that is how the other Docker containers can find Consul without changing the host. 172.17.42.1 is the docker0 bridge.
- without setting the docker daemon with --dns 172.17.42.1, the closest I could get was: docker run --rm aanand/docker-dnsutils dig +short @${COREOS_PRIVATE_IPV4} consul.service.consul
- my dnsmasq cmd: docker run -d -p 53:53/tcp -p 53:53/udp --cap-add=NET_ADMIN andyshinn/dnsmasq -S /consul/${COREOS_PRIVATE_IPV4}#8600
I'm definitely a bit out of my depth with how docker does networking, virtual bridges, and what not, but I hope this info helps you out.