
Docker stack deploy services not accessible by name

Open jamshid opened this issue 5 years ago • 3 comments

What you expected to happen?

I tried the "voting app" docker-compose.yml sample from https://github.com/dockersamples/example-voting-app. It deploys services to Docker Swarm manager and worker nodes.

I expected endpoints to work and behave the same whether deployed with weave or the built-in "overlay" network. I.e., if a service publishes container port 80 as 6001, then SERVICE-NAME:80 should be accessible from within the network, and the outside world should be able to reach the service on port 6001 at either the Swarm manager's or a worker node's IP address.

Unfortunately, when "weave" is used, the IP address behind the compose "service" name does not route to the container's ports.

I don't know if Docker Swarm or the Weave v2 Plugin is at fault.

Now that I'm filing this, it seems https://github.com/weaveworks/weave/issues/3382 may be the same issue, but maybe this example is still helpful.

What happened?

I ran docker stack deploy -c ./docker-compose.yml voteapp and found the container for one of the services, "result-app":

51cd972f5d24        gaiadocker/example-voting-app-result:latest      "node server.js"         6 minutes ago       Up 5 minutes              80/tcp                                                                             voteapp_result-app.1.ul21a6dyz1ghgjtlkry33m3jg

I exec into it with docker exec -ti 51cd972f5d24 sh, expecting to be able to curl the service name, but it fails:

root@7873f2393e8a:/app# curl http://result-app:80

curl: (7) Failed to connect to result-app port 80: No route to host
root@7873f2393e8a:/app# 
root@7873f2393e8a:/app# ping result-app
PING result-app (10.0.5.9): 56 data bytes
92 bytes from 7873f2393e8a (10.0.5.10): Destination Host Unreachable
92 bytes from 7873f2393e8a (10.0.5.10): Destination Host Unreachable
92 bytes from 7873f2393e8a (10.0.5.10): Destination Host Unreachable
^C--- result-app ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
root@7873f2393e8a:/app# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
41026: eth0@if41027: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:0a:ff:01:75 brd ff:ff:ff:ff:ff:ff
    inet 10.255.1.117/16 brd 10.255.255.255 scope global eth0
       valid_lft forever preferred_lft forever
41028: eth1@if41029: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:ac:12:00:07 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.7/16 brd 172.18.255.255 scope global eth1
       valid_lft forever preferred_lft forever
41030: ethwe0@if41031: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UP group default 
    link/ether 16:23:5a:f1:a2:d9 brd ff:ff:ff:ff:ff:ff
    inet 10.0.5.10/24 brd 10.0.5.255 scope global ethwe0
       valid_lft forever preferred_lft forever
root@7873f2393e8a:/app# curl --head 10.0.5.10:80
HTTP/1.1 200 OK
X-Powered-By: Express

What is it about deploying on weave that stops the service name's IP from routing to the container's port?

It works fine when I deploy the compose file with the "overlay" network instead of weave:

root@311b7d60c7cd:/app# curl --head http://result-app:80
HTTP/1.1 200 OK
X-Powered-By: Express

root@311b7d60c7cd:/app# ping result-app
PING result-app (10.0.4.14): 56 data bytes

root@311b7d60c7cd:/app# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
17499: eth0@if17500: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:0a:ff:01:6e brd ff:ff:ff:ff:ff:ff
    inet 10.255.1.110/16 brd 10.255.255.255 scope global eth0
       valid_lft forever preferred_lft forever
17501: eth2@if17502: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:ac:15:00:06 brd ff:ff:ff:ff:ff:ff
    inet 172.21.0.6/16 brd 172.21.255.255 scope global eth2
       valid_lft forever preferred_lft forever
17503: eth1@if17504: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:0a:00:04:0f brd ff:ff:ff:ff:ff:ff
    inet 10.0.4.15/24 brd 10.0.4.255 scope global eth1
       valid_lft forever preferred_lft forever

How to reproduce it?

Compare:

NETWORK_DRIVER=overlay docker stack deploy -c ./docker-compose.yml voteapp
NETWORK_DRIVER=weaveworks/net-plugin:latest_release docker stack deploy -c ./docker-compose.yml voteapp

This is the docker-compose.yml: https://gist.github.com/jamshid/bf5dcdb0ae1b505a636b33ca5ebfba4b
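
Based on the NETWORK_DRIVER variable in the commands above, the compose file presumably parameterizes the network driver, along these lines (a sketch only; the service details and network name here are illustrative, see the gist for the actual file):

```yaml
# Sketch of the relevant compose sections (hypothetical; the gist has the real file).
# NETWORK_DRIVER is substituted from the shell environment at deploy time.
version: "3"

services:
  result-app:
    image: gaiadocker/example-voting-app-result:latest
    ports:
      - "6001:80"      # publish container port 80 as host port 6001
    networks:
      - mynetwork

networks:
  mynetwork:
    driver: ${NETWORK_DRIVER}
```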

Anything else we need to know?

Deploying to a swarm manager + swarm worker cluster:

$ docker info
Client:
 Debug Mode: false

Server:
 Containers: 20
  Running: 14
  Paused: 0
  Stopped: 6
 Images: 1337
 Server Version: 19.03.4
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay weaveworks/net-plugin:latest_release
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: X
  Is Manager: true
  ClusterID: X
  Managers: 1
  Nodes: 3
  Default Address Pool: 10.0.0.0/8  
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 172.30.10.205
  Manager Addresses:
   172.30.10.205:2377
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
 runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.4.0-166-generic
 Operating System: Ubuntu 16.04.6 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 31.34GiB
 Name: X
 ID: X
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
  provider=generic
 Experimental: false
 Insecure Registries:
  192.168.1.61:5000
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

Versions:

$ sudo weave version
weave script 2.6.0
weave 2.6.0

# docker version
Client: Docker Engine - Community
 Version:           19.03.4
 API version:       1.40
 Go version:        go1.12.10
 Git commit:        9013bf583a
 Built:             Fri Oct 18 15:53:51 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.4
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.10
  Git commit:       9013bf583a
  Built:            Fri Oct 18 15:52:23 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.10
  GitCommit:        b34a5c8af56e510852c35414db4c1f4fa6172339
 runc:
  Version:          1.0.0-rc8+dev
  GitCommit:        3e425f80a8c931f88e6d94a8c831b9d5aa481657
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

$ uname -a
Linux X 4.4.0-166-generic #195-Ubuntu SMP Tue Oct 1 09:35:25 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ kubectl version
#N/A, hope you still support Docker Swarm?

Logs:

# docker logs weave
Error: No such container: weave

$ journalctl -u docker.service --no-pager
Nov 09 15:05:07 jimbo dockerd[1178]: time="2019-11-09T15:05:07.873921418-06:00" level=error msg="addLBBackend wuaj2lfwwhck02ia0ywznjn49/mynetwork: Unable to find load balancing endpoint for network wuaj2lfwwhck02ia0ywznjn49"

Network:

# ip route
default via 172.30.10.1 dev enp0s31f6 
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1 linkdown 
172.18.0.0/16 dev br-868f72a86652  proto kernel  scope link  src 172.18.0.1 
172.20.0.0/16 dev br-d5998c4fbd81  proto kernel  scope link  src 172.20.0.1 
172.21.0.0/16 dev docker_gwbridge  proto kernel  scope link  src 172.21.0.1 
172.27.0.0/16 dev br-ba650310e704  proto kernel  scope link  src 172.27.0.1 
172.30.10.0/23 dev enp0s31f6  proto kernel  scope link  src 172.30.10.205 

$ ip -4 -o addr
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
3: enp0s31f6    inet 172.30.10.205/23 brd 172.30.11.255 scope global enp0s31f6\       valid_lft forever preferred_lft forever
5: docker0    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0\       valid_lft forever preferred_lft forever
6: docker_gwbridge    inet 172.21.0.1/16 brd 172.21.255.255 scope global docker_gwbridge\       valid_lft forever preferred_lft forever
7: br-868f72a86652    inet 172.18.0.1/16 brd 172.18.255.255 scope global br-868f72a86652\       valid_lft forever preferred_lft forever
8: br-ba650310e704    inet 172.27.0.1/16 brd 172.27.255.255 scope global br-ba650310e704\       valid_lft forever preferred_lft forever
9: br-d5998c4fbd81    inet 172.20.0.1/16 brd 172.20.255.255 scope global br-d5998c4fbd81\       valid_lft forever preferred_lft forever

$ sudo iptables-save

jamshid avatar Nov 09 '19 21:11 jamshid

Thanks for the detail; I do think it sounds similar to #3382. Does changing the DNS mode to round-robin help? Have you asked Docker whether this is supposed to work?

bboreham avatar Nov 11 '19 10:11 bboreham

Thanks, I updated the sample stack if you want to try:

https://gist.github.com/jamshid/bf5dcdb0ae1b505a636b33ca5ebfba4b

Unfortunately, no: endpoint_mode: dnsrr changes the behavior, but ports are still not published correctly.
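
For anyone following along, endpoint_mode is set per service in the compose file's deploy section, roughly like this (a sketch assuming the service name from this stack):

```yaml
services:
  result-app:
    deploy:
      endpoint_mode: dnsrr   # DNS round-robin (task IPs) instead of the default virtual IP
```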

The built-in "overlay" network does work. Shouldn't weave likewise expose container ports via the internal service name and publish ports on all Docker Swarm nodes?

Do you still support and test against Docker Swarm? I seem to remember this kind of thing working over a year ago; maybe a recent version broke it?

Any suggestions on what to try? I should be able to reproduce this on a couple of DigitalOcean servers if that would help.

jamshid avatar Nov 12 '19 04:11 jamshid

The fact that a built-in feature works is hardly comparable. If it worked with a different plugin that would be more interesting.

The Docker network plugin interface deals with attaching a container and assigning an IP address. There is nothing about ports or publishing. https://github.com/docker/libnetwork/blob/master/docs/remote.md

My suggestion remains to take this question to Docker. People have said it works with older versions of Docker, so perhaps they changed something.

bboreham avatar Nov 12 '19 07:11 bboreham