swarmkit
Docker swarm DNS periodically fails
I'm cross-posting this here from a moby issue because I believe that nothing is going to happen with the issue in that project. The original issue is here:
https://github.com/moby/moby/issues/33721
In a 3-master swarm, every few days one of the nodes stops resolving services by name through the internal swarm DNS. The swarm operates fine for several days, then DNS just stops working. I don't yet know whether some specific change we are making causes the issue: our system does automated deploys, and we haven't yet correlated any aspect of those deploys with when the issue is triggered.
Steps to reproduce the issue:
- Run a 3-master docker swarm in EC2 for several days
- Sometimes after 2-4 days one of the nodes can't resolve DNS names for services not running on it locally but running in the swarm.
- There is no step 3.
Describe the results you received: We run nginx inside our swarm as a reverse HTTP proxy to our various services. We know that DNS isn't working because we have disabled nginx DNS caching, and nginx suddenly stops resolving the IP of the service where our application is running. When I execute nslookup inside the containers running on the affected node, it fails to find any of the services running on other nodes in the swarm, but it does find services running on the same node.
Describe the results you expected: For name resolution to continue working.
Additional information you deem important (e.g. issue happens only occasionally): The issue only happens occasionally. It is resolved if I restart the docker daemon on the node that can no longer resolve DNS.
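For anyone who wants to repeat the nslookup check described above in a scripted way, here is a minimal sketch. It assumes Python 3 is available inside a container attached to the same overlay network (which may not be true of your images), and the service names `app` and `api` are hypothetical placeholders, not names from our swarm.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def check_services(
    names: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, Optional[str]]:
    """Resolve each service name; map it to its IP, or to None on failure."""
    results: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            results[name] = resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = None
    return results


if __name__ == "__main__":
    # Hypothetical service names; replace them with services in your swarm.
    for name, ip in check_services(["app", "api"]).items():
        print(f"{name}: {ip if ip else 'FAILED to resolve'}")
```

On an affected node I would expect services running locally to resolve and services running only on other nodes to show up as failed, matching what nslookup reports.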
Output of docker version:
Client:
Version: 17.05.0-ce
API version: 1.29
Go version: go1.7.5
Git commit: 89658be
Built: Thu May 4 22:10:54 2017
OS/Arch: linux/amd64
Server:
Version: 17.05.0-ce
API version: 1.29 (minimum version 1.12)
Go version: go1.7.5
Git commit: 89658be
Built: Thu May 4 22:10:54 2017
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
Containers: 19
Running: 12
Paused: 0
Stopped: 7
Images: 24
Server Version: 17.05.0-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 195
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: active
NodeID: 6mugebxyus7dgoip9i165mj64
Is Manager: true
ClusterID: mn6l9qnshdxzzxfoxwpsa18xe
Managers: 3
Nodes: 3
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Node Address: 10.0.46.77
Manager Addresses:
10.0.101.134:2377
10.0.109.151:2377
10.0.46.77:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-57-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.674GiB
Name: ip-10-0-46-77
ID: XMND:RWXY:RGVB:F4BK:TDA3:LQ3W:URQH:T6ES:HLQE:74A7:FQG4:RIYY
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: authentiseautomation
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
Additional environment details (AWS, VirtualBox, physical, etc.): AWS EC2 instances
Alright, looks like I captured some more information. At 19:22 UTC our automated systems started alerting that one of our services wasn't reachable. We had 3 nodes in the cluster at the time:
root@ip-10-0-101-134:/home/eliribble# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
6mugebxyus7dgoip9i165mj64 ip-10-0-46-77 Ready Active Reachable
oibvg6xvuq1otznkmjon706qs * ip-10-0-101-134 Ready Active Leader
qbio2455qa8aysx15p31aqdc8 ip-10-0-109-151 Ready Active Reachable
We run one instance of nginx on 6mugebxyus7dgoip9i165mj64. That instance could not nslookup any of the services in node oibvg6xvuq1otznkmjon706qs.
Log snippet that covers the last time the issue manifested itself, plus a couple of hours back, from the node that lost connection with the other two nodes (oibvg6xvuq1otznkmjon706qs):
Jun 20 17:01:35 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:01:35.774420552Z" level=info msg="Trying to get region from EC2 Metadata"
Jun 20 17:01:35 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:01:35.841668869Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName=dev-docker-swarm logStreamName=wyatt message="The specified log stream already exists" origError=<nil>
Jun 20 17:01:40 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:01:40.903468555Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=wyatt message="The given sequenceToken is invalid. The next expected sequenceToken is: 49568806213822379210463152311192225551215377729798014642" origError=<nil>
Jun 20 17:02:23 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:02:23.051869551Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.46.77:36186"
Jun 20 17:08:41 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:08:41.974397903Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:54816->10.0.46.77:7946: i/o timeout"
Jun 20 17:55:31 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:55:31.974231064Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:57498->10.0.109.151:7946: i/o timeout"
Jun 20 17:55:31 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:55:31.975018709Z" level=info msg="memberlist: Suspect ip-10-0-109-151-2122b7c4e5ee has failed, no acks received"
Jun 20 17:55:56 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:55:56.050109221Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:57580->10.0.109.151:7946: i/o timeout"
Jun 20 17:55:56 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:55:56.055314323Z" level=info msg="memberlist: Suspect ip-10-0-109-151-2122b7c4e5ee has failed, no acks received"
Jun 20 17:55:57 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:55:57.955034271Z" level=warning msg="memberlist: Refuting a suspect message (from: ip-10-0-109-151-2122b7c4e5ee)"
Jun 20 17:56:07 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:56:07.206138651Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.109.151:53562"
Jun 20 17:56:19 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:56:19.696755651Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:57660->10.0.109.151:7946: i/o timeout"
Jun 20 17:56:26 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:56:26.081935803Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:57678->10.0.109.151:7946: i/o timeout"
Jun 20 17:56:37 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:56:37.322456133Z" level=error msg="Bulk sync to node ip-10-0-46-77-d41057dd06df timed out"
Jun 20 17:56:37 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:56:37.557453445Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.109.151:53592"
Jun 20 17:57:18 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:57:18.060719072Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:57782->10.0.109.151:7946: i/o timeout"
Jun 20 17:57:18 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:57:18.068538632Z" level=info msg="memberlist: Suspect ip-10-0-109-151-2122b7c4e5ee has failed, no acks received"
Jun 20 17:58:08 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:58:08.103343689Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:57946->10.0.109.151:7946: i/o timeout"
Jun 20 17:58:21 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:58:21.974536703Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:57970->10.0.109.151:7946: i/o timeout"
Jun 20 17:58:57 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:58:57.974643782Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:32820->10.0.46.77:7946: i/o timeout"
Jun 20 17:59:12 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:59:12.602359013Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.109.151:53738"
Jun 20 17:59:18 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:59:18.974339945Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:58122->10.0.109.151:7946: i/o timeout"
Jun 20 17:59:21 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:59:21.992436118Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:58134->10.0.109.151:7946: i/o timeout"
Jun 20 17:59:22 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:59:22.003878882Z" level=info msg="memberlist: Suspect ip-10-0-109-151-2122b7c4e5ee has failed, no acks received"
Jun 20 18:01:31 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:01:31.974633439Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:33082->10.0.46.77:7946: i/o timeout"
Jun 20 18:02:45 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:02:45.974523767Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:58412->10.0.109.151:7946: i/o timeout"
Jun 20 18:04:15 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:04:15.974362478Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:33290->10.0.46.77:7946: i/o timeout"
Jun 20 18:04:15 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:04:15.974905779Z" level=info msg="memberlist: Suspect ip-10-0-46-77-d41057dd06df has failed, no acks received"
Jun 20 18:12:55 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:12:55.481951974Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.109.151:54328"
Jun 20 18:13:06 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:13:06.757386055Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.109.151:54338"
Jun 20 18:13:14 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:13:14.990316214Z" level=warning msg="memberlist: Was able to reach ip-10-0-109-151-2122b7c4e5ee via TCP but not UDP, network may be misconfigured and not allowing bidirectional UDP"
Jun 20 18:13:32 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:13:32.431579084Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.109.151:54346"
Jun 20 18:14:07 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:14:07.487006002Z" level=error msg="Bulk sync to node ip-10-0-46-77-d41057dd06df timed out"
Jun 20 18:14:59 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:14:59.055811063Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:59550->10.0.109.151:7946: i/o timeout"
Jun 20 18:14:59 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:14:59.060155187Z" level=info msg="memberlist: Suspect ip-10-0-109-151-2122b7c4e5ee has failed, no acks received"
Jun 20 18:15:04 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:15:04.033640190Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:59560->10.0.109.151:7946: i/o timeout"
Jun 20 18:15:04 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:15:04.038084464Z" level=info msg="memberlist: Suspect ip-10-0-109-151-2122b7c4e5ee has failed, no acks received"
Jun 20 18:15:13 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:15:13.113579180Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:59596->10.0.109.151:7946: i/o timeout"
Jun 20 18:15:13 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:15:13.114189926Z" level=info msg="memberlist: Suspect ip-10-0-109-151-2122b7c4e5ee has failed, no acks received"
Jun 20 18:15:13 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:15:13.916240646Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.109.151:54484"
Jun 20 18:15:33 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:15:33.774729884Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.46.77:52630"
Jun 20 18:17:54 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:17:54.974331890Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:34608->10.0.46.77:7946: i/o timeout"
Jun 20 18:17:54 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:17:54.975002973Z" level=info msg="memberlist: Suspect ip-10-0-46-77-d41057dd06df has failed, no acks received"
Jun 20 19:13:20 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:13:20.092338987Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:35204->10.0.109.151:7946: i/o timeout"
Jun 20 19:13:27 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:13:27.064704267Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:35210->10.0.109.151:7946: i/o timeout"
Jun 20 19:13:41 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:13:41.243023427Z" level=info msg="memberlist: Suspect ip-10-0-46-77-d41057dd06df has failed, no acks received"
Jun 20 19:14:02 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:14:02.125799690Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:35304->10.0.109.151:7946: i/o timeout"
Jun 20 19:14:02 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:14:02.135869023Z" level=info msg="memberlist: Suspect ip-10-0-109-151-2122b7c4e5ee has failed, no acks received"
Jun 20 19:14:14 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:14:14.977870357Z" level=warning msg="memberlist: Refuting a suspect message (from: ip-10-0-109-151-2122b7c4e5ee)"
Jun 20 19:14:35 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:14:35.595045654Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.109.151:55702"
Jun 20 19:14:51 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:14:51.057404889Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:35428->10.0.109.151:7946: i/o timeout"
Jun 20 19:15:36 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:36.978327589Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:35548->10.0.109.151:7946: i/o timeout"
Jun 20 19:15:36 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:36.978849052Z" level=info msg="memberlist: Suspect ip-10-0-109-151-2122b7c4e5ee has failed, no acks received"
Jun 20 19:15:41 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:41.974653384Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:35570->10.0.109.151:7946: i/o timeout"
Jun 20 19:15:41 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:41.979074530Z" level=info msg="memberlist: Marking ip-10-0-109-151-2122b7c4e5ee as failed, suspect timeout reached"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.877403972Z" level=warning msg="Neighbor entry already present for IP 10.255.0.2, mac 02:42:0a:ff:00:02"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.877800921Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:ff:00:02"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.878519186Z" level=warning msg="Neighbor entry already present for IP 10.0.9.23, mac 02:42:0a:00:09:17"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.879248091Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:17"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.879468683Z" level=warning msg="Neighbor entry already present for IP 10.0.9.65, mac 02:42:0a:00:09:41"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.879846799Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:41"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.880059701Z" level=warning msg="Neighbor entry already present for IP 10.0.9.11, mac 02:42:0a:00:09:0b"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.880255444Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:0b"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.880463126Z" level=warning msg="Neighbor entry already present for IP 10.0.9.13, mac 02:42:0a:00:09:0d"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.880671681Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:0d"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.881046636Z" level=warning msg="Neighbor entry already present for IP 10.0.9.54, mac 02:42:0a:00:09:36"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.881274334Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:36"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.881481222Z" level=warning msg="Neighbor entry already present for IP 10.0.9.21, mac 02:42:0a:00:09:15"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.881671859Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:15"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.882045893Z" level=warning msg="Neighbor entry already present for IP 10.0.9.3, mac 02:42:0a:00:09:03"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.882241964Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:03"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.882881274Z" level=warning msg="Neighbor entry already present for IP 10.0.9.33, mac 02:42:0a:00:09:21"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.883164138Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:21"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.883434419Z" level=warning msg="Neighbor entry already present for IP 10.0.9.41, mac 02:42:0a:00:09:29"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.883854358Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:29"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.884072362Z" level=warning msg="Neighbor entry already present for IP 10.0.9.61, mac 02:42:0a:00:09:3d"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.884702942Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:3d"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.885017910Z" level=warning msg="Neighbor entry already present for IP 10.0.9.31, mac 02:42:0a:00:09:1f"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.885418150Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:1f"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.897655936Z" level=warning msg="Neighbor entry already present for IP 10.0.9.5, mac 02:42:0a:00:09:05"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.897892315Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:05"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.898115851Z" level=warning msg="Neighbor entry already present for IP 10.0.9.25, mac 02:42:0a:00:09:19"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.898829811Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:19"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.899107248Z" level=warning msg="Neighbor entry already present for IP 10.0.9.27, mac 02:42:0a:00:09:1b"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.899312064Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:1b"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.879057617Z" level=warning msg="Neighbor entry already present for IP 10.255.0.4, mac 02:42:0a:ff:00:04"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.899680372Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:ff:00:04"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.900465485Z" level=warning msg="Neighbor entry already present for IP 10.255.0.9, mac 02:42:0a:ff:00:09"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.900740901Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:ff:00:09"
Jun 20 19:15:55 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:55.113234249Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:35610->10.0.109.151:7946: i/o timeout"
Jun 20 19:15:55 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:55.113660921Z" level=info msg="memberlist: Suspect ip-10-0-109-151-2122b7c4e5ee has failed, no acks received"
Jun 20 19:16:17 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:16:17.974531987Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:35630->10.0.109.151:7946: i/o timeout"
Jun 20 19:16:19 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:16:19.974569102Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:35636->10.0.109.151:7946: i/o timeout"
Jun 20 19:16:21 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:16:21.453476300Z" level=warning msg="memberlist: Refuting a suspect message (from: ip-10-0-109-151-2122b7c4e5ee)"
Jun 20 19:16:45 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:16:45.030140914Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.109.151:55866"
Jun 20 19:16:58 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:16:58.982126963Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:35724->10.0.109.151:7946: i/o timeout"
Jun 20 19:21:12 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:21:12.974540021Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:39106->10.0.46.77:7946: i/o timeout"
Jun 20 19:21:12 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:21:12.975119223Z" level=info msg="memberlist: Suspect ip-10-0-46-77-d41057dd06df has failed, no acks received"
Jun 20 19:22:15 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:22:15.974510246Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:39202->10.0.46.77:7946: i/o timeout"
Jun 20 19:22:36 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:22:36.979365648Z" level=error msg="Bulk sync to node ip-10-0-46-77-d41057dd06df timed out"
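Since the log above is full of failed TCP fallback pings on port 7946, a quick connectivity probe between nodes might help narrow things down. The following is only a sketch: it checks plain TCP reachability of that port from the node where it runs, using the three node addresses from the `docker info` output. It cannot verify UDP, which memberlist also uses on the same port, so a successful result does not prove that gossip is healthy.

```python
import socket
from typing import Dict, Iterable

GOSSIP_PORT = 7946  # port that appears in the memberlist log lines above


def tcp_reachable(host: str, port: int = GOSSIP_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False


def check_peers(hosts: Iterable[str], port: int = GOSSIP_PORT) -> Dict[str, bool]:
    """Probe every host once and report whether the port accepted a connection."""
    return {host: tcp_reachable(host, port) for host in hosts}


if __name__ == "__main__":
    # Node addresses taken from the `docker info` output in this report.
    peers = ["10.0.101.134", "10.0.109.151", "10.0.46.77"]
    for host, ok in check_peers(peers).items():
        print(f"{host}:{GOSSIP_PORT} {'reachable' if ok else 'UNREACHABLE'}")
```

Running it from each node in turn would show whether the timeouts are one-directional or affect a single node.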
Here are more logs in case they help:
Just had a repro again; this time I couldn't reach services in qbio2455qa8aysx15p31aqdc8. Log from the afflicted node:
Jun 20 22:02:04 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:02:04.559066162Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=pao message="The given sequenceToken is invalid. The next expected sequenceToken is: 49556647666881803461441973720211116500185719525833969250" origError=<nil>
Jun 20 22:02:06 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:02:06.222961215Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=pao message="The given sequenceToken is invalid. The next expected sequenceToken is: 49556647666881803461441973722662818062364187698296524386" origError=<nil>
Jun 20 22:02:06 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:02:06.223388414Z" level=error msg="InvalidSequenceTokenException: The given sequenceToken is invalid. The next expected sequenceToken is: 49556647666881803461441973722662818062364187698296524386\n\tstatus code: 400, request id: 18a69ee5-5604-11e7-ac02-9525e68b48b6"
Jun 20 22:02:24 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:02:24.575044678Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=pao message="The given sequenceToken is invalid. The next expected sequenceToken is: 49556647666881803461441973735672068807237213484260265570" origError=<nil>
Jun 20 22:02:25 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:02:25.133842755Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=pao message="The given sequenceToken is invalid. The next expected sequenceToken is: 49556647666881803461441973738076622262450711050189803106" origError=<nil>
Jun 20 22:02:25 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:02:25.134291150Z" level=error msg="InvalidSequenceTokenException: The given sequenceToken is invalid. The next expected sequenceToken is: 49556647666881803461441973738076622262450711050189803106\n\tstatus code: 400, request id: 2488c3eb-5604-11e7-87d8-f7e0f87a5a34"
Jun 20 22:02:28 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:02:28.421101828Z" level=warning msg="memberlist: Refuting a suspect message (from: ip-10-0-46-77-d41057dd06df)"
Jun 20 22:02:29 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:02:29.151843539Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=pao message="The given sequenceToken is invalid. The next expected sequenceToken is: 49556647666881803461441973739875503882037279399591546466" origError=<nil>
Jun 20 22:02:38 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:02:38.604526845Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:59942->10.0.101.134:7946: i/o timeout"
Jun 20 22:02:40 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:02:40.826010611Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.46.77:41794"
Jun 20 22:03:08 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:08.642452532Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:43580->10.0.46.77:7946: i/o timeout"
Jun 20 22:03:12 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:12.440892530Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:43592->10.0.46.77:7946: i/o timeout"
Jun 20 22:03:12 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:12.883628767Z" level=info msg="memberlist: Suspect ip-10-0-101-134-756771be03b7 has failed, no acks received"
Jun 20 22:03:15 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:15.325774759Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.46.77:41888"
Jun 20 22:03:16 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:16.305687187Z" level=warning msg="memberlist: Refuting a suspect message (from: ip-10-0-109-151-2122b7c4e5ee)"
Jun 20 22:03:17 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:17.600976541Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:60008->10.0.101.134:7946: i/o timeout"
Jun 20 22:03:18 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:18.991481971Z" level=warning msg="Neighbor entry already present for IP 10.255.0.2, mac 02:42:0a:ff:00:02"
Jun 20 22:03:18 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:18.991875174Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:ff:00:02"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.103155960Z" level=warning msg="Neighbor entry already present for IP 10.0.9.23, mac 02:42:0a:00:09:17"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.103523713Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:17"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.103793706Z" level=warning msg="Neighbor entry already present for IP 10.0.9.65, mac 02:42:0a:00:09:41"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.104035441Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:41"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.104232665Z" level=warning msg="Neighbor entry already present for IP 10.0.9.11, mac 02:42:0a:00:09:0b"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.104430469Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:0b"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.104624252Z" level=warning msg="Neighbor entry already present for IP 10.0.9.13, mac 02:42:0a:00:09:0d"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.104800697Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:0d"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.104990143Z" level=warning msg="Neighbor entry already present for IP 10.0.9.54, mac 02:42:0a:00:09:36"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.105168255Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:36"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.105399904Z" level=warning msg="Neighbor entry already present for IP 10.0.9.21, mac 02:42:0a:00:09:15"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.105575114Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:15"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.105763724Z" level=warning msg="Neighbor entry already present for IP 10.0.9.3, mac 02:42:0a:00:09:03"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.105960103Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:03"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.106149625Z" level=warning msg="Neighbor entry already present for IP 10.0.9.79, mac 02:42:0a:00:09:4f"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.106322933Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:4f"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.106515870Z" level=warning msg="Neighbor entry already present for IP 10.0.9.41, mac 02:42:0a:00:09:29"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.106694229Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:29"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.106877616Z" level=warning msg="Neighbor entry already present for IP 10.0.9.83, mac 02:42:0a:00:09:53"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.107049891Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:53"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.107232993Z" level=warning msg="Neighbor entry already present for IP 10.0.9.31, mac 02:42:0a:00:09:1f"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.107405626Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:1f"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.107436679Z" level=warning msg="Neighbor entry already present for IP 10.0.9.5, mac 02:42:0a:00:09:05"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.107448591Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:05"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.107471459Z" level=warning msg="Neighbor entry already present for IP 10.0.9.25, mac 02:42:0a:00:09:19"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.107483378Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:19"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.107501302Z" level=warning msg="Neighbor entry already present for IP 10.0.9.27, mac 02:42:0a:00:09:1b"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.107512443Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:1b"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.108155363Z" level=warning msg="Neighbor entry already present for IP 10.255.0.4, mac 02:42:0a:ff:00:04"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.108171851Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:ff:00:04"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.108195339Z" level=warning msg="Neighbor entry already present for IP 10.255.0.9, mac 02:42:0a:ff:00:09"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.108206981Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:ff:00:09"
Jun 20 22:03:45 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:45.103599162Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:43612->10.0.46.77:7946: i/o timeout"
Jun 20 22:04:03 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:04:03.509329612Z" level=warning msg="memberlist: Refuting a suspect message (from: ip-10-0-101-134-756771be03b7)"
Jun 20 22:04:34 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:04:34.335601868Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:43628->10.0.46.77:7946: i/o timeout"
Jun 20 22:04:39 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:04:39.274739359Z" level=warning msg="memberlist: Refuting a suspect message (from: ip-10-0-101-134-756771be03b7)"
Jun 20 22:04:55 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:04:55.639069573Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.101.134:48992"
Jun 20 22:05:14 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:05:14.122649641Z" level=warning msg="underweighting node oibvg6xvuq1otznkmjon706qs for service rl90o2zc17b14r294p0a4dr7v because it experienced 5 failures or rejections within 5m0s" module=node node.id=qbio2455qa8aysx15p31aqdc8
Jun 20 22:05:16 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:05:16.386633752Z" level=warning msg="memberlist: Refuting a suspect message (from: ip-10-0-46-77-d41057dd06df)"
Jun 20 22:05:20 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:05:20.690589038Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:43692->10.0.46.77:7946: i/o timeout"
Jun 20 22:06:17 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:06:17.673781860Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:60174->10.0.101.134:7946: i/o timeout"
Jun 20 22:06:37 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:06:37.414486649Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.109.151:43780->10.0.46.77:7946: i/o timeout"
Jun 20 22:06:37 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:06:37.414932602Z" level=info msg="memberlist: Suspect ip-10-0-46-77-d41057dd06df has failed, no acks received"
Jun 20 22:06:39 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:06:39.396701483Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.109.151:43786->10.0.46.77:7946: i/o timeout"
Jun 20 22:06:39 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:06:39.397157000Z" level=info msg="memberlist: Suspect ip-10-0-46-77-d41057dd06df has failed, no acks received"
Jun 20 22:06:57 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:06:57.272894370Z" level=warning msg="underweighting node 6mugebxyus7dgoip9i165mj64 for service rl90o2zc17b14r294p0a4dr7v because it experienced 5 failures or rejections within 5m0s" module=node node.id=qbio2455qa8aysx15p31aqdc8
Jun 20 22:07:02 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:02.496678208Z" level=info msg="Trying to get region from EC2 Metadata"
Jun 20 22:07:02 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:02.534978262Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName=dev-docker-swarm logStreamName=maxillo message="The specified log stream already exists" origError=<nil>
Jun 20 22:07:11 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:11.423067335Z" level=warning msg="memberlist: Refuting a suspect message (from: ip-10-0-101-134-756771be03b7)"
Jun 20 22:07:12 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:12.323754113Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=maxillo message="The given sequenceToken is invalid. The next expected sequenceToken is: 49569207101139072783153485069438749793080029744344342850" origError=<nil>
Jun 20 22:07:16 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:16.365827257Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=qbio2455qa8aysx15p31aqdc8 service.id=rl90o2zc17b14r294p0a4dr7v task.id=ssfr0k9u35quypmjqfkbl2tvi
Jun 20 22:07:22 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:22.360479044Z" level=info msg="Trying to get region from EC2 Metadata"
Jun 20 22:07:22 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:22.398664805Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName=dev-docker-swarm logStreamName=maxillo message="The specified log stream already exists" origError=<nil>
Jun 20 22:07:28 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:28.296905888Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=maxillo message="The given sequenceToken is invalid. The next expected sequenceToken is: 49569207101139072783153485080656372473284175230832484674" origError=<nil>
Jun 20 22:07:29 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:29.445725638Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=qbio2455qa8aysx15p31aqdc8 service.id=rl90o2zc17b14r294p0a4dr7v task.id=wmppjl1bz3xcb4ss6hfmy3mrw
Jun 20 22:07:34 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:34.801838358Z" level=info msg="Trying to get region from EC2 Metadata"
Jun 20 22:07:34 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:34.816546759Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName=dev-docker-swarm logStreamName=maxillo message="The specified log stream already exists" origError=<nil>
Jun 20 22:07:41 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:41.021118018Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=maxillo message="The given sequenceToken is invalid. The next expected sequenceToken is: 49569207101139072783153485099752564719916858705283391810" origError=<nil>
Jun 20 22:07:42 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:42.155442926Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=qbio2455qa8aysx15p31aqdc8 service.id=rl90o2zc17b14r294p0a4dr7v task.id=jj9l022hhkxdsbth9g2ufkxbx
Jun 20 22:07:48 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:48.081780601Z" level=info msg="Trying to get region from EC2 Metadata"
Jun 20 22:07:48 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:48.121312621Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName=dev-docker-swarm logStreamName=maxillo message="The specified log stream already exists" origError=<nil>
Jun 20 22:08:00 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:00.399251676Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=maxillo message="The given sequenceToken is invalid. The next expected sequenceToken is: 49569207101139072783153485135170464457166650461283950914" origError=<nil>
Jun 20 22:08:02 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:02.413685915Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=qbio2455qa8aysx15p31aqdc8 service.id=rl90o2zc17b14r294p0a4dr7v task.id=vbw8q8ve3lx3hy1vg3nsobxp7
Jun 20 22:08:08 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:08.186250800Z" level=info msg="Trying to get region from EC2 Metadata"
Jun 20 22:08:08 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:08.254983116Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName=dev-docker-swarm logStreamName=maxillo message="The specified log stream already exists" origError=<nil>
Jun 20 22:08:10 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:10.215973857Z" level=error msg="Failed to delete real server 10.0.9.83 for vip 10.0.9.78 fwmark 927 in sbox 2d7bd8f (6fcad2a): no such process"
Jun 20 22:08:10 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:10.216411066Z" level=error msg="Failed to delete service for vip 10.0.9.78 fwmark 927 in sbox 2d7bd8f (6fcad2a): no such process"
Jun 20 22:08:10 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:10Z" level=error msg="setting up rule failed, [-t mangle -D OUTPUT -d 10.0.9.78/32 -j MARK --set-mark 927]: (iptables failed: iptables --wait -t mangle -D OUTPUT -d 10.0.9.78/32 -j MARK --set-mark 927: iptables: No chain/target/match by that name.\n (exit status 1))"
Jun 20 22:08:10 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:10.441897743Z" level=error msg="Failed to delete firewall mark rule in sbox 2d7bd8f (6fcad2a): reexec failed: exit status 5"
Jun 20 22:08:17 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:17.440471912Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.109.151:43946->10.0.46.77:7946: i/o timeout"
Jun 20 22:08:20 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:20.341583110Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=maxillo message="The given sequenceToken is invalid. The next expected sequenceToken is: 49569207101139072783153485187573771960001982671662096706" origError=<nil>
Jun 20 22:08:23 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:23.711047057Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=qbio2455qa8aysx15p31aqdc8 service.id=rl90o2zc17b14r294p0a4dr7v task.id=9eaplbijiie196xchh05i09vd
Jun 20 22:08:23 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:23.833836964Z" level=warning msg="underweighting node qbio2455qa8aysx15p31aqdc8 for service rl90o2zc17b14r294p0a4dr7v because it experienced 5 failures or rejections within 5m0s" module=node node.id=qbio2455qa8aysx15p31aqdc8
Jun 20 22:08:39 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:39.109031475Z" level=warning msg="underweighting node oibvg6xvuq1otznkmjon706qs for service rl90o2zc17b14r294p0a4dr7v because it experienced 5 failures or rejections within 5m0s" module=node node.id=qbio2455qa8aysx15p31aqdc8
Jun 20 22:08:58 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:58.963890575Z" level=warning msg="underweighting node oibvg6xvuq1otznkmjon706qs for service rl90o2zc17b14r294p0a4dr7v because it experienced 5 failures or rejections within 5m0s" module=node node.id=qbio2455qa8aysx15p31aqdc8
Jun 20 22:09:47 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:09:47.235114948Z" level=info msg="Trying to get region from EC2 Metadata"
Jun 20 22:09:47 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:09:47.282337407Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName=dev-docker-swarm logStreamName=maxillo message="The specified log stream already exists" origError=<nil>
Jun 20 22:09:53 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:09:53.556344249Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=maxillo message="The given sequenceToken is invalid. The next expected sequenceToken is: 49569207101139072783153485304441839867967806560349987138" origError=<nil>
Jun 20 22:09:54 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:09:54.916094609Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=qbio2455qa8aysx15p31aqdc8 service.id=rl90o2zc17b14r294p0a4dr7v task.id=iem5dfv0dfpj5ss5mxr522wma
Jun 21 15:23:44 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:44.779739394Z" level=warning msg="Neighbor entry already present for IP 10.0.9.51, mac 02:42:0a:00:09:33"
Jun 21 15:23:44 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:44.779868159Z" level=warning msg="Neighbor entry already present for IP 10.0.101.134, mac 02:42:0a:00:09:33"
Jun 21 15:23:44 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:44.779926610Z" level=warning msg="Neighbor entry already present for IP 10.0.9.87, mac 02:42:0a:00:09:57"
Jun 21 15:23:44 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:44.779943038Z" level=warning msg="Neighbor entry already present for IP 10.0.101.134, mac 02:42:0a:00:09:57"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.294578851Z" level=warning msg="Neighbor entry already present for IP 10.255.0.2, mac 02:42:0a:ff:00:02"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.294984322Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:ff:00:02"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.308336521Z" level=warning msg="Neighbor entry already present for IP 10.0.9.75, mac 02:42:0a:00:09:4b"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.308634950Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:4b"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.308850434Z" level=warning msg="Neighbor entry already present for IP 10.0.9.23, mac 02:42:0a:00:09:17"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.309034094Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:17"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.309228587Z" level=warning msg="Neighbor entry already present for IP 10.0.9.65, mac 02:42:0a:00:09:41"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.309435926Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:41"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.309651841Z" level=warning msg="Neighbor entry already present for IP 10.0.9.11, mac 02:42:0a:00:09:0b"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.309833449Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:0b"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.310063137Z" level=warning msg="Neighbor entry already present for IP 10.0.9.13, mac 02:42:0a:00:09:0d"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.310237092Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:0d"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.310419966Z" level=warning msg="Neighbor entry already present for IP 10.0.9.54, mac 02:42:0a:00:09:36"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.310589988Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:36"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.310782984Z" level=warning msg="Neighbor entry already present for IP 10.0.9.21, mac 02:42:0a:00:09:15"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.310960449Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:15"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.311151926Z" level=warning msg="Neighbor entry already present for IP 10.0.9.3, mac 02:42:0a:00:09:03"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.311325245Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:03"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.311602214Z" level=warning msg="Neighbor entry already present for IP 10.0.9.61, mac 02:42:0a:00:09:3d"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.311776341Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:3d"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.312011848Z" level=warning msg="Neighbor entry already present for IP 10.0.9.45, mac 02:42:0a:00:09:2d"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.312185377Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:2d"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.312381514Z" level=warning msg="Neighbor entry already present for IP 10.0.9.41, mac 02:42:0a:00:09:29"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.312562036Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:29"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.312841897Z" level=warning msg="Neighbor entry already present for IP 10.0.9.31, mac 02:42:0a:00:09:1f"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.313014885Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:1f"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.313333831Z" level=warning msg="Neighbor entry already present for IP 10.0.9.5, mac 02:42:0a:00:09:05"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.313510892Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:05"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.313759078Z" level=warning msg="Neighbor entry already present for IP 10.0.9.25, mac 02:42:0a:00:09:19"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.313935450Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:19"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.314123488Z" level=warning msg="Neighbor entry already present for IP 10.0.9.27, mac 02:42:0a:00:09:1b"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.314323007Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:1b"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.331360816Z" level=warning msg="Neighbor entry already present for IP 10.255.0.4, mac 02:42:0a:ff:00:04"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.331569737Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:ff:00:04"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.331770960Z" level=warning msg="Neighbor entry already present for IP 10.255.0.9, mac 02:42:0a:ff:00:09"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.331950018Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:ff:00:09"
Jun 21 15:23:59 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:59.887137588Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=maxillo message="The given sequenceToken is invalid. The next expected sequenceToken is: 49569207101139072783153539940379039257764689584091832642" origError=<nil>
Jun 21 15:24:01 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:24:01.955016608Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=qbio2455qa8aysx15p31aqdc8 service.id=rl90o2zc17b14r294p0a4dr7v task.id=nke7voplt59qio9lvsyx948tp
Jun 21 15:24:12 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:24:12.907482444Z" level=error msg="Bulk sync to node ip-10-0-46-77-d41057dd06df timed out"
Jun 21 15:24:31 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:24:31.682132980Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.46.77:40154"
Jun 21 15:24:33 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:24:33.760149109Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.46.77:40160"
Jun 21 15:24:35 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:24:35.122557817Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:55370->10.0.46.77:7946: i/o timeout"
Jun 21 15:24:47 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:24:47.468712185Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:43580->10.0.101.134:7946: i/o timeout"
Jun 21 15:24:48 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:24:48.068181260Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:55404->10.0.46.77:7946: i/o timeout"
Jun 21 15:24:52 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:24:52.012558940Z" level=info msg="memberlist: Suspect ip-10-0-46-77-d41057dd06df has failed, no acks received"
Jun 21 15:24:57 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:24:57.290037150Z" level=warning msg="memberlist: Was able to reach ip-10-0-46-77-d41057dd06df via TCP but not UDP, network may be misconfigured and not allowing bidirectional UDP"
Jun 21 15:25:08 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:25:08.536331175Z" level=warning msg="memberlist: Refuting a suspect message (from: ip-10-0-101-134-756771be03b7)"
Jun 21 15:25:33 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:25:33.672998327Z" level=info msg="Trying to get region from EC2 Metadata"
Jun 21 15:25:34 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:25:34.401032659Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName=dev-docker-swarm logStreamName=maxillo message="The specified log stream already exists" origError=<nil>
Jun 21 15:25:37 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:25:37.071571343Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.46.77:40276"
Jun 21 15:25:50 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:25:50.924207577Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=maxillo message="The given sequenceToken is invalid. The next expected sequenceToken is: 49569207101139072783153540015060431764458414410708296002" origError=<nil>
Jun 21 15:25:53 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:25:53.441072387Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=qbio2455qa8aysx15p31aqdc8 service.id=rl90o2zc17b14r294p0a4dr7v task.id=r5gy21gonjkqcmhzuqxd8hj83
Jun 21 15:26:03 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:26:03.451888779Z" level=info msg="Trying to get region from EC2 Metadata"
Jun 21 15:26:04 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:26:04.278766560Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName=dev-docker-swarm logStreamName=maxillo message="The specified log stream already exists" origError=<nil>
Jun 21 15:26:16 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:26:16.683060139Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=maxillo message="The given sequenceToken is invalid. The next expected sequenceToken is: 49569207101139072783153540034872308096302959318657933634" origError=<nil>
Jun 21 15:26:22 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:26:22.617421905Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=qbio2455qa8aysx15p31aqdc8 service.id=rl90o2zc17b14r294p0a4dr7v task.id=cpkjgj4ggk3jpwl5540zyep6g
Jun 21 15:26:30 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:26:30.397323299Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:55472->10.0.46.77:7946: i/o timeout"
Jun 21 15:27:30 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:27:30.259837313Z" level=info msg="Trying to get region from EC2 Metadata"
Jun 21 15:27:30 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:27:30.293792783Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName=dev-docker-swarm logStreamName=maxillo message="The specified log stream already exists" origError=<nil>
Jun 21 15:27:39 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:27:39.235800583Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=maxillo message="The given sequenceToken is invalid. The next expected sequenceToken is: 49569207101139072783153540093989989601277946706998075714" origError=<nil>
Jun 21 15:27:42 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:27:42.039514298Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=qbio2455qa8aysx15p31aqdc8 service.id=rl90o2zc17b14r294p0a4dr7v task.id=7wj0pq5um2rvkjg1jk58edq47
Jun 21 15:28:33 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:28:33.847638535Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.109.151:55626->10.0.46.77:7946: i/o timeout"
Jun 21 15:28:42 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:28:42.544204934Z" level=info msg="Trying to get region from EC2 Metadata"
Jun 21 15:28:42 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:28:42.602892470Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName=dev-docker-swarm logStreamName=maxillo message="The specified log stream already exists" origError=<nil>
Jun 21 15:28:49 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:28:49.467118327Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=maxillo message="The given sequenceToken is invalid. The next expected sequenceToken is: 49569207101139072783153540148764000636377570233668346178" origError=<nil>
Jun 21 15:28:50 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:28:50.635187693Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=qbio2455qa8aysx15p31aqdc8 service.id=rl90o2zc17b14r294p0a4dr7v task.id=5fixv9o7o9es78wotja765stm
My guess at this point is that the issue has to do with the line:
Jun 21 15:23:42 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:42.160320276Z" level=error msg="Bulk sync to node ip-10-0-101-134-756771be03b7 timed out"
as that seems to immediately precede the problem and isn't related to simple logging issues. I'm working on better understanding what that message means.
Alright, I started looking at the logs on the other nodes to see if there was anything obvious in them. This seems suspicious. It is from the node that runs nginx, which needs the DNS entries for the other nodes to work so it can find the service it is reverse-proxying:
Jun 21 15:23:08 ip-10-0-46-77 dockerd[974]: time="2017-06-21T15:23:08.965737168Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=6mugebxyus7dgoip9i165mj64 service.id=rl90o2zc17b14r294p0a4dr7v task.id=ph3392o6cmjkta7hjlvtanh27
Jun 21 15:23:09 ip-10-0-46-77 dockerd[974]: time="2017-06-21T15:23:09.487271084Z" level=warning msg="underweighting node 6mugebxyus7dgoip9i165mj64 for service rl90o2zc17b14r294p0a4dr7v because it experienced 5 failures or rejections within 5m0s" module=node node.id=6mugebxyus7dgoip9i165mj64
Jun 21 15:23:43 ip-10-0-46-77 dockerd[974]: time="2017-06-21T15:23:43.743729353Z" level=info msg="memberlist: Marking ip-10-0-109-151-4c957e3ec95e as failed, suspect timeout reached"
Can you do a docker service ps rl90o2zc17b14r294p0a4dr7v --no-trunc?
Or if that no longer exists, do it for whichever service is named the next time this message appears:
Jun 21 15:23:09 ip-10-0-46-77 dockerd[974]: time="2017-06-21T15:23:09.487271084Z" level=warning msg="underweighting node 6mugebxyus7dgoip9i165mj64 for service rl90o2zc17b14r294p0a4dr7v because it experienced 5 failures or rejections within 5m0s" module=node node.id=6mugebxyus7dgoip9i165mj64
root@ip-10-0-46-77:/home/eliribble# docker service ps rl90o2zc17b14r294p0a4dr7v --no-trunc
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
1z516vsiax1j81gmr0vb3y6av maxillo.1 authentise/maxillo:latest@sha256:a0b50d597dbc1a27b3c418b4066e9099ca4c22eb550c1985ad7999bed87db9ac ip-10-0-101-134 Running Starting 7 seconds ago
uvtomhf2kq43b8zwhn1j6p3jt \_ maxillo.1 authentise/maxillo:latest@sha256:a0b50d597dbc1a27b3c418b4066e9099ca4c22eb550c1985ad7999bed87db9ac ip-10-0-101-134 Shutdown Failed 12 seconds ago "task: non-zero exit (1)"
ucp0f67avjjgc5wnp0au4w959 \_ maxillo.1 authentise/maxillo:latest@sha256:a0b50d597dbc1a27b3c418b4066e9099ca4c22eb550c1985ad7999bed87db9ac ip-10-0-109-151 Shutdown Failed 32 seconds ago "task: non-zero exit (1)"
d28gd05xnc3dhb6z8iqhmwkgz \_ maxillo.1 authentise/maxillo:latest@sha256:a0b50d597dbc1a27b3c418b4066e9099ca4c22eb550c1985ad7999bed87db9ac ip-10-0-101-134 Shutdown Failed 42 seconds ago "task: non-zero exit (1)"
0bbke9xkomhfwtoh2su8zy0n0 \_ maxillo.1 authentise/maxillo:latest@sha256:a0b50d597dbc1a27b3c418b4066e9099ca4c22eb550c1985ad7999bed87db9ac ip-10-0-46-77 Shutdown Failed about a minute ago "task: non-zero exit (1)"
I hadn't noticed before that maxillo was failing to start. That's been resolved now. I believe it was a red herring: we've been having the problem for days, and the maxillo failure should be only about 24 hours old.
@EliRibble, glad that helped, but I agree that there is something more wrong. I'm seeing some similar messages in one of my 17.05 clusters and I'm stuck as well. I'll be following the thread and hope it gets ironed out soon. Not sure if posting my logs will help, but I'll offer any information I can if it helps get to the bottom of the issue.
Based on this discussion thread
https://groups.google.com/forum/#!topic/consul-tool/dQSHf2R93lI
I'm starting to wonder if the issue has to do with periodic UDP packet loss, so that one node fails to indicate it is still alive, is presumed dead, and then comes back. I'm basing this on the log lines:
Suspect ip-10-0-109-151-2122b7c4e5ee has failed, no acks received
and
Bulk sync to node ip-10-0-46-77-d41057dd06df timed out
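One way to test that hypothesis is to count how often each peer shows up in these two kinds of messages and see whether the failures cluster around one node. Below is a minimal Python sketch. It assumes the daemon log has been saved as plain text and that the messages have exactly the two shapes quoted above; the function name and the regular expressions are illustrative and are not part of Docker.

```python
import re
from collections import Counter

# Patterns derived only from the two log lines quoted above; other
# daemon versions may word these messages differently.
SUSPECT_RE = re.compile(r"Suspect (\S+) has failed, no acks received")
BULK_SYNC_RE = re.compile(r"Bulk sync to node (\S+) timed out")


def count_membership_failures(lines):
    """Count suspect and bulk-sync-timeout events per peer node."""
    counts = Counter()
    for line in lines:
        match = SUSPECT_RE.search(line)
        if match:
            counts[(match.group(1), "suspect")] += 1
            continue
        match = BULK_SYNC_RE.search(line)
        if match:
            counts[(match.group(1), "bulk_sync_timeout")] += 1
    return counts
```

Passing it an open log file (any iterable of lines works) returns a counter keyed by (node, event kind). If one peer dominates the counts, that would point at connectivity to that node rather than at DNS itself.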
ping @sanimej
I don't know if I'm having the same issue, but it is something very similar.
I went to deploy a second environment, exactly the same as my first, just in a different AWS region. (Everything is scripted, so it should be identical apart from the IDs of newly created AWS resources.)
I also have an NGINX reverse proxy that was originally looking up the service name; however, it wasn't able to resolve that, whether via NGINX or just an nslookup. However, I was able to resolve it if I prefixed the service name with my stack name, such as stack_servicename, both when the bug presented itself and when it did not.
I think I'm running into this issue as well, with Docker for AWS 17.06.0-ce. In my case, the service in question is Prometheus, and about 3:45am local time this morning I started receiving messages like:
{"level":"warning","msg":"DNS resolution failed.","name":"tasks.prometheus-local","reason":"dial udp 127.0.0.11:53: i/o timeout","server":"127.0.0.11","source":"dns.go:190","time":"2017-07-17T07:43:47Z"}
(note: the service doing the logging is named prometheus-local, so it's failing to look up its own DNS record)
It was happening sporadically, and would recover within 2-3 minutes. Then, immediately after the last outage, the task died (with error non-zero exit (137)) and I haven't seen any DNS resolution issues with the task started up to replace it.
Both of these tasks were scheduled on the same host, and other tasks on that host seem not to be having DNS issues.
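To see how the outages are spread over time, the JSON log lines can be bucketed per minute. This is a rough Python sketch that assumes every relevant line is a JSON object with the same "msg", "name" and "time" fields as the line quoted above; the function name is made up for illustration.

```python
import json
from collections import Counter


def dns_failures_per_minute(lines):
    """Count "DNS resolution failed." entries per (minute, name)."""
    buckets = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        if entry.get("msg") != "DNS resolution failed.":
            continue
        # Truncate "2017-07-17T07:43:47Z" to minute precision.
        minute = str(entry.get("time", ""))[:16]
        buckets[(minute, entry.get("name"))] += 1
    return buckets
```

Sorting the resulting keys gives a timeline of when each name failed to resolve, which can then be compared against deploys or node events.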
We have the exact same problem, except it happens every 24-48 hours.
This only started happening when we migrated from AWS to Azure.
docker version
<pre>Client: Docker Engine - Community
Version: 20.10.6
API version: 1.41
Go version: go1.13.15
Git commit: 370c289
Built: Fri Apr 9 22:46:01 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.6
API version: 1.41 (minimum version 1.12)
Go version: go1.13.15
Git commit: 8728dd2
Built: Fri Apr 9 22:44:13 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.4
GitCommit: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
runc:
Version: 1.0.0-rc93
GitCommit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
docker-init:
Version: 0.19.0
GitCommit: de40ad0
</pre>
docker info
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)
scan: Docker Scan (Docker Inc., v0.7.0)
Server:
Containers: 39
Running: 26
Paused: 0
Stopped: 13
Images: 157
Server Version: 20.10.6
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: false
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
NodeID: zcp2sjprerde3f71bwyrjxcmf
Is Manager: true
ClusterID: tw60ziploqgfjqr8dm7vzwizx
Managers: 1
Nodes: 3
Default Address Pool: 10.0.0.0/8
SubnetSize: 24
Data Path Port: 4789
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 20.188.56.247
Manager Addresses:
20.188.56.247:2377
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
runc version: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.4.0-1046-azure
Operating System: Ubuntu 18.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 62.82GiB
Name: swarm-master
ID: HPSR:MEJD:CYQT:FFHZ:RMKF:NPLP:PL77:PZUY:OV6M:V3HB:PH6D:VSQG
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Is there a way to debug this behavior?
I thought it had something to do with Azure VNets. I configured Docker to use the public IP address, but the problem persists.
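One low-tech way to narrow down when resolution breaks is to run a small probe inside a container on the affected node and log every change in what a service name resolves to. The following Python sketch uses only the standard library and whatever resolver the container is configured with. The function names, the default interval, and the service name in the usage note are assumptions for illustration.

```python
import socket
import time
from datetime import datetime, timezone


def resolve_once(name):
    """Return the sorted list of addresses for name, or None on failure."""
    try:
        infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return None
    return sorted({info[4][0] for info in infos})


def probe(name, interval_s=5.0, iterations=None, sleep=time.sleep):
    """Resolve name repeatedly; print and record each change in the result."""
    transitions = []
    last = object()  # sentinel so the first result is always recorded
    count = 0
    while iterations is None or count < iterations:
        result = resolve_once(name)
        if result != last:
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            print(f"{stamp} {name} -> {result if result is not None else 'FAILED'}")
            transitions.append((stamp, result))
            last = result
        count += 1
        sleep(interval_s)
    return transitions
```

For example, calling probe("my-service") with a hypothetical service name prints a timestamped line each time the answer changes or fails. Those timestamps can then be lined up with the daemon logs on that node.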