Docker swarm DNS periodically fails

Open EliRibble opened this issue 7 years ago • 13 comments

I'm cross posting this here from a moby issue because I believe that nothing is going to happen with the issue in that project. The original issue is here:

https://github.com/moby/moby/issues/33721

In a 3-master swarm, every few days one of the nodes stops resolving services by name through the internal swarm DNS. The swarm operates fine for several days, then DNS just stops working. I don't yet know whether a specific change we are making causes the issue. Our system does automated deploys, and we haven't yet correlated any aspect of those deploys with when the issue is triggered.

Steps to reproduce the issue:

  1. Run a 3-master docker swarm in EC2 for several days
  2. Sometimes after 2-4 days one of the nodes can't resolve DNS names for services not running on it locally but running in the swarm.
  3. There is no step 3.

Describe the results you received: We run nginx inside our swarm as a reverse HTTP proxy to our various services. We know that DNS isn't working because we have disabled nginx DNS caching, and it will suddenly stop resolving the IP of the service where our application is running. When I execute nslookup inside the containers running on the affected node, they fail to find any of the services running on other nodes in the swarm, but they do find services running on the same node.

Describe the results you expected: For name resolution to continue working.

Additional information you deem important (e.g. issue happens only occasionally): The issue only happens occasionally. It is resolved if I restart the docker daemon on the node that can no longer resolve DNS.
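The manual nslookup check described above could be scripted roughly like this. It is only a minimal sketch: it assumes Python is available inside the container being tested, and the function name `check_services` and any service names passed to it are hypothetical, not something from our deployment.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def check_services(
    names: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, Optional[str]]:
    """Map each service name to its resolved IP, or None if the lookup fails."""
    results: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            results[name] = resolve(name)
        except OSError:
            # socket.gaierror (name resolution failure) is a subclass of OSError.
            results[name] = None
    return results
```

Calling something like `check_services(["api", "web"])` (placeholder service names) from a container on each node and comparing the results should show whether only services on remote nodes come back as `None`, which is the symptom described here.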

Output of docker version:

Client:
 Version:      17.05.0-ce
 API version:  1.29
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:10:54 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.05.0-ce
 API version:  1.29 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:10:54 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

Containers: 19
 Running: 12
 Paused: 0
 Stopped: 7
Images: 24
Server Version: 17.05.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 195
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: 6mugebxyus7dgoip9i165mj64
 Is Manager: true
 ClusterID: mn6l9qnshdxzzxfoxwpsa18xe
 Managers: 3
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 10.0.46.77
 Manager Addresses:
  10.0.101.134:2377
  10.0.109.151:2377
  10.0.46.77:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-57-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.674GiB
Name: ip-10-0-46-77
ID: XMND:RWXY:RGVB:F4BK:TDA3:LQ3W:URQH:T6ES:HLQE:74A7:FQG4:RIYY
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: authentiseautomation
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.): AWS EC2 instances

EliRibble, Jun 21 '17 15:06

Alright, looks like I captured some more information. At UTC 19:22 our automated systems started alerting that one of our services wasn't reachable. We had 3 nodes in the cluster at the time:

root@ip-10-0-101-134:/home/eliribble# docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS
6mugebxyus7dgoip9i165mj64     ip-10-0-46-77       Ready               Active              Reachable
oibvg6xvuq1otznkmjon706qs *   ip-10-0-101-134     Ready               Active              Leader
qbio2455qa8aysx15p31aqdc8     ip-10-0-109-151     Ready               Active              Reachable

We run one instance of nginx on 6mugebxyus7dgoip9i165mj64. That instance could not nslookup any of the services on node oibvg6xvuq1otznkmjon706qs. Below is a log snippet from the node that lost connection with the other two nodes (oibvg6xvuq1otznkmjon706qs), covering the most recent time the issue manifested itself plus a couple of hours before it:

Jun 20 17:01:35 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:01:35.774420552Z" level=info msg="Trying to get region from EC2 Metadata"
Jun 20 17:01:35 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:01:35.841668869Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName=dev-docker-swarm logStreamName=wyatt message="The specified log stream already exists" origError=<nil>
Jun 20 17:01:40 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:01:40.903468555Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=wyatt message="The given sequenceToken is invalid. The next expected sequenceToken is: 49568806213822379210463152311192225551215377729798014642" origError=<nil>
Jun 20 17:02:23 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:02:23.051869551Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.46.77:36186"
Jun 20 17:08:41 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:08:41.974397903Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:54816->10.0.46.77:7946: i/o timeout"
Jun 20 17:55:31 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:55:31.974231064Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:57498->10.0.109.151:7946: i/o timeout"
Jun 20 17:55:31 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:55:31.975018709Z" level=info msg="memberlist: Suspect ip-10-0-109-151-2122b7c4e5ee has failed, no acks received"
Jun 20 17:55:56 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:55:56.050109221Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:57580->10.0.109.151:7946: i/o timeout"
Jun 20 17:55:56 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:55:56.055314323Z" level=info msg="memberlist: Suspect ip-10-0-109-151-2122b7c4e5ee has failed, no acks received"
Jun 20 17:55:57 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:55:57.955034271Z" level=warning msg="memberlist: Refuting a suspect message (from: ip-10-0-109-151-2122b7c4e5ee)"
Jun 20 17:56:07 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:56:07.206138651Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.109.151:53562"
Jun 20 17:56:19 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:56:19.696755651Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:57660->10.0.109.151:7946: i/o timeout"
Jun 20 17:56:26 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:56:26.081935803Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:57678->10.0.109.151:7946: i/o timeout"
Jun 20 17:56:37 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:56:37.322456133Z" level=error msg="Bulk sync to node ip-10-0-46-77-d41057dd06df timed out"
Jun 20 17:56:37 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:56:37.557453445Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.109.151:53592"
Jun 20 17:57:18 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:57:18.060719072Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:57782->10.0.109.151:7946: i/o timeout"
Jun 20 17:57:18 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:57:18.068538632Z" level=info msg="memberlist: Suspect ip-10-0-109-151-2122b7c4e5ee has failed, no acks received"
Jun 20 17:58:08 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:58:08.103343689Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:57946->10.0.109.151:7946: i/o timeout"
Jun 20 17:58:21 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:58:21.974536703Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:57970->10.0.109.151:7946: i/o timeout"
Jun 20 17:58:57 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:58:57.974643782Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:32820->10.0.46.77:7946: i/o timeout"
Jun 20 17:59:12 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:59:12.602359013Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.109.151:53738"
Jun 20 17:59:18 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:59:18.974339945Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:58122->10.0.109.151:7946: i/o timeout"
Jun 20 17:59:21 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:59:21.992436118Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:58134->10.0.109.151:7946: i/o timeout"
Jun 20 17:59:22 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T17:59:22.003878882Z" level=info msg="memberlist: Suspect ip-10-0-109-151-2122b7c4e5ee has failed, no acks received"
Jun 20 18:01:31 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:01:31.974633439Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:33082->10.0.46.77:7946: i/o timeout"
Jun 20 18:02:45 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:02:45.974523767Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:58412->10.0.109.151:7946: i/o timeout"
Jun 20 18:04:15 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:04:15.974362478Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:33290->10.0.46.77:7946: i/o timeout"
Jun 20 18:04:15 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:04:15.974905779Z" level=info msg="memberlist: Suspect ip-10-0-46-77-d41057dd06df has failed, no acks received"
Jun 20 18:12:55 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:12:55.481951974Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.109.151:54328"
Jun 20 18:13:06 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:13:06.757386055Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.109.151:54338"
Jun 20 18:13:14 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:13:14.990316214Z" level=warning msg="memberlist: Was able to reach ip-10-0-109-151-2122b7c4e5ee via TCP but not UDP, network may be misconfigured and not allowing bidirectional UDP"
Jun 20 18:13:32 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:13:32.431579084Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.109.151:54346"
Jun 20 18:14:07 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:14:07.487006002Z" level=error msg="Bulk sync to node ip-10-0-46-77-d41057dd06df timed out"
Jun 20 18:14:59 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:14:59.055811063Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:59550->10.0.109.151:7946: i/o timeout"
Jun 20 18:14:59 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:14:59.060155187Z" level=info msg="memberlist: Suspect ip-10-0-109-151-2122b7c4e5ee has failed, no acks received"
Jun 20 18:15:04 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:15:04.033640190Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:59560->10.0.109.151:7946: i/o timeout"
Jun 20 18:15:04 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:15:04.038084464Z" level=info msg="memberlist: Suspect ip-10-0-109-151-2122b7c4e5ee has failed, no acks received"
Jun 20 18:15:13 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:15:13.113579180Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:59596->10.0.109.151:7946: i/o timeout"
Jun 20 18:15:13 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:15:13.114189926Z" level=info msg="memberlist: Suspect ip-10-0-109-151-2122b7c4e5ee has failed, no acks received"
Jun 20 18:15:13 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:15:13.916240646Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.109.151:54484"
Jun 20 18:15:33 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:15:33.774729884Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.46.77:52630"
Jun 20 18:17:54 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:17:54.974331890Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:34608->10.0.46.77:7946: i/o timeout"
Jun 20 18:17:54 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T18:17:54.975002973Z" level=info msg="memberlist: Suspect ip-10-0-46-77-d41057dd06df has failed, no acks received"
Jun 20 19:13:20 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:13:20.092338987Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:35204->10.0.109.151:7946: i/o timeout"
Jun 20 19:13:27 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:13:27.064704267Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:35210->10.0.109.151:7946: i/o timeout"
Jun 20 19:13:41 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:13:41.243023427Z" level=info msg="memberlist: Suspect ip-10-0-46-77-d41057dd06df has failed, no acks received"
Jun 20 19:14:02 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:14:02.125799690Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:35304->10.0.109.151:7946: i/o timeout"
Jun 20 19:14:02 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:14:02.135869023Z" level=info msg="memberlist: Suspect ip-10-0-109-151-2122b7c4e5ee has failed, no acks received"
Jun 20 19:14:14 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:14:14.977870357Z" level=warning msg="memberlist: Refuting a suspect message (from: ip-10-0-109-151-2122b7c4e5ee)"
Jun 20 19:14:35 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:14:35.595045654Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.109.151:55702"
Jun 20 19:14:51 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:14:51.057404889Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:35428->10.0.109.151:7946: i/o timeout"
Jun 20 19:15:36 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:36.978327589Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:35548->10.0.109.151:7946: i/o timeout"
Jun 20 19:15:36 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:36.978849052Z" level=info msg="memberlist: Suspect ip-10-0-109-151-2122b7c4e5ee has failed, no acks received"
Jun 20 19:15:41 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:41.974653384Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:35570->10.0.109.151:7946: i/o timeout"
Jun 20 19:15:41 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:41.979074530Z" level=info msg="memberlist: Marking ip-10-0-109-151-2122b7c4e5ee as failed, suspect timeout reached"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.877403972Z" level=warning msg="Neighbor entry already present for IP 10.255.0.2, mac 02:42:0a:ff:00:02"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.877800921Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:ff:00:02"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.878519186Z" level=warning msg="Neighbor entry already present for IP 10.0.9.23, mac 02:42:0a:00:09:17"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.879248091Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:17"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.879468683Z" level=warning msg="Neighbor entry already present for IP 10.0.9.65, mac 02:42:0a:00:09:41"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.879846799Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:41"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.880059701Z" level=warning msg="Neighbor entry already present for IP 10.0.9.11, mac 02:42:0a:00:09:0b"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.880255444Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:0b"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.880463126Z" level=warning msg="Neighbor entry already present for IP 10.0.9.13, mac 02:42:0a:00:09:0d"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.880671681Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:0d"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.881046636Z" level=warning msg="Neighbor entry already present for IP 10.0.9.54, mac 02:42:0a:00:09:36"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.881274334Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:36"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.881481222Z" level=warning msg="Neighbor entry already present for IP 10.0.9.21, mac 02:42:0a:00:09:15"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.881671859Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:15"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.882045893Z" level=warning msg="Neighbor entry already present for IP 10.0.9.3, mac 02:42:0a:00:09:03"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.882241964Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:03"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.882881274Z" level=warning msg="Neighbor entry already present for IP 10.0.9.33, mac 02:42:0a:00:09:21"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.883164138Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:21"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.883434419Z" level=warning msg="Neighbor entry already present for IP 10.0.9.41, mac 02:42:0a:00:09:29"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.883854358Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:29"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.884072362Z" level=warning msg="Neighbor entry already present for IP 10.0.9.61, mac 02:42:0a:00:09:3d"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.884702942Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:3d"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.885017910Z" level=warning msg="Neighbor entry already present for IP 10.0.9.31, mac 02:42:0a:00:09:1f"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.885418150Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:1f"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.897655936Z" level=warning msg="Neighbor entry already present for IP 10.0.9.5, mac 02:42:0a:00:09:05"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.897892315Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:05"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.898115851Z" level=warning msg="Neighbor entry already present for IP 10.0.9.25, mac 02:42:0a:00:09:19"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.898829811Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:19"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.899107248Z" level=warning msg="Neighbor entry already present for IP 10.0.9.27, mac 02:42:0a:00:09:1b"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.899312064Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:1b"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.879057617Z" level=warning msg="Neighbor entry already present for IP 10.255.0.4, mac 02:42:0a:ff:00:04"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.899680372Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:ff:00:04"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.900465485Z" level=warning msg="Neighbor entry already present for IP 10.255.0.9, mac 02:42:0a:ff:00:09"
Jun 20 19:15:43 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:43.900740901Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:ff:00:09"
Jun 20 19:15:55 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:55.113234249Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:35610->10.0.109.151:7946: i/o timeout"
Jun 20 19:15:55 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:15:55.113660921Z" level=info msg="memberlist: Suspect ip-10-0-109-151-2122b7c4e5ee has failed, no acks received"
Jun 20 19:16:17 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:16:17.974531987Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:35630->10.0.109.151:7946: i/o timeout"
Jun 20 19:16:19 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:16:19.974569102Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:35636->10.0.109.151:7946: i/o timeout"
Jun 20 19:16:21 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:16:21.453476300Z" level=warning msg="memberlist: Refuting a suspect message (from: ip-10-0-109-151-2122b7c4e5ee)"
Jun 20 19:16:45 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:16:45.030140914Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.109.151:55866"
Jun 20 19:16:58 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:16:58.982126963Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:35724->10.0.109.151:7946: i/o timeout"
Jun 20 19:21:12 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:21:12.974540021Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:39106->10.0.46.77:7946: i/o timeout"
Jun 20 19:21:12 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:21:12.975119223Z" level=info msg="memberlist: Suspect ip-10-0-46-77-d41057dd06df has failed, no acks received"
Jun 20 19:22:15 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:22:15.974510246Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.101.134:39202->10.0.46.77:7946: i/o timeout"
Jun 20 19:22:36 ip-10-0-101-134 dockerd[10882]: time="2017-06-20T19:22:36.979365648Z" level=error msg="Bulk sync to node ip-10-0-46-77-d41057dd06df timed out"
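The log above contains a memberlist warning that a peer was reachable via TCP but not UDP, along with many TCP fallback ping timeouts on port 7946. Swarm mode uses TCP 2377 for cluster management, TCP and UDP 7946 for node-to-node gossip, and UDP 4789 for overlay traffic. A quick way to rule out the TCP half is a connect probe like the sketch below. The helper name `tcp_port_open` is hypothetical, and a successful TCP connect says nothing about whether UDP on 7946 or 4789 gets through.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return False
```

Run from one node against the other two manager addresses (for example `tcp_port_open("10.0.46.77", 7946)`), this only confirms that the security group and host firewall allow the TCP connection at that moment. It would not catch an intermittent failure unless it is run repeatedly.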

EliRibble, Jun 21 '17 15:06

Here are more logs in case they help:

Just had a repro again; this time I couldn't reach services on qbio2455qa8aysx15p31aqdc8. Log from the afflicted node:

Jun 20 22:02:04 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:02:04.559066162Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=pao message="The given sequenceToken is invalid. The next expected sequenceToken is: 49556647666881803461441973720211116500185719525833969250" origError=<nil>
Jun 20 22:02:06 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:02:06.222961215Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=pao message="The given sequenceToken is invalid. The next expected sequenceToken is: 49556647666881803461441973722662818062364187698296524386" origError=<nil>
Jun 20 22:02:06 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:02:06.223388414Z" level=error msg="InvalidSequenceTokenException: The given sequenceToken is invalid. The next expected sequenceToken is: 49556647666881803461441973722662818062364187698296524386\n\tstatus code: 400, request id: 18a69ee5-5604-11e7-ac02-9525e68b48b6"
Jun 20 22:02:24 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:02:24.575044678Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=pao message="The given sequenceToken is invalid. The next expected sequenceToken is: 49556647666881803461441973735672068807237213484260265570" origError=<nil>
Jun 20 22:02:25 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:02:25.133842755Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=pao message="The given sequenceToken is invalid. The next expected sequenceToken is: 49556647666881803461441973738076622262450711050189803106" origError=<nil>
Jun 20 22:02:25 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:02:25.134291150Z" level=error msg="InvalidSequenceTokenException: The given sequenceToken is invalid. The next expected sequenceToken is: 49556647666881803461441973738076622262450711050189803106\n\tstatus code: 400, request id: 2488c3eb-5604-11e7-87d8-f7e0f87a5a34"
Jun 20 22:02:28 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:02:28.421101828Z" level=warning msg="memberlist: Refuting a suspect message (from: ip-10-0-46-77-d41057dd06df)"
Jun 20 22:02:29 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:02:29.151843539Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=pao message="The given sequenceToken is invalid. The next expected sequenceToken is: 49556647666881803461441973739875503882037279399591546466" origError=<nil>
Jun 20 22:02:38 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:02:38.604526845Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:59942->10.0.101.134:7946: i/o timeout"
Jun 20 22:02:40 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:02:40.826010611Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.46.77:41794"
Jun 20 22:03:08 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:08.642452532Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:43580->10.0.46.77:7946: i/o timeout"
Jun 20 22:03:12 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:12.440892530Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:43592->10.0.46.77:7946: i/o timeout"
Jun 20 22:03:12 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:12.883628767Z" level=info msg="memberlist: Suspect ip-10-0-101-134-756771be03b7 has failed, no acks received"
Jun 20 22:03:15 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:15.325774759Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.46.77:41888"
Jun 20 22:03:16 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:16.305687187Z" level=warning msg="memberlist: Refuting a suspect message (from: ip-10-0-109-151-2122b7c4e5ee)"
Jun 20 22:03:17 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:17.600976541Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:60008->10.0.101.134:7946: i/o timeout"
Jun 20 22:03:18 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:18.991481971Z" level=warning msg="Neighbor entry already present for IP 10.255.0.2, mac 02:42:0a:ff:00:02"
Jun 20 22:03:18 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:18.991875174Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:ff:00:02"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.103155960Z" level=warning msg="Neighbor entry already present for IP 10.0.9.23, mac 02:42:0a:00:09:17"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.103523713Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:17"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.103793706Z" level=warning msg="Neighbor entry already present for IP 10.0.9.65, mac 02:42:0a:00:09:41"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.104035441Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:41"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.104232665Z" level=warning msg="Neighbor entry already present for IP 10.0.9.11, mac 02:42:0a:00:09:0b"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.104430469Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:0b"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.104624252Z" level=warning msg="Neighbor entry already present for IP 10.0.9.13, mac 02:42:0a:00:09:0d"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.104800697Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:0d"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.104990143Z" level=warning msg="Neighbor entry already present for IP 10.0.9.54, mac 02:42:0a:00:09:36"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.105168255Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:36"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.105399904Z" level=warning msg="Neighbor entry already present for IP 10.0.9.21, mac 02:42:0a:00:09:15"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.105575114Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:15"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.105763724Z" level=warning msg="Neighbor entry already present for IP 10.0.9.3, mac 02:42:0a:00:09:03"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.105960103Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:03"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.106149625Z" level=warning msg="Neighbor entry already present for IP 10.0.9.79, mac 02:42:0a:00:09:4f"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.106322933Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:4f"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.106515870Z" level=warning msg="Neighbor entry already present for IP 10.0.9.41, mac 02:42:0a:00:09:29"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.106694229Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:29"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.106877616Z" level=warning msg="Neighbor entry already present for IP 10.0.9.83, mac 02:42:0a:00:09:53"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.107049891Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:53"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.107232993Z" level=warning msg="Neighbor entry already present for IP 10.0.9.31, mac 02:42:0a:00:09:1f"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.107405626Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:1f"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.107436679Z" level=warning msg="Neighbor entry already present for IP 10.0.9.5, mac 02:42:0a:00:09:05"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.107448591Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:05"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.107471459Z" level=warning msg="Neighbor entry already present for IP 10.0.9.25, mac 02:42:0a:00:09:19"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.107483378Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:19"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.107501302Z" level=warning msg="Neighbor entry already present for IP 10.0.9.27, mac 02:42:0a:00:09:1b"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.107512443Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:1b"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.108155363Z" level=warning msg="Neighbor entry already present for IP 10.255.0.4, mac 02:42:0a:ff:00:04"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.108171851Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:ff:00:04"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.108195339Z" level=warning msg="Neighbor entry already present for IP 10.255.0.9, mac 02:42:0a:ff:00:09"
Jun 20 22:03:19 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:19.108206981Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:ff:00:09"
Jun 20 22:03:45 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:03:45.103599162Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:43612->10.0.46.77:7946: i/o timeout"
Jun 20 22:04:03 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:04:03.509329612Z" level=warning msg="memberlist: Refuting a suspect message (from: ip-10-0-101-134-756771be03b7)"
Jun 20 22:04:34 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:04:34.335601868Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:43628->10.0.46.77:7946: i/o timeout"
Jun 20 22:04:39 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:04:39.274739359Z" level=warning msg="memberlist: Refuting a suspect message (from: ip-10-0-101-134-756771be03b7)"
Jun 20 22:04:55 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:04:55.639069573Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.101.134:48992"
Jun 20 22:05:14 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:05:14.122649641Z" level=warning msg="underweighting node oibvg6xvuq1otznkmjon706qs for service rl90o2zc17b14r294p0a4dr7v because it experienced 5 failures or rejections within 5m0s" module=node node.id=qbio2455qa8aysx15p31aqdc8
Jun 20 22:05:16 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:05:16.386633752Z" level=warning msg="memberlist: Refuting a suspect message (from: ip-10-0-46-77-d41057dd06df)"
Jun 20 22:05:20 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:05:20.690589038Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:43692->10.0.46.77:7946: i/o timeout"
Jun 20 22:06:17 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:06:17.673781860Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:60174->10.0.101.134:7946: i/o timeout"
Jun 20 22:06:37 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:06:37.414486649Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.109.151:43780->10.0.46.77:7946: i/o timeout"
Jun 20 22:06:37 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:06:37.414932602Z" level=info msg="memberlist: Suspect ip-10-0-46-77-d41057dd06df has failed, no acks received"
Jun 20 22:06:39 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:06:39.396701483Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.109.151:43786->10.0.46.77:7946: i/o timeout"
Jun 20 22:06:39 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:06:39.397157000Z" level=info msg="memberlist: Suspect ip-10-0-46-77-d41057dd06df has failed, no acks received"
Jun 20 22:06:57 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:06:57.272894370Z" level=warning msg="underweighting node 6mugebxyus7dgoip9i165mj64 for service rl90o2zc17b14r294p0a4dr7v because it experienced 5 failures or rejections within 5m0s" module=node node.id=qbio2455qa8aysx15p31aqdc8
Jun 20 22:07:02 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:02.496678208Z" level=info msg="Trying to get region from EC2 Metadata"
Jun 20 22:07:02 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:02.534978262Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName=dev-docker-swarm logStreamName=maxillo message="The specified log stream already exists" origError=<nil>
Jun 20 22:07:11 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:11.423067335Z" level=warning msg="memberlist: Refuting a suspect message (from: ip-10-0-101-134-756771be03b7)"
Jun 20 22:07:12 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:12.323754113Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=maxillo message="The given sequenceToken is invalid. The next expected sequenceToken is: 49569207101139072783153485069438749793080029744344342850" origError=<nil>
Jun 20 22:07:16 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:16.365827257Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=qbio2455qa8aysx15p31aqdc8 service.id=rl90o2zc17b14r294p0a4dr7v task.id=ssfr0k9u35quypmjqfkbl2tvi
Jun 20 22:07:22 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:22.360479044Z" level=info msg="Trying to get region from EC2 Metadata"
Jun 20 22:07:22 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:22.398664805Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName=dev-docker-swarm logStreamName=maxillo message="The specified log stream already exists" origError=<nil>
Jun 20 22:07:28 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:28.296905888Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=maxillo message="The given sequenceToken is invalid. The next expected sequenceToken is: 49569207101139072783153485080656372473284175230832484674" origError=<nil>
Jun 20 22:07:29 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:29.445725638Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=qbio2455qa8aysx15p31aqdc8 service.id=rl90o2zc17b14r294p0a4dr7v task.id=wmppjl1bz3xcb4ss6hfmy3mrw
Jun 20 22:07:34 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:34.801838358Z" level=info msg="Trying to get region from EC2 Metadata"
Jun 20 22:07:34 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:34.816546759Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName=dev-docker-swarm logStreamName=maxillo message="The specified log stream already exists" origError=<nil>
Jun 20 22:07:41 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:41.021118018Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=maxillo message="The given sequenceToken is invalid. The next expected sequenceToken is: 49569207101139072783153485099752564719916858705283391810" origError=<nil>
Jun 20 22:07:42 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:42.155442926Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=qbio2455qa8aysx15p31aqdc8 service.id=rl90o2zc17b14r294p0a4dr7v task.id=jj9l022hhkxdsbth9g2ufkxbx
Jun 20 22:07:48 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:48.081780601Z" level=info msg="Trying to get region from EC2 Metadata"
Jun 20 22:07:48 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:07:48.121312621Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName=dev-docker-swarm logStreamName=maxillo message="The specified log stream already exists" origError=<nil>
Jun 20 22:08:00 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:00.399251676Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=maxillo message="The given sequenceToken is invalid. The next expected sequenceToken is: 49569207101139072783153485135170464457166650461283950914" origError=<nil>
Jun 20 22:08:02 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:02.413685915Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=qbio2455qa8aysx15p31aqdc8 service.id=rl90o2zc17b14r294p0a4dr7v task.id=vbw8q8ve3lx3hy1vg3nsobxp7
Jun 20 22:08:08 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:08.186250800Z" level=info msg="Trying to get region from EC2 Metadata"
Jun 20 22:08:08 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:08.254983116Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName=dev-docker-swarm logStreamName=maxillo message="The specified log stream already exists" origError=<nil>
Jun 20 22:08:10 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:10.215973857Z" level=error msg="Failed to delete real server 10.0.9.83 for vip 10.0.9.78 fwmark 927 in sbox 2d7bd8f (6fcad2a): no such process"
Jun 20 22:08:10 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:10.216411066Z" level=error msg="Failed to delete service for vip 10.0.9.78 fwmark 927 in sbox 2d7bd8f (6fcad2a): no such process"
Jun 20 22:08:10 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:10Z" level=error msg="setting up rule failed, [-t mangle -D OUTPUT -d 10.0.9.78/32 -j MARK --set-mark 927]:  (iptables failed: iptables --wait -t mangle -D OUTPUT -d 10.0.9.78/32 -j MARK --set-mark 927: iptables: No chain/target/match by that name.\n (exit status 1))"
Jun 20 22:08:10 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:10.441897743Z" level=error msg="Failed to delete firewall mark rule in sbox 2d7bd8f (6fcad2a): reexec failed: exit status 5"
Jun 20 22:08:17 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:17.440471912Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.109.151:43946->10.0.46.77:7946: i/o timeout"
Jun 20 22:08:20 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:20.341583110Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=maxillo message="The given sequenceToken is invalid. The next expected sequenceToken is: 49569207101139072783153485187573771960001982671662096706" origError=<nil>
Jun 20 22:08:23 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:23.711047057Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=qbio2455qa8aysx15p31aqdc8 service.id=rl90o2zc17b14r294p0a4dr7v task.id=9eaplbijiie196xchh05i09vd
Jun 20 22:08:23 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:23.833836964Z" level=warning msg="underweighting node qbio2455qa8aysx15p31aqdc8 for service rl90o2zc17b14r294p0a4dr7v because it experienced 5 failures or rejections within 5m0s" module=node node.id=qbio2455qa8aysx15p31aqdc8
Jun 20 22:08:39 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:39.109031475Z" level=warning msg="underweighting node oibvg6xvuq1otznkmjon706qs for service rl90o2zc17b14r294p0a4dr7v because it experienced 5 failures or rejections within 5m0s" module=node node.id=qbio2455qa8aysx15p31aqdc8
Jun 20 22:08:58 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:08:58.963890575Z" level=warning msg="underweighting node oibvg6xvuq1otznkmjon706qs for service rl90o2zc17b14r294p0a4dr7v because it experienced 5 failures or rejections within 5m0s" module=node node.id=qbio2455qa8aysx15p31aqdc8
Jun 20 22:09:47 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:09:47.235114948Z" level=info msg="Trying to get region from EC2 Metadata"
Jun 20 22:09:47 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:09:47.282337407Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName=dev-docker-swarm logStreamName=maxillo message="The specified log stream already exists" origError=<nil>
Jun 20 22:09:53 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:09:53.556344249Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=maxillo message="The given sequenceToken is invalid. The next expected sequenceToken is: 49569207101139072783153485304441839867967806560349987138" origError=<nil>
Jun 20 22:09:54 ip-10-0-109-151 dockerd[10449]: time="2017-06-20T22:09:54.916094609Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=qbio2455qa8aysx15p31aqdc8 service.id=rl90o2zc17b14r294p0a4dr7v task.id=iem5dfv0dfpj5ss5mxr522wma
Jun 21 15:23:44 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:44.779739394Z" level=warning msg="Neighbor entry already present for IP 10.0.9.51, mac 02:42:0a:00:09:33"
Jun 21 15:23:44 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:44.779868159Z" level=warning msg="Neighbor entry already present for IP 10.0.101.134, mac 02:42:0a:00:09:33"
Jun 21 15:23:44 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:44.779926610Z" level=warning msg="Neighbor entry already present for IP 10.0.9.87, mac 02:42:0a:00:09:57"
Jun 21 15:23:44 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:44.779943038Z" level=warning msg="Neighbor entry already present for IP 10.0.101.134, mac 02:42:0a:00:09:57"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.294578851Z" level=warning msg="Neighbor entry already present for IP 10.255.0.2, mac 02:42:0a:ff:00:02"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.294984322Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:ff:00:02"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.308336521Z" level=warning msg="Neighbor entry already present for IP 10.0.9.75, mac 02:42:0a:00:09:4b"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.308634950Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:4b"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.308850434Z" level=warning msg="Neighbor entry already present for IP 10.0.9.23, mac 02:42:0a:00:09:17"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.309034094Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:17"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.309228587Z" level=warning msg="Neighbor entry already present for IP 10.0.9.65, mac 02:42:0a:00:09:41"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.309435926Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:41"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.309651841Z" level=warning msg="Neighbor entry already present for IP 10.0.9.11, mac 02:42:0a:00:09:0b"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.309833449Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:0b"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.310063137Z" level=warning msg="Neighbor entry already present for IP 10.0.9.13, mac 02:42:0a:00:09:0d"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.310237092Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:0d"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.310419966Z" level=warning msg="Neighbor entry already present for IP 10.0.9.54, mac 02:42:0a:00:09:36"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.310589988Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:36"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.310782984Z" level=warning msg="Neighbor entry already present for IP 10.0.9.21, mac 02:42:0a:00:09:15"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.310960449Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:15"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.311151926Z" level=warning msg="Neighbor entry already present for IP 10.0.9.3, mac 02:42:0a:00:09:03"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.311325245Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:03"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.311602214Z" level=warning msg="Neighbor entry already present for IP 10.0.9.61, mac 02:42:0a:00:09:3d"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.311776341Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:3d"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.312011848Z" level=warning msg="Neighbor entry already present for IP 10.0.9.45, mac 02:42:0a:00:09:2d"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.312185377Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:2d"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.312381514Z" level=warning msg="Neighbor entry already present for IP 10.0.9.41, mac 02:42:0a:00:09:29"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.312562036Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:29"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.312841897Z" level=warning msg="Neighbor entry already present for IP 10.0.9.31, mac 02:42:0a:00:09:1f"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.313014885Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:1f"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.313333831Z" level=warning msg="Neighbor entry already present for IP 10.0.9.5, mac 02:42:0a:00:09:05"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.313510892Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:05"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.313759078Z" level=warning msg="Neighbor entry already present for IP 10.0.9.25, mac 02:42:0a:00:09:19"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.313935450Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:19"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.314123488Z" level=warning msg="Neighbor entry already present for IP 10.0.9.27, mac 02:42:0a:00:09:1b"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.314323007Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:00:09:1b"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.331360816Z" level=warning msg="Neighbor entry already present for IP 10.255.0.4, mac 02:42:0a:ff:00:04"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.331569737Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:ff:00:04"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.331770960Z" level=warning msg="Neighbor entry already present for IP 10.255.0.9, mac 02:42:0a:ff:00:09"
Jun 21 15:23:45 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:45.331950018Z" level=warning msg="Neighbor entry already present for IP 10.0.46.77, mac 02:42:0a:ff:00:09"
Jun 21 15:23:59 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:59.887137588Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=maxillo message="The given sequenceToken is invalid. The next expected sequenceToken is: 49569207101139072783153539940379039257764689584091832642" origError=<nil>
Jun 21 15:24:01 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:24:01.955016608Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=qbio2455qa8aysx15p31aqdc8 service.id=rl90o2zc17b14r294p0a4dr7v task.id=nke7voplt59qio9lvsyx948tp
Jun 21 15:24:12 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:24:12.907482444Z" level=error msg="Bulk sync to node ip-10-0-46-77-d41057dd06df timed out"
Jun 21 15:24:31 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:24:31.682132980Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.46.77:40154"
Jun 21 15:24:33 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:24:33.760149109Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.46.77:40160"
Jun 21 15:24:35 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:24:35.122557817Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:55370->10.0.46.77:7946: i/o timeout"
Jun 21 15:24:47 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:24:47.468712185Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:43580->10.0.101.134:7946: i/o timeout"
Jun 21 15:24:48 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:24:48.068181260Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:55404->10.0.46.77:7946: i/o timeout"
Jun 21 15:24:52 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:24:52.012558940Z" level=info msg="memberlist: Suspect ip-10-0-46-77-d41057dd06df has failed, no acks received"
Jun 21 15:24:57 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:24:57.290037150Z" level=warning msg="memberlist: Was able to reach ip-10-0-46-77-d41057dd06df via TCP but not UDP, network may be misconfigured and not allowing bidirectional UDP"
Jun 21 15:25:08 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:25:08.536331175Z" level=warning msg="memberlist: Refuting a suspect message (from: ip-10-0-101-134-756771be03b7)"
Jun 21 15:25:33 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:25:33.672998327Z" level=info msg="Trying to get region from EC2 Metadata"
Jun 21 15:25:34 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:25:34.401032659Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName=dev-docker-swarm logStreamName=maxillo message="The specified log stream already exists" origError=<nil>
Jun 21 15:25:37 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:25:37.071571343Z" level=warning msg="memberlist: failed to receive: EOF from=10.0.46.77:40276"
Jun 21 15:25:50 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:25:50.924207577Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=maxillo message="The given sequenceToken is invalid. The next expected sequenceToken is: 49569207101139072783153540015060431764458414410708296002" origError=<nil>
Jun 21 15:25:53 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:25:53.441072387Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=qbio2455qa8aysx15p31aqdc8 service.id=rl90o2zc17b14r294p0a4dr7v task.id=r5gy21gonjkqcmhzuqxd8hj83
Jun 21 15:26:03 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:26:03.451888779Z" level=info msg="Trying to get region from EC2 Metadata"
Jun 21 15:26:04 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:26:04.278766560Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName=dev-docker-swarm logStreamName=maxillo message="The specified log stream already exists" origError=<nil>
Jun 21 15:26:16 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:26:16.683060139Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=maxillo message="The given sequenceToken is invalid. The next expected sequenceToken is: 49569207101139072783153540034872308096302959318657933634" origError=<nil>
Jun 21 15:26:22 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:26:22.617421905Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=qbio2455qa8aysx15p31aqdc8 service.id=rl90o2zc17b14r294p0a4dr7v task.id=cpkjgj4ggk3jpwl5540zyep6g
Jun 21 15:26:30 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:26:30.397323299Z" level=warning msg="memberlist: Failed TCP fallback ping: write tcp 10.0.109.151:55472->10.0.46.77:7946: i/o timeout"
Jun 21 15:27:30 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:27:30.259837313Z" level=info msg="Trying to get region from EC2 Metadata"
Jun 21 15:27:30 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:27:30.293792783Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName=dev-docker-swarm logStreamName=maxillo message="The specified log stream already exists" origError=<nil>
Jun 21 15:27:39 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:27:39.235800583Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=maxillo message="The given sequenceToken is invalid. The next expected sequenceToken is: 49569207101139072783153540093989989601277946706998075714" origError=<nil>
Jun 21 15:27:42 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:27:42.039514298Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=qbio2455qa8aysx15p31aqdc8 service.id=rl90o2zc17b14r294p0a4dr7v task.id=7wj0pq5um2rvkjg1jk58edq47
Jun 21 15:28:33 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:28:33.847638535Z" level=warning msg="memberlist: Failed TCP fallback ping: read tcp 10.0.109.151:55626->10.0.46.77:7946: i/o timeout"
Jun 21 15:28:42 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:28:42.544204934Z" level=info msg="Trying to get region from EC2 Metadata"
Jun 21 15:28:42 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:28:42.602892470Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName=dev-docker-swarm logStreamName=maxillo message="The specified log stream already exists" origError=<nil>
Jun 21 15:28:49 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:28:49.467118327Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=dev-docker-swarm logStreamName=maxillo message="The given sequenceToken is invalid. The next expected sequenceToken is: 49569207101139072783153540148764000636377570233668346178" origError=<nil>
Jun 21 15:28:50 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:28:50.635187693Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=qbio2455qa8aysx15p31aqdc8 service.id=rl90o2zc17b14r294p0a4dr7v task.id=5fixv9o7o9es78wotja765stm

EliRibble avatar Jun 21 '17 15:06 EliRibble

My guess at this point is that the issue has to do with the line:

Jun 21 15:23:42 ip-10-0-109-151 dockerd[18631]: time="2017-06-21T15:23:42.160320276Z" level=error msg="Bulk sync to node ip-10-0-101-134-756771be03b7 timed out"

as that seems to immediately precede the problem and isn't related to simple logging issues. I'm working on better understanding what that message means.

EliRibble avatar Jun 21 '17 15:06 EliRibble

Alright, I started looking at the logs on the other nodes to see if there was anything obvious in them. This seems suspicious. It is from the node that runs nginx, which needs the DNS entries for the other nodes to work so it can find the service it is reverse-proxying:

Jun 21 15:23:08 ip-10-0-46-77 dockerd[974]: time="2017-06-21T15:23:08.965737168Z" level=error msg="fatal task error" error="task: non-zero exit (1)" module="node/agent/taskmanager" node.id=6mugebxyus7dgoip9i165mj64 service.id=rl90o2zc17b14r294p0a4dr7v task.id=ph3392o6cmjkta7hjlvtanh27
Jun 21 15:23:09 ip-10-0-46-77 dockerd[974]: time="2017-06-21T15:23:09.487271084Z" level=warning msg="underweighting node 6mugebxyus7dgoip9i165mj64 for service rl90o2zc17b14r294p0a4dr7v because it experienced 5 failures or rejections within 5m0s" module=node node.id=6mugebxyus7dgoip9i165mj64
Jun 21 15:23:43 ip-10-0-46-77 dockerd[974]: time="2017-06-21T15:23:43.743729353Z" level=info msg="memberlist: Marking ip-10-0-109-151-4c957e3ec95e as failed, suspect timeout reached"
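
Comparing logs across nodes is easier when they are interleaved by the daemon's own timestamp. The following is only a sketch of one way to do that in Python, assuming each line carries a `time="...Z"` field in the format shown in the logs above; the names `merge_node_logs` and `sort_key` are made up for illustration and are not part of Docker.

```python
import re

# Matches the time="..." field dockerd writes, with or without fractional seconds.
TIME_RE = re.compile(
    r'time="(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?Z"'
)


def sort_key(line):
    """Return a sortable (seconds, nanoseconds) key, or None if no timestamp."""
    match = TIME_RE.search(line)
    if match is None:
        return None
    seconds = match.group(1)
    fraction = match.group(2) or ""
    # Pad the fraction so "10Z" sorts before "10.215973857Z".
    return (seconds, fraction.ljust(9, "0"))


def merge_node_logs(logs_by_node):
    """Interleave log lines from several nodes in timestamp order.

    logs_by_node maps a node label to an iterable of log lines.
    Lines without a recognizable timestamp are skipped.
    """
    merged = []
    for node, lines in logs_by_node.items():
        for line in lines:
            key = sort_key(line)
            if key is not None:
                merged.append((key, node, line.rstrip("\n")))
    merged.sort(key=lambda item: item[0])
    return [(node, line) for _, node, line in merged]
```

Fed the journal output of each of the three nodes, this would place a "Bulk sync ... timed out" line from one node next to the "Marking ... as failed" line from another, which makes the ordering of events easier to see.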

EliRibble avatar Jun 21 '17 16:06 EliRibble

Can you run docker service ps rl90o2zc17b14r294p0a4dr7v --no-trunc?

Or, if that service no longer exists, run it for whichever service is named the next time this message appears:

Jun 21 15:23:09 ip-10-0-46-77 dockerd[974]: time="2017-06-21T15:23:09.487271084Z" level=warning msg="underweighting node 6mugebxyus7dgoip9i165mj64 for service rl90o2zc17b14r294p0a4dr7v because it experienced 5 failures or rejections within 5m0s" module=node node.id=6mugebxyus7dgoip9i165mj64

dustin-decker avatar Jun 21 '17 16:06 dustin-decker

root@ip-10-0-46-77:/home/eliribble# docker service ps rl90o2zc17b14r294p0a4dr7v --no-trunc
ID                          NAME                IMAGE                                                                                               NODE                DESIRED STATE       CURRENT STATE               ERROR                       PORTS
1z516vsiax1j81gmr0vb3y6av   maxillo.1           authentise/maxillo:latest@sha256:a0b50d597dbc1a27b3c418b4066e9099ca4c22eb550c1985ad7999bed87db9ac   ip-10-0-101-134     Running             Starting 7 seconds ago
uvtomhf2kq43b8zwhn1j6p3jt    \_ maxillo.1       authentise/maxillo:latest@sha256:a0b50d597dbc1a27b3c418b4066e9099ca4c22eb550c1985ad7999bed87db9ac   ip-10-0-101-134     Shutdown            Failed 12 seconds ago       "task: non-zero exit (1)"
ucp0f67avjjgc5wnp0au4w959    \_ maxillo.1       authentise/maxillo:latest@sha256:a0b50d597dbc1a27b3c418b4066e9099ca4c22eb550c1985ad7999bed87db9ac   ip-10-0-109-151     Shutdown            Failed 32 seconds ago       "task: non-zero exit (1)"
d28gd05xnc3dhb6z8iqhmwkgz    \_ maxillo.1       authentise/maxillo:latest@sha256:a0b50d597dbc1a27b3c418b4066e9099ca4c22eb550c1985ad7999bed87db9ac   ip-10-0-101-134     Shutdown            Failed 42 seconds ago       "task: non-zero exit (1)"
0bbke9xkomhfwtoh2su8zy0n0    \_ maxillo.1       authentise/maxillo:latest@sha256:a0b50d597dbc1a27b3c418b4066e9099ca4c22eb550c1985ad7999bed87db9ac   ip-10-0-46-77       Shutdown            Failed about a minute ago   "task: non-zero exit (1)"

EliRibble avatar Jun 21 '17 16:06 EliRibble

I hadn't noticed before that maxillo was failing to start. That's been resolved now. I believe it was a red herring: we've been having the problem for days, and the maxillo failure should be only about 24 hours old.

EliRibble avatar Jun 21 '17 16:06 EliRibble

@EliRibble, glad that helped, but I agree that there is something more wrong. I'm seeing some similar messages in one of my 17.05 clusters and I'm stuck as well. I'll be following the thread and hope it gets ironed out soon. Not sure if posting my logs will help, but I'll offer any information I can if it helps get to the bottom of the issue.

dustin-decker avatar Jun 21 '17 16:06 dustin-decker

Based on this discussion thread

https://groups.google.com/forum/#!topic/consul-tool/dQSHf2R93lI

I'm starting to wonder if the issue has to do with periodic UDP packet loss, so that one node fails to signal that it is still alive, is presumed dead, and then comes back. I'm basing this on the log lines:

Suspect ip-10-0-109-151-2122b7c4e5ee has failed, no acks received

and

Bulk sync to node ip-10-0-46-77-d41057dd06df timed out
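
One rough way to test this hypothesis is to count how often each node shows up in these two messages and see whether the counts line up with the DNS outages. The sketch below assumes only the two message formats quoted above; the function name and the regular expressions are illustrative and are not part of Docker or memberlist.

```python
import re
from collections import Counter

# Patterns derived from the two log messages quoted above (assumed stable).
SUSPECT = re.compile(r"Suspect (\S+) has failed, no acks received")
BULK_SYNC = re.compile(r"Bulk sync to node (\S+) timed out")


def count_gossip_failures(lines):
    """Count suspect and bulk-sync-timeout events per node name.

    `lines` is any iterable of log lines, e.g. an open file or sys.stdin.
    Returns a tuple (suspects, timeouts) of Counter objects.
    """
    suspects, timeouts = Counter(), Counter()
    for line in lines:
        match = SUSPECT.search(line)
        if match:
            suspects[match.group(1)] += 1
        match = BULK_SYNC.search(line)
        if match:
            timeouts[match.group(1)] += 1
    return suspects, timeouts
```

It could be fed the daemon log of each node, for example by passing sys.stdin to count_gossip_failures and piping in the output of journalctl for the Docker unit (the unit name depends on the host setup). A node that is repeatedly suspected shortly before DNS stops resolving would support the packet-loss theory.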

EliRibble avatar Jun 21 '17 16:06 EliRibble

ping @sanimej

aaronlehmann avatar Jun 21 '17 16:06 aaronlehmann

I don't know if I'm having the same issue, but it is something very similar.

I went to deploy a second environment, exactly the same as my first, just in a different AWS region. (Everything is scripted, so it should be identical apart from the IDs of the newly created AWS resources.)

I also have an NGINX reverse proxy that was originally looking up the service name; however, it wasn't able to resolve that, whether via NGINX or just an nslookup. I was able to resolve it if I prefixed the service name with my stack name, such as stack_servicename, both when the bug presented itself and when it did not.

thomasbiddle avatar Jul 14 '17 15:07 thomasbiddle

I think I'm running into this issue as well, with Docker for AWS 17.06.0-ce. In my case, the service in question is Prometheus, and about 3:45am local time this morning I started receiving messages like:

{"level":"warning","msg":"DNS resolution failed.","name":"tasks.prometheus-local","reason":"dial udp 127.0.0.11:53: i/o timeout","server":"127.0.0.11","source":"dns.go:190","time":"2017-07-17T07:43:47Z"}

(note: the service emitting these logs is named prometheus-local, so it's failing to look up its own DNS record)

It was happening sporadically, and would recover within 2-3 minutes. Then immediately after the last outage the task died (with error non-zero exit (137)) and I haven't seen any DNS resolution issues with the task started up to replace it.

Both of these tasks were scheduled on the same host, and other tasks on that host seem not to be having DNS issues.
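
To narrow down whether a failure like this is limited to one task, a small probe can be run inside each container on the host to record when lookups fail. This is a minimal sketch using only the Python standard library; the probe function and its parameters are hypothetical helpers, and it goes through the container's normal resolver (which in a swarm task is the embedded DNS server at 127.0.0.11 seen in the log above).

```python
import socket
import time


def probe(name, attempts=3, interval=1.0,
          resolve=socket.gethostbyname_ex, sleep=time.sleep):
    """Resolve `name` repeatedly and record the outcome of each attempt.

    Returns a list of (ok, detail) tuples: detail is the sorted list of
    addresses on success, or the error message on failure. `resolve` and
    `sleep` are injectable so the probe can be exercised without a network.
    """
    results = []
    for i in range(attempts):
        try:
            _, _, addresses = resolve(name)
            results.append((True, sorted(addresses)))
        except OSError as exc:  # socket.gaierror and socket.herror subclass OSError
            results.append((False, str(exc)))
        if i < attempts - 1:
            sleep(interval)
    return results
```

Calling probe("tasks.prometheus-local", attempts=60, interval=5.0) from the affected task and from a neighbouring task on the same host would show whether the timeouts are specific to one container's network namespace.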

hairyhenderson avatar Jul 17 '17 14:07 hairyhenderson

We have the exact same problem, except it happens every 24-48 hours.

This only started happening when we migrated from AWS to Azure.

docker version

<pre>Client: Docker Engine - Community
 Version:           20.10.6
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        370c289
 Built:             Fri Apr  9 22:46:01 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.6
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       8728dd2
  Built:            Fri Apr  9 22:44:13 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.4
  GitCommit:        05f951a3781f4f2c1911b05e61c160e9c30eaa8e
 runc:
  Version:          1.0.0-rc93
  GitCommit:        12644e614e25b05da6fd08a38ffa0cfe1903fdec
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
</pre>

docker info

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)
  scan: Docker Scan (Docker Inc., v0.7.0)

Server:
 Containers: 39
  Running: 26
  Paused: 0
  Stopped: 13
 Images: 157
 Server Version: 20.10.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: false
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: zcp2sjprerde3f71bwyrjxcmf
  Is Manager: true
  ClusterID: tw60ziploqgfjqr8dm7vzwizx
  Managers: 1
  Nodes: 3
  Default Address Pool: 10.0.0.0/8  
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 20.188.56.247
  Manager Addresses:
   20.188.56.247:2377
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
 runc version: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-1046-azure
 Operating System: Ubuntu 18.04.5 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 62.82GiB
 Name: swarm-master
 ID: HPSR:MEJD:CYQT:FFHZ:RMKF:NPLP:PL77:PZUY:OV6M:V3HB:PH6D:VSQG
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Is there a way to debug this behavior?

I thought it had something to do with Azure VNets, so I configured Docker to use the public IP address, but the problem persists.

abriosi avatar Apr 29 '21 20:04 abriosi