
weave is not initialized correctly while ec2 autoscaling is happening

Open mmoharam opened this issue 8 years ago • 1 comments

Hi,

I'm new to Weave Net and I'm trying to run it in one of our ECS systems. I'm using the latest Weave ECS AMI (ECS-AMI-2017-09-07).

I noticed that sometimes, while EC2 autoscaling is taking place, the newly started Docker containers stop reporting their logs to the associated log groups and stop responding to the ALB health checks.

I SSH'd into one of the EC2 instances and noticed that ifconfig shows the weave interface without the inet address it normally has:


> weave     Link encap:Ethernet  HWaddr 0E:A3:BF:82:89:28
>           inet6 addr: fe80::ca3:bfff:fe82:8928/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1376  Metric:1
>           RX packets:14 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:10 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:920 (920.0 b)  TX bytes:828 (828.0 b)

I logged into one of the Docker containers and ran ifconfig; the weave network interface (ethwe) was not there.

The logs of the weave container show the following:


> [ec2-user@ip-10-56-12-221 ~]$ docker logs weave
> INFO: 2017/10/17 17:28:26.616607 Command line options: map[H:[unix:///var/run/weave/weave.sock] host
> v.conf status-addr:127.0.0.1:6782 dns-effective-listen-address:172.17.0.1 ipalloc-range:10.32.0.0/12
> -10-56-12-221 plugin:false proxy:true datapath:datapath]
> INFO: 2017/10/17 17:28:26.617172 weave  2.0.4
> INFO: 2017/10/17 17:28:26.621381 Docker API on unix:///var/run/docker.sock: &[Version=17.03.2-ce Git
> on=1.12 Os=linux Arch=amd64 BuildTime=2017-08-09T22:45:09.101301574+00:00]
> INFO: 2017/10/17 17:28:26.622869 Using docker bridge IP for DNS: 172.17.0.1
> INFO: 2017/10/17 17:28:26.629054 proxy listening on unix:///var/run/weave/weave.sock
> INFO: 2017/10/17 17:28:26.828936 Bridge type is bridged_fastdp
> INFO: 2017/10/17 17:28:26.831397 Communication between peers is unencrypted.
> INFO: 2017/10/17 17:28:26.843710 Our name is 0e:a3:bf:82:89:28(ip-10-56-12-221)
> INFO: 2017/10/17 17:28:26.843729 Launch detected - using supplied peer list: []
> INFO: 2017/10/17 17:28:26.844784 Docker API on unix:///var/run/docker.sock: &[Version=17.03.2-ce GoV
> piVersion=1.27 MinAPIVersion=1.12 GitCommit=7392c3b/17.03.2-ce Arch=amd64]
> INFO: 2017/10/17 17:28:26.845328 Checking for pre-existing addresses on weave bridge
> INFO: 2017/10/17 17:28:26.890640 [allocator 0e:a3:bf:82:89:28] No valid persisted data
> INFO: 2017/10/17 17:28:26.896593 [allocator 0e:a3:bf:82:89:28] Initialising via deferred consensus
> INFO: 2017/10/17 17:28:26.900710 Listening for DNS queries on 172.17.0.1
> INFO: 2017/10/17 17:28:26.919414 Sniffing traffic on datapath (via ODP)
> INFO: 2017/10/17 17:28:26.921751 Listening for HTTP control messages on 127.0.0.1:6784
> INFO: 2017/10/17 17:28:26.922089 Listening for metrics requests on 127.0.0.1:6782
> INFO: 2017/10/17 17:28:27.406478 Discovered local MAC 1a:61:76:80:89:fd
> INFO: 2017/10/17 17:28:27.406601 Discovered local MAC e6:4c:f6:98:0f:26
> INFO: 2017/10/17 17:28:27.758436 Discovered local MAC 0e:a3:bf:82:89:28
> INFO: 2017/10/17 17:29:00.989089 Assuming quorum size of 1
> INFO: 2017/10/17 17:29:01.023697 [nameserver 0e:a3:bf:82:89:28] adding entry for e55a80d3da381218ca3
> INFO: 2017/10/17 17:29:01.024015 [nameserver 0e:a3:bf:82:89:28] adding entry for e55a80d3da381218ca3
> INFO: 2017/10/17 17:29:11.048722 [nameserver 0e:a3:bf:82:89:28] adding entry for e55a80d3da381218ca3
> INFO: 2017/10/17 17:29:11.049071 [nameserver 0e:a3:bf:82:89:28] adding entry for e55a80d3da381218ca3
> INFO: 2017/10/17 17:29:21.069819 [nameserver 0e:a3:bf:82:89:28] adding entry for e55a80d3da381218ca3
> INFO: 2017/10/17 17:29:21.070157 [nameserver 0e:a3:bf:82:89:28] adding entry for e55a80d3da381218ca3
> INFO: 2017/10/17 17:29:31.091208 [nameserver 0e:a3:bf:82:89:28] adding entry for e55a80d3da381218ca3
> INFO: 2017/10/17 17:29:31.091541 [nameserver 0e:a3:bf:82:89:28] adding entry for e55a80d3da381218ca3

I didn't understand what was going on, so I ran "weave reset" and "docker restart ecs-agent". After that, everything started to work normally.

ifconfig now shows the associated inet addr as usual:

> weave     Link encap:Ethernet  HWaddr 0E:A3:BF:82:89:28
>           inet addr:10.32.0.1  Bcast:0.0.0.0  Mask:255.240.0.0
>           inet6 addr: fe80::ca3:bfff:fe82:8928/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1376  Metric:1
>           RX packets:154 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:17 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:30840 (30.1 KiB)  TX bytes:1290 (1.2 KiB)

and the Docker containers have the ethwe interface.

Can you please help me understand why this is happening and how to avoid it?

Thanks.
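
For anyone who wants to detect this state automatically, here is a minimal Python sketch of the check described above. It assumes the older net-tools ifconfig layout shown in this report (an unindented header line per interface, and an `inet addr:` line when an IPv4 address is assigned). The function name `interface_has_ipv4` is made up for illustration and is not part of Weave Net.

```python
import re


def interface_has_ipv4(ifconfig_output: str, name: str = "weave") -> bool:
    """Return True if the named interface block has an 'inet addr:' line.

    Assumes the net-tools style output quoted in this issue; newer
    ifconfig versions print 'inet 10.32.0.1' and would need another pattern.
    """
    in_block = False
    for line in ifconfig_output.splitlines():
        if line and not line[0].isspace():
            # An unindented line starts a new interface block.
            in_block = line.split()[0] == name
        elif in_block and re.search(r"\binet addr:\d+\.\d+\.\d+\.\d+", line):
            # 'inet6 addr:' lines do not match because of the literal 'inet '.
            return True
    return False
```

Fed the first ifconfig output in this report it returns False, and fed the output after the reset it returns True, so it could serve as a simple health probe on the instance.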

mmoharam, Oct 17 '17 18:10

Hi @mmoharam and sorry for the lengthy delay in response.

This line shows a problem:

> INFO: 2017/10/17 17:28:26.843729 Launch detected - using supplied peer list: []

The list (inside the square brackets) should contain all the other hosts in the group, so that Weave Net can connect to them and form a cluster. Since the list is empty, this host forms a cluster of size 1, which will never connect to the other peers.

The code that creates the list is peers.sh; I do not know why it would return an empty list.
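
To spot this condition without reading the log by eye, the check above can be sketched in Python. This is only an illustration: the helper name `supplied_peers` is hypothetical, and it assumes that a non-empty peer list is printed as space-separated entries inside the brackets, which this log does not actually show.

```python
import re
from typing import List, Optional

# Matches the log line quoted above and captures what is inside the brackets.
PEER_LIST_RE = re.compile(
    r"Launch detected - using supplied peer list: \[(.*?)\]"
)


def supplied_peers(log_text: str) -> Optional[List[str]]:
    """Return the peer list from the first matching log line.

    Returns None if no such line is present, and an empty list when the
    brackets are empty (the failure case described in this thread).
    Assumption: entries are separated by whitespace.
    """
    for line in log_text.splitlines():
        match = PEER_LIST_RE.search(line)
        if match:
            return match.group(1).split()
    return None
```

The log text could come from `docker logs weave`, as in the report. An empty result means the router was launched with no peers and will stay a cluster of one until it is relaunched with a populated list.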

bboreham, Nov 01 '17 11:11