Obtain Weave peers from members of the same ECS cluster
The new architecture we are designing uses ECS to hold a mix of EC2 instances (Spot and Reserved Instances, to optimize costs), which requires two Auto Scaling Groups (ASGs) to work. However, because Weave's peer discovery is based on the ASG, the Spot fleet does not join the Reserved fleet and the Weave network breaks down.
Is it possible to specify a parameter in a config file on boot to allow Weave to use the ECS Cluster instead of the ASG to group instances into a Weave network?
This one's tricky. In fact, we initially tried to infer the network peers from the ECS cluster members instead of the Auto Scaling Group, but there's a chicken-and-egg problem:
1. A machine effectively joins a cluster only after its ecs-agent starts and registers with the ECS infrastructure.
2. Weave needs to start before the ecs-agent in order to ensure that the ecs-agent puts all the launched containers on the Weave network.
3. Weave needs to know at least one peer at launch time.
Unfortunately, (1) and (2) cause the cluster to be empty when Weave is launched, making it impossible to satisfy (3).
I looked into registering peers dynamically (i.e. after Weave is launched), but at the time this was either not possible or it made it impossible to reach initial IPAM consensus (I don't remember exactly which).
@awh @bboreham Any ideas? @awh Maybe your current IPAM pre-consensus work could help with this?
@bryanvaz As an alternative, would it be good enough to allow peer identification through tags? (see https://github.com/weaveworks/integrations/issues/1 ). You could tag your spot fleet with a specific tag. Plus, it gives you finer control over your peers (you could join multiple Auto Scaling Groups, or only certain instances).
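To make the tag-based idea concrete, a peer lookup could be sketched roughly as below. This is only an illustration, not the integration's actual code: the tag key `weave:peer` and the cluster value are hypothetical, and the boto3 call is shown in a comment; the point is the filtering of EC2 instance descriptions down to peer IPs for `weave launch`.

```python
# Sketch: discover Weave peers by EC2 tag instead of by ASG membership.
# The tag key/value ("weave:peer" / "my-cluster") are hypothetical examples.

def peers_from_reservations(reservations, tag_key, tag_value):
    """Extract private IPs of instances carrying the given tag, from a
    describe_instances-style response ("Reservations" list)."""
    peers = []
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            if tags.get(tag_key) == tag_value and instance.get("PrivateIpAddress"):
                peers.append(instance["PrivateIpAddress"])
    return peers

# With boto3 this would be fed from something like (untested sketch):
#   ec2 = boto3.client("ec2")
#   resp = ec2.describe_instances(
#       Filters=[{"Name": "tag:weave:peer", "Values": ["my-cluster"]},
#                {"Name": "instance-state-name", "Values": ["running"]}])
#   peers = peers_from_reservations(resp["Reservations"], "weave:peer", "my-cluster")
# and the result passed to: weave launch <peer> <peer> ...
```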
@2opremio Peer identification through tags would work. The main problem right now is actually ECS.
ECS has no way to bind the tasks of a service group to a particular set of instances (conditioned by instance type, ASG, launch configuration, or spot/reserved). For example, we were experimenting with a memcached cluster in ECS backing a group of web servers. However, in our hybrid spot/reserved model, some of the web servers may be placed in the Spot fleet (to take advantage of spot savings). Unfortunately, the spot servers can't use Weave to talk to the memcached cluster sitting in the Reserved fleet.
The obvious solution would be to register the memcached cluster with an ELB and point the web servers at the ELB instead, but part of the appeal of Weave is that it eliminates the need for each microservice to have its own ELB, so that's probably not the best option.
The two other options we were considering were:
- Use the ECS cluster name defined in the ecs.config file (mandatory for ECS to work anyway) in conjunction with the AWS API to check whether there are pre-existing instances in the ECS cluster. If there are, restart Weave, attaching it to one of the existing instances; if not, just run as normal. (This of course causes a race condition when two or more instances are launched in the initial spin-up. The race condition is eliminated if the Reserved fleet is brought up first and/or only the Spot fleet checks the cluster.)
- Use an AWS Lambda function to add orphaned spot instances to the Reserved fleet's Weave network (either as periodic maintenance or triggered by a post-deploy event).
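The first option above could start from the cluster name that is already present on every instance. A minimal sketch, assuming the standard key=value format of `/etc/ecs/ecs.config`; the boto3 membership check is only outlined in a comment, since the exact restart/attach logic depends on how Weave is launched:

```python
# Sketch of option 1: read the ECS cluster name from ecs.config, then ask the
# AWS API whether the cluster already has registered container instances.

def cluster_name_from_ecs_config(config_text):
    """Parse the ECS_CLUSTER value out of an ecs.config file body."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("ECS_CLUSTER="):
            return line.split("=", 1)[1]
    return None  # no cluster configured (agent would default to "default")

# With boto3, checking for pre-existing members would look roughly like this
# (untested sketch -- only needed to decide whether to restart Weave attached
# to an existing instance):
#   ecs = boto3.client("ecs")
#   arns = ecs.list_container_instances(cluster=cluster)["containerInstanceArns"]
#   if arns:
#       ...resolve one ARN to its EC2 instance IP and attach Weave to it
```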
What is the ordering? Can we get this:

`weave launch` ... container attach to weave ... ecs-agent start ... weave discovers its peers?
> ECS has no way to bind the tasks of a service group to a particular set of instances...
@bryanvaz Yes, that is indeed the case, and we have been discussing it with the folks at Amazon. From what I understand, it shouldn't be too hard to implement a custom scheduler to achieve something like this. The custom scheduler abstraction is a rather basic one: on one hand it's easy to use, on the other it requires you to do all the work. In a nutshell, you simply need to make the placement decision and call StartTask, then ensure the task keeps running, restarting it if it crashes.
> 3. Weave needs to know at least one peer at launch time.
would be addressed by https://github.com/weaveworks/weave/issues/1721, so long as you can `weave connect` the peers before trying to allocate an IP address.
> ECS has no way to bind the tasks of a service group to a particular set of instances (conditioned by instance type, ASG, launch configuration, or spot/reserved).
Yep, like @errordeveloper already mentioned, we are also missing some sort of scheduling rules/affinity. In particular, we could really use something like k8s' DaemonSet.
> Unfortunately, the spot servers can't use Weave to talk to the memcached cluster sitting in the Reserved fleet.
@bryanvaz Are there any other causes for this apart from the ASG-based peer detection? In other words, would a tag-based solution (or ideally an ECS-cluster-based solution, if we solve the chicken-and-egg problem) be enough to get you moving forward?
> What is the ordering? Can we get this:
>
> `weave launch` ... container attach to weave ... ecs-agent start ... weave discovers its peers?
@bboreham That is the current ordering, except for "weave discovers its peers", which we do before `weave launch`. I gave up on discovering the peers after launch for the reasons above.
> 3. Weave needs to know at least one peer at launch time.
>
> would be addressed by weaveworks/weave#1721, so long as you can `weave connect` the peers before trying to allocate an IP address.
I don't think we can control that. What would happen if an IP is allocated, for instance, by running a container before the `weave connect`s happen? In fact, the `weave connect`s may not happen at all (e.g. in a 1-node cluster).
If you have a 1-node cluster then, effectively, the correct number of `weave connect`s have happened.
But you might want a way to have IPAM defer any allocation until you know the number of peers.
Do note I'm trying to address the OP's point in a possible future, not trying to discuss the current implementation.
> But you might want a way to have IPAM defer any allocation until you know the number of peers.
I guess that's needed for running a container before the `weave connect`s happen, right? Otherwise it can result in IPAM conflicts.
Yes. To give an example, suppose the order is (on host1):
1. `weave launch`
2. <some container starts and wants to attach to the weave network>
3. `weave connect host2`
4. `weave connect host3`
then we want to be able to say, at the end of that sequence, `weave done-with-connecting`, which will let it go ahead and seek consensus.
The current implementation will force consensus at step 2, resulting in a clique.