kit
kit copied to clipboard
SD: EC2 service discovery
We could add an EC2 service discovery package. EC2 allows for searching instances via tags. We could again use aws-sdk-go and manage fetching instances via ec2.DescribeInstances.
Hey @cam-stitt and @peterbourgon! I'd like to get involved. Would you guys mind if I take a crack at this? 🙏
Please do! Before starting in with the code properly, I'd suggest doing a design phase, either just as a narrative in this issue or with a bit of pseudocode.
Guys apologies this took a while. Actually thinking about it a bit more, in this specific use-case go-kit is performing the role of an ELB (or an ALB). So to get the ball rolling...
The EC2 implementation would generally follow the style of the implementations in package sd.
Implementation Details (assume the new package is called ec2)
-
There would be an internal method in
ec2for fetching instancehost:port(EC2 instance IPs in this case, usingaws-sdk-go) which updatessd.Cache. Like what @cam-stitt mentioned, that would mean AWS API calls. -
There will be no concrete implementation of the
sd.Registrarinterface (similar todnssrv). It is assumed that the EC2 instance tagging (for newly provisioned instances) will be handled by a mechanism such as CloudFormation (which auto-tags instances if they're in an autoscaling group) or by some other means. We merely query the set of instances using a tag. -
EC2 instance tag
name:valuepair would be a constructor parameter.
Questions
-
Getting
portwill be a bit tricky because there's no way to get which service is mapped to a given port (in the context of AWS that is). It can be inferred from the list of allowed ingress ports (via the security groups) but that could be error-prone. We could just pass that in as constructor parameter to this new SD. 🤔 WDYT? -
Do we need to accomodate for the case where different EC2 instances are tagged the same
name:valuepair are part of different CloudFormation stacks? -
(a follow up from question
#1) ifportwas passed as a constructor parameter, would we need to check if that port is reachable for all instances (i.e. the instance is healthy and there's no security group rule that's blocking access) before adding it to the list of instances. -
💭 In case we get rate limited by the AWS API endpoint, would the best thing to do in this case is return stale data from
sd.Cache?
Your thoughts @peterbourgon, @cam-stitt?
- I'd expect that users would know the port a priori when specifying a set of EC2 instance tags. Is that reasonable? If you deploy an EC2 AMI, your service is probably listening on a fixed port, right?
- I don't know the details here. Can you say more? I would expect users to specify instance tags that would authoritatively map to instances running specific services.
- Your call at the implementation level. Most other SD packages don't bother doing health checking, so by default I wouldn't expect it here, either.
- Yes, but we should do whatever possible to respect rate limits.
My two cents:
- I'm with @peterbourgon, the port will be known by the user and does not have to be part of this. Putting the port on the constructor of the object may make the most sense.
- Yup, users should be placing tags (like
service: addsvc) on their instances. The CFN stacks should not concern us at all. - We could limit the EC2 discovery to utilise ELB's, or provide that as another style of SD. That way you could just search all instances within an ELB that are healthy.
- Yes. There would have to be a polling process for this as AWS api does not push updates.
Great feedback @peterbourgon, @cam-stitt!
OK, so taking into account your suggestions, here's a rough plan:
portwould be a constructor argument.- Users would be expected to tag their instances with
service: addsvc(customizable; could be a constructor argument as well i.e.tagName). - I like the idea of using a common ELB and determining instance health from that. That makes health checking the stack user/maintainer's responsibility. The discovery mechanism would not affected by heath check logic changes. I'll take this route then.
- So we'll poll AWS to get the latest instance information and apply the usual retry/backoff/jitter in case we get rate limited.
With that I'll get cracking with the implementation. PR coming 🔜 !
Just playing devils-advocate during your design phrase, given the functionality AWS has been delivering at the ALB layer (ssl certs, waf, dynamic port mapping of ecs services etc), under what sorts of scenarios would you expect to want to do service discovery at the ec2-instance level, over the alb-fronting-a-service level?
Hey thanks for the feedback @wyaeld. I've also mentioned this before:
Actually thinking about it a bit more, in this specific use-case go-kit is performing the role of an ELB (or an ALB).
I do agree that the current feature set of an ALB is enough to cover the majority of use cases. I'm happy to drop this task unless @peterbourgon and/or @cam-stitt can think of other scenarios where such a discovery functionality would be useful.
One issue with ALB is that it cannot be used in conjunction with GRPC (from all my tests and research). If you want to use GRPC, you have to still use ELB. Due to the popularity of GRPC in microservices, it seems like this still has a very valid use case.