[ECS]: Support "A" Records with ECS Service Discovery when `host` networking is used

Open talawahtech opened this issue 5 years ago • 15 comments

Tell us about your request Support "A" Records with ECS Service Discovery when host networking is used.

Which service(s) is this request for? ECS/Cloud Map

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard? I would like to be able to easily connect to my services from a CLI HTTP client or a browser using the domain name provided by service discovery. SRV records are not supported by most off-the-shelf tools, so I am requesting support for A records.

Are you currently working around this issue? I am registering the services at the host level using user data scripts and de-registering them using instance termination hooks

Additional context I understand that A records at the host level limit the usefulness of service discovery to a single service per instance, but I am fine with that.

talawahtech avatar May 07 '19 20:05 talawahtech

Very much agreed here. Conceptually service discovery and how it integrates with ECS services is great. The fact that SRV records often need to be handled using custom tooling is not great.
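To illustrate the kind of custom tooling this refers to, here is a minimal sketch of turning SRV answers into host:port pairs that ordinary tools can use. It assumes the answers arrive in the `dig +short` layout (priority, weight, port, target); the function name `srv_to_hostport` and the record name in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Convert SRV answers in `dig +short` format ("priority weight port target.")
# into "host:port" lines. Reads answers on stdin, one per line.
srv_to_hostport() {
  # Fields: $1=priority $2=weight $3=port $4=target (with a trailing dot).
  # Lines that do not have exactly four fields are skipped.
  awk 'NF == 4 { sub(/\.$/, "", $4); print $4 ":" $3 }'
}

# Usage (hypothetical record name; needs dig and a resolvable SRV record):
#   dig +short SRV myservice.me.local | srv_to_hostport
```

With an A record none of this is needed, since any client can resolve the name directly and connect on a known port.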

A records should be supported for both host and bridge modes. There really should be no reason why this cannot be done. In either of those modes, the private IPv4 address of the EC2 instance can be used. There are lots of use cases where we run a containerized service for internal use, and one instance is suitable. Or I have placement policies on services specifying no more than one task instance per host.

mrapczynski avatar May 29 '19 19:05 mrapczynski

My use case is running a few instances of a proxy server on a fixed port. I do not need a large number of instances of this proxy, so I am not concerned about being able to have multiple per instance. Having A records would simplify things greatly.

shidel-dev avatar Jun 20 '19 16:06 shidel-dev

Question: now that ENI trunking has launched, you can have many more tasks per instance when using awsvpc mode for many instance types. If you use awsvpc mode for your service, then you can use Cloud Map service discovery. What is the blocker to using awsvpc mode? We are evaluating supporting A records with 'host' mode, but we also want to know what is stopping customers from using awsvpc networking with ECS. Feedback appreciated on both topics! Thanks.

coultn avatar Jun 20 '19 22:06 coultn

@coultn One example would be if I wanted to re-use the same Service Discovery DNS name to connect to other daemons running on the instance(s), e.g. SSH. With awsvpc it would be a different IP address. Also, I remember running into an issue when using a Network Load Balancer with awsvpc networking: the client IP was that of the load balancer rather than the actual client.

talawahtech avatar Jun 21 '19 01:06 talawahtech

@coultn Totally valid point. Previously using awsvpc mode made no sense for us because of the ENI limits. We run m5.2xlarge in production, and with trunking, now using ENIs is viable. We have some internal work to do since we build a custom AMI, and would also need to re-configure our test environment which currently uses only T3 spot instances. I think in the next month or two, we would be willing to start using awsvpc mode with sufficient testing.

That all said, and please excuse the snark, but A records should be supported with host or bridge modes because it's easy. In a semi-related experience, I have a Lambda that reacts to CloudWatch events from autoscaling groups to make Route53 updates on a private zone. Only took a couple hours in total. For ECS, similar functionality in connection with service discovery would be so useful to everyone that it should be a standard feature, and not be a bespoke add-on which an organization needs to write themselves with Lambda.

mrapczynski avatar Jun 25 '19 14:06 mrapczynski

@mrapczynski True.

I don't understand why I can't use A records when dealing with bridge mode...

julienMichaud avatar Jul 22 '19 08:07 julienMichaud

ENI trunking is not supported on the t3 instances that we use a lot. I can only have 1 awsvpc task per t3.small instance. We would be happy to use awsvpc if the limit were at least 20 tasks per t3 instance.

Also, ENI trunking is not supported on non-ECS-optimized images.

sandangel avatar Aug 28 '19 23:08 sandangel

@coultn

what is stopping customers from using awsvpc networking with ECS

Some of my containers need access to the internet. awsvpc requires a NAT gateway to access the internet. Using 'host' mode avoids the additional cost of a NAT gateway. The services (containers) that access the internet are single instances, so there would be no conflicting service name and IP address. The service's DNS name would only map to a single EC2 instance. I hope you support the A record type for 'host' mode soon.

chapinmark avatar Dec 09 '19 05:12 chapinmark

@coultn

Question: what is stopping customers from using awsvpc networking with ECS

In my case, I am using an internal NLB to work around the issue, but do not have enough load to justify the use of any instances that support ENI trunking. I would like to use M5.small (or even nano), but this doesn't exist, so it is cheaper to use T3.small with an extra NLB, but this is also wasteful as DNS service discovery would meet my needs.

Is it likely that A records will be supported? Or that smaller instances will support ENI trunking? Thanks

day1118 avatar Jan 31 '20 06:01 day1118

I would also like this feature. We reverted from using awsvpc networking mode back to host mode after production outages caused by ENI trunking issues.

Now we cannot use service discovery registration from ECS as only SRV records may be created by containers using host networking. The application I am working with uses DNS for peer discovery and cannot resolve SRV records.

It seems strange that the developers of Service Discovery would put such an arbitrary restriction on their service, especially considering that most applications do not support SRV records. This means I will need to create a service lambda or something to do the job of Service Discovery without the arbitrary restriction.

lukeplausin avatar May 10 '20 23:05 lukeplausin

It would be good to know whether support for A records in host and bridge mode is even considered. My company is currently looking for a way to get off weave DNS and this seems like the simplest solution.

kirotnes avatar Jun 07 '20 18:06 kirotnes

I'm also interested in this. I'm deploying a service with a huge port range on ECS, with "host" networking.

Since A records are not supported, I have to roll my own service discovery...

Antonito avatar Sep 07 '20 12:09 Antonito

Any updates on this? Even with App Mesh, you still need to use awsvpc networking mode; and as mentioned above, it carries the cost of having to use a NAT gateway, which in some cases for HA purposes means one per public AZ subnet.

@Antonito - What did you end up using for service discovery? I'm torn between 2 options. A lambda triggered on ECS task state changes; or, some sort of sidecar with haproxy, which supports SRV record resolution.
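For the first option, a rough sketch of the trigger side, assuming the Lambda is driven by an EventBridge rule on ECS task state changes. The helper name `ecs_task_event_pattern`, the rule name, and the cluster ARN are all placeholders, and the Lambda itself (which would update DNS) is not shown.

```shell
#!/usr/bin/env bash
# Print an EventBridge event pattern that matches ECS task state changes
# for one cluster, limited to tasks that reached RUNNING or STOPPED.
# Argument: the cluster ARN to filter on.
ecs_task_event_pattern() {
  local cluster_arn="$1"
  printf '{"source":["aws.ecs"],"detail-type":["ECS Task State Change"],"detail":{"clusterArn":["%s"],"lastStatus":["RUNNING","STOPPED"]}}\n' \
    "$cluster_arn"
}

# Usage (placeholder rule name and cluster ARN):
#   aws events put-rule --name ecs-task-dns-sync \
#     --event-pattern "$(ecs_task_event_pattern arn:aws:ecs:us-east-1:123456789012:cluster/my-cluster)"
```

The rule would still need the Lambda attached as a target and permission to invoke it, so this only covers the event filtering.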

leonfs avatar Mar 23 '21 23:03 leonfs

Bump?

Our application needs to listen on a large port range which requires host mode networking (awsvpc port mapping is too limited). This also precludes using an elastic load balancer in front of the ECS/EC2 service, which again supports only a limited number of listeners.

Cloud Map seems like the obvious tool to resolve the hosts directly, but ECS/EC2 only provides support for SRV records :exploding_head:

Any other way around this?

Seems kinda nuts that ECS/EC2 specifically excludes A records given that the vast majority of client software does not support SRV records.

abonstu avatar Mar 23 '22 03:03 abonstu

I tried using service discovery tonight with an ECS service having networkmode=bridge and had the troubles described by others in this issue. I had to fall back on what I've used in the past. The suggestion below only works with single-container services, because there is only one Route53 A record which updates when the task starts.

Given: 2+ EC2 instances in an auto-scaling group, an ECS service that could have a task on any instance, a requirement to not use networkmode=awsvpc, and a desire to know via DNS where the service is (or was most recently) running.

  1. create a private Route53 hosted zone (e.g. me.local)
  2. upsert a record (e.g. myservice.me.local) in that hosted zone when a task starts, pointing the record to the host's private IP address, using the script below, either in the Dockerfile or elsewhere at task startup
# The single-quoted JSON is closed around "$PRIVATE_IP" so that the shell expands the variable.
export PRIVATE_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
aws route53 change-resource-record-sets --hosted-zone-id ABCDEFG --change-batch '{"Changes": [{"Action": "UPSERT","ResourceRecordSet": {"Name": "myservice.me.local","Type": "A","TTL": 30,"ResourceRecords": [{"Value": "'"$PRIVATE_IP"'"}]}}]}'

I use an EC2 Instance Profile with the policy below, so that the task can update the Route53 record.

{
  "Version": "2012-10-17",
  "Statement": [ 
    {
      "Action": [
        "route53:ChangeResourceRecordSets"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:route53:::hostedzone/ABCDEFG"
      ]
    }  
  ]
}
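A possible variation on the script above, sketched under the same assumptions (placeholder zone ID `ABCDEFG` and record name `myservice.me.local`): building the change batch in a small helper makes it easy to also remove the record when the task stops. The helper name `build_change_batch` is hypothetical. Note that a Route 53 DELETE has to match the existing record's name, type, TTL, and value.

```shell
#!/usr/bin/env bash
# Print a Route 53 change batch for a single A record.
# Arguments: ACTION (UPSERT or DELETE), record NAME, IPv4 address, TTL (default 30).
build_change_batch() {
  local action="$1" name="$2" ip="$3" ttl="${4:-30}"
  printf '{"Changes":[{"Action":"%s","ResourceRecordSet":{"Name":"%s","Type":"A","TTL":%s,"ResourceRecords":[{"Value":"%s"}]}}]}\n' \
    "$action" "$name" "$ttl" "$ip"
}

# Usage at task start and task stop (zone ID and record name are placeholders):
#   ip=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
#   aws route53 change-resource-record-sets --hosted-zone-id ABCDEFG \
#     --change-batch "$(build_change_batch UPSERT myservice.me.local "$ip")"
#   aws route53 change-resource-record-sets --hosted-zone-id ABCDEFG \
#     --change-batch "$(build_change_batch DELETE myservice.me.local "$ip")"
```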

johnstanfield avatar Aug 24 '22 05:08 johnstanfield

We released ECS Service Connect which specifically addresses this problem. In Service Connect you provide a discoverable name for each of the containers / ports that you want to address.

herrhound avatar Mar 09 '23 21:03 herrhound

Service Connect requires an additional "256 CPU units and at least 64 MiB of memory" per proxy sidecar container, which is a ridiculous amount of overhead to add just to get A records provisioned for a container.

For my use-case (many small containers) that overhead is terminal and means that Service Connect is not a solution to this problem at all.

Allowing configuration of an A record in addition to an SRV record in host/bridge modes seems like a trivial option to enable. What is the resistance to, or rationale for, not addressing this?

mattbnz avatar Nov 22 '23 23:11 mattbnz