
Introduce support for sidecars in ECS

Open CarmenAPuccio opened this issue 3 years ago • 31 comments

Description

ECS and Fargate support the concept of a sidecar for things like reverse proxies, logging, telemetry, etc. ECS also has a concept of container dependency in the task definition to control startup order, for example as outlined here. While I realize this support does not exist today in Compose, as outlined in compose-spec/compose-spec#65, I can easily imagine customers running on ECS would want this functionality.

Additional environment details (AWS ECS, Azure ACI, local, etc.): AWS ECS

CarmenAPuccio avatar Apr 16 '21 17:04 CarmenAPuccio

Compose-spec introduced support for depends_on: { condition: service_completed_successfully }, which is an initial step toward init-container support, but the actual way to express them within a compose application model is still to be discussed.
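For reference, a minimal sketch of that syntax (service and image names are illustrative, not from the spec):

services:
  init-db:
    image: my-app:latest        # hypothetical image reused for the one-shot step
    command: ./run-migrations.sh
  web:
    image: my-app:latest
    depends_on:
      init-db:
        condition: service_completed_successfully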

About the user request for sidecars, my guess is that users would rather adopt a declarative approach that lets them opt in to "service with telemetry" or "expose with reverse proxy", similar to Istio on Kubernetes, vs. having to explicitly configure a sidecar container, copy/pasting some complex container definition that they will then have to maintain.

About ECS support for dependencies and startup order, please note we are blocked by https://github.com/awslabs/goformation/issues/61 from letting users flag a service as Essential: false. Our own internal init containers are using a terrible hack awaiting a better option.

ndeloof avatar Apr 19 '21 07:04 ndeloof

i need this in my life.

dalegaspi avatar Apr 20 '21 21:04 dalegaspi

@dalegaspi could you please describe your use case, so we better understand how this can fit into the compose model, and how the compose file format could evolve to embrace this scenario?

ndeloof avatar Apr 21 '21 05:04 ndeloof

@ndeloof we have a service in ECS today deployed using docker-compose, and it's basically a Spring Cloud Gateway working in conjunction with Envoy: the Envoy fronts the SCG and, for all intents and purposes, they work as one, with Envoy acting as a sidecar. Today, docker-compose creates 2 task definitions with 2 services operating independently, and they refer to each other "locally" through Cloud Map. Don't get me wrong, it works, but there are unnecessary network hops... and the lack of explicit dependency of one service on the other is not much of a big deal to me. But as @CarmenAPuccio noted above, there are a couple of things that would make this better:

  1. have some mechanism to indicate that one service is dependent on another (again, not a big deal for my use case, but it's a nice feature to add)
  2. (this, in my mind, is more important) the task definition can have more than one container (so essentially we are defining just one service)... the SCG and Envoy containers will just launch in one service, which makes it more efficient as there are fewer network hops, and the Envoy container acts like a true sidecar.

I have no strong opinion on how the above features should be implemented, but adhering to the docker-compose way of doing things (as opposed to ECS extensions) would probably be the preferable option... probably.

dalegaspi avatar Apr 21 '21 13:04 dalegaspi

Thanks for the details. So, basically, you're using a sidecar to add service-mesh features to your deployment, which could maybe be supported using AWS App Mesh for better integration with the platform. For sure this could be expressed using an explicit sidecar container definition, but you'd have to copy/paste that same sidecar container definition for each service, and maintain it, while your actual intent is about higher-level routing/monitoring/resilience requirements that should be expressed as such, not as platform implementation details.

have some mechanism to indicate that one service is dependent on another

You can just use depends_on in your compose file.

ndeloof avatar Apr 21 '21 13:04 ndeloof

Yes, there are service mesh features in use, but we prefer portability and not being dependent on AWS-specific features.

...while your actual intent is about higher-level routing/monitoring/resilience requirements that should be expressed as such

Point taken... I didn't really give details as to how the Envoy and SCG work together for us, but the "actual intent" is a bit more than just routing/monitoring/resilience 😊... hence the decision to go with the sidecar approach.

I'm confused, I thought depends_on doesn't work yet for docker-compose on ECS? But good call nonetheless, I will start using it.

dalegaspi avatar Apr 21 '21 13:04 dalegaspi

A little more information to think about here, @ndeloof. Dexter's use case is a common one, and the way you implement App Mesh with an Envoy proxy is defined here. You are right that the actual intent in this case is about higher-level routing, but unfortunately the implementation details are key in this scenario: there is no mutating webhook admission controller concept in ECS/Fargate like there is in k8s, so the user has to define the proxy in their task definition for each and every service.

There are obviously many other examples where we need to inject a sidecar, such as logging with something like Fluent Bit. In this case, ECS/Fargate does make the implementation a little easier, but the user is still left to define the "sidecar" in their task definition, as seen here.
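As an illustration of what users currently have to maintain by hand, here is a trimmed CloudFormation sketch of a FireLens/Fluent Bit "sidecar" in a task definition (image tag, log group, and option values are assumptions for illustration, not taken from the linked AWS docs):

TaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    ContainerDefinitions:
      - Name: log_router                     # the sidecar users must add to every task definition
        Image: public.ecr.aws/aws-observability/aws-for-fluent-bit:stable
        Essential: true
        FirelensConfiguration:
          Type: fluentbit
      - Name: app
        Image: my-app:latest                 # hypothetical application image
        Essential: true
        LogConfiguration:
          LogDriver: awsfirelens
          Options:
            Name: cloudwatch_logs
            region: us-east-1
            log_group_name: /ecs/my-app
            log_stream_prefix: app-
            auto_create_group: "true"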

CarmenAPuccio avatar Apr 21 '21 13:04 CarmenAPuccio

I totally agree there are technical challenges in offering portability, but - typically - the Compose model does define a logging section where I'd like to be able to opt in to Fluent Bit or anything comparable, and not have to read AWS docs on the expected sidecar setup I need to apply for this to work. Think about port mapping in plain Docker: the user says "container port 8080 must be exposed as 80", we don't ask them to write an iptables rule. That's what I mean by "we need to capture the user intent in a higher-level construct within compose-spec".

ndeloof avatar Apr 21 '21 13:04 ndeloof

Yep, fair point... The idea is to make it easier so they don't have to go read the AWS docs, I agree. With that said (and what honestly got me going down this route), there are common scenarios where a user needs a sidecar even without AWS-specific details. For example, I was building a demo .NET app and trying to front the Kestrel web server with an NGINX reverse proxy, as seen here.
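In plain compose terms that pattern is just two services, which the ECS integration today turns into two separate tasks rather than one; a minimal sketch (names, images, and ports below are illustrative):

services:
  web:
    image: my-dotnet-app:latest    # hypothetical Kestrel-based app listening on 5000
  proxy:
    image: nginx:alpine            # its nginx config (not shown) proxies to http://web:5000
    ports:
      - "80:80"
    depends_on:
      - web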

CarmenAPuccio avatar Apr 21 '21 13:04 CarmenAPuccio

Sure, I'm not saying there's no use case that requires sidecars, just that it seems to me most use cases are actually implementation details for many modern infrastructure services (routing, security, monitoring, retry, circuit breaking, logging, ...) which, from a user's point of view, would better be expressed with higher-level constructs.

To use the "routing" use case for illustration, Istio and Traefik both allow use of annotations to define the routing. If we could specify such things in compose-spec, using X or Y implementation under the hood would become a platform detail for most users.

@CarmenAPuccio about your .NET app sample: I agree a reverse proxy "does it better with HTTP", and most deployments will actually have such infrastructure components. This doesn't mean they should be explicitly modeled in your compose file. Typically, when deploying on AWS, I'd probably opt for a platform-provided reverse proxy service, the same way that when deploying to Kubernetes I set up an Ingress but never deploy an explicit nginx frontend container.

ndeloof avatar Apr 21 '21 13:04 ndeloof

More requests along the lines of this: https://devops.stackexchange.com/questions/13802/docker-compose-to-ecs-two-services-in-one-task

mreferre avatar Apr 30 '21 11:04 mreferre

@mreferre here again, I'd tend to ask "why do you need nginx as a (sidecar) frontend to your Django service? Doesn't AWS offer something comparable directly implemented by the infrastructure (typically, using ALB rules to manage http->https redirects, 404s, etc.)?" Or, to be more generic, "as a developer, why would I have to hard-code HTTP infrastructure optimizations in my compose file?" This should - worst case - be defined by an additional "infrastructure" file, not tied to my service declaration.

ndeloof avatar Apr 30 '21 11:04 ndeloof

The reason I prefer to have this sidecar pattern option is so that I can view the two (or more) services as just one: fewer resources, fewer network hops. That should be a good enough reason, no?

dalegaspi avatar Apr 30 '21 13:04 dalegaspi

@ndeloof I think there are multiple reasons for that. @dalegaspi has touched on a few (maximizing Fargate task resource usage, particularly if my containers are tiny; less network traffic; fewer private VPC addresses being consumed; shared storage among containers that belong to the same task; possibly others). There are also situations where customers may need to run nginx for other reasons even if similar ALB features are available (for example, to be able to stand up the same stack on-prem / on their laptop with high fidelity, because there is no ALB outside of AWS). Nginx is just an example of a sidecar, and I don't think it's the most important one.

mreferre avatar Apr 30 '21 13:04 mreferre

@ndeloof I made the thread on DevOps Stack Exchange. The main reason I need Nginx in my stack is that the most used WSGI server for Django (gunicorn) does not support serving static files. (Maybe this could be achieved with an ALB, but it does not sound like an easy task.)

I think it would offer a lot of flexibility to be able to run multiple containers in one task! So I +1 this request as I would have a lot of use for it.

Edit:

To clarify a bit why this would be useful for me: I have two services, nginx and web. Nginx depends on web. Like I said above, I'm using nginx as a reverse proxy, but that's not its main purpose; it's to serve static files from my Django project, so the project won't work without it.

So I launched my project with compose-cli. Everything worked great. But I realized that it did not make a lot of sense for me to run a whole separate instance of nginx per web instance. Nginx does not require a lot of resources, I think, but it would be allocated a lot of unused resources.

My first idea was to use nginx as a load balancer so I would only have one instance of it. But that would make my ALB redundant, and I prefer to use the ALB since it is very simple.

So my second idea was to just put the two containers in the same instance so it would be less resource-intensive. Without this feature, I'm currently forced to take a heavily edited version of the CloudFormation template that docker compose generated and deploy it with the aws-cli directly.

P.S. I've also had the issue that when I update my Docker images and run docker compose up against ECS, ECS will first start a new web task and then stop the existing one, then start a new nginx task and stop the old one. This causes some downtime for my website while the old nginx task is still tied to the old web instance. (This might be user error on my side, though.)

feelixe avatar Apr 30 '21 19:04 feelixe

Just a quick note that, from a technical standpoint, Compose does support sidecars in the sense that you can define namespaces to be shared (network_mode: "service:foo"). Obviously, as this is mixed into service definitions, which own cluster/scalability-related attributes, this could hardly be used for this purpose without an opinionated interpretation of user intent by the Compose implementation, but it still seems feasible to me without much impact on the spec.
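A minimal sketch of that shared-namespace trick (service and image names are hypothetical; how an ECS mapping should interpret it is exactly the open question):

services:
  web:
    image: my-app:latest           # hypothetical application image, listening on 8080
  sidecar:
    image: my-proxy:latest         # hypothetical sidecar (proxy, agent, ...)
    network_mode: "service:web"    # share web's network namespace, talk over 127.0.0.1
    depends_on:
      - web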

I'd welcome concrete proposals on the compose-spec to find the right balance between introducing a complex nested sidecar container-definition structure within services and (ab)using the existing shared-namespace capability of the compose syntax.

ndeloof avatar May 03 '21 14:05 ndeloof

My use case:

Our app uses just two containers: nginx and php-fpm. After months of digging through documentation, I finally got our ECS cluster stable. The last remaining piece is getting Datadog integrated so we can monitor our app performance, logs, container resources, etc. Datadog provides a Docker image that is capable of discovering containers and grabbing these metrics. However, this Datadog container must be added to each task definition. I added the Datadog container to my docker-compose.yml file, but when this gets deployed to ECS, it becomes its own, isolated task definition. I can't figure out how to get it included as part of my nginx and php-fpm task definitions rather than being its own. Is there a way to use overlays in docker-compose.yml to accomplish this (I had to use an overlay to get the nginx 443 port properly configured, since there was no equivalent in the compose spec)? Any workaround? Pls halppp lol

jullian-chavez avatar Aug 18 '21 16:08 jullian-chavez

I would love to see a solution to this for sidecars for logging and metrics

fluffy avatar Sep 10 '21 02:09 fluffy

Just a quick note that, from a technical standpoint, Compose does support sidecars in the sense that you can define namespaces to be shared (network_mode: "service:foo").

This makes sense to me. My primary need for a sidecar is to put a container running Shibboleth SP and mod_proxy in front of each instance of my Rails application container. My main concern is that they use the same ENI when deployed on ECS, so that they can talk with low latency and without the need for SSL between them.

Obviously, as this is mixed into service definitions, which own cluster/scalability-related attributes, this could hardly be used for this purpose without an opinionated interpretation of user intent by the Compose implementation, but it still seems feasible to me without much impact on the spec.

I agree. It seems acceptable to me that if I deliberately specify that two services share a network interface, I shouldn't expect to be able to scale them separately.

ritchiey avatar Nov 14 '21 22:11 ritchiey

@ndeloof I made the thread on DevOps Stack Exchange. The main reason I need Nginx in my stack is that the most used WSGI server for Django (gunicorn) does not support serving static files. [...]

P.S. I've also had the issue that when I update my Docker images and run docker compose up against ECS, ECS will first start a new web task and then stop the existing one, then start a new nginx task and stop the old one. This causes some downtime for my website while the old nginx task is still tied to the old web instance.

This is very similar to what I'm dealing with right now and have been trying to figure out for 3 days straight.

The reason the nginx task is restarting is that when the old web task goes down, the nginx task keeps forwarding requests to it; when it can't find it anymore, it starts throwing 502 errors, then 499 errors, before it exits. When it restarts, it discovers the new web task and things work again.

I would love some way of attaching one nginx instance per web task; right now it's not really viable for me to have the web service go down for 2-3 minutes every time I redeploy some source code.

I have been thinking about ditching nginx altogether and just using the ALB (you could use WhiteNoise for static files this way), but this comes with some drawbacks, mainly around how gunicorn processes requests.

andrekrosby92 avatar Dec 31 '21 15:12 andrekrosby92

I'd love to see some level of support for this. My use case is even more mundane - I would love the ability to run some administrative tasks before starting a service, for example ensuring database migrations are up to date:

services:
  web-run-migrations:
    image: my-app:latest              # hypothetical: same application image as web
    command: /run-migrations-script.sh
  web:
    image: my-app:latest
    command: /start-server.sh
    depends_on:
      web-run-migrations:
        condition: service_completed_successfully

This setup would ensure the migrations are run only once, while scaling the web service as needed.

janrito avatar Mar 09 '22 12:03 janrito

I too would greatly appreciate sidecar support. My use case is getting secrets from an external provider (HashiCorp Vault in this case) and passing them to my application. Outside of ECS, I do this with a sidecar Vault container which simply fetches the secrets, puts them in a shared volume, and then dies. Inside of ECS with docker-compose, it seems like this just isn't possible.
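Outside ECS, that pattern is roughly a short-lived service sharing a named volume with the app; a sketch under those assumptions (image, command, and paths are illustrative, and the real setup also needs Vault address/auth configuration):

services:
  vault-init:
    image: hashicorp/vault:latest   # runs once, writes the secret file, then exits
    command: sh -c "vault kv get -format=json secret/my-app > /secrets/app.json"
    volumes:
      - secrets:/secrets
  app:
    image: my-app:latest            # hypothetical application image
    volumes:
      - secrets:/secrets:ro
    depends_on:
      vault-init:
        condition: service_completed_successfully

volumes:
  secrets: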

FraserThompson avatar May 30 '22 02:05 FraserThompson

You can do all of the above using ECS Compose-X, which uses labels to group services in your docker-compose file under the same logical task definition / task family and then service. I have been doing that for X-Ray/App Mesh/CloudWatch (for EMF + a Prometheus exporter).
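For illustration, that grouping is expressed with deploy labels along these lines (service and image names are hypothetical, and the exact label name should be double-checked against the Compose-X documentation):

services:
  app:
    image: my-app:latest            # hypothetical application image
    deploy:
      labels:
        ecs.task.family: my-family  # same family => same task definition / ECS service
  proxy:
    image: my-proxy:latest          # hypothetical sidecar image
    deploy:
      labels:
        ecs.task.family: my-family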

From seeing another issue in this repo about FireLens, and needing it myself for my company, I am adding a FireLens configuration wrapper as well to make it easy to use and to support firelens-config-type: s3 on Fargate (which is not possible today).

In fact, we already use sidecars for various applications:

  • DB upgrade scripts that execute prior to running the service and must succeed before "the real service" starts
  • Monitoring (exporters)
  • (coming soon, GA) FireLens auto-configured with Fluent Bit (FluentD might come afterwards)
  • Configuration bootstrap: using Files Composer, which has an extended syntax of the original CFN ConfigSets for files, to ensure that configuration files are all set, configured, etc., prior to the main service starting (which allows fetching files from S3, SSM, Secrets Manager, plain text, etc.)

Hit me up if you have questions. Hope this helps.

JohnPreston avatar May 30 '22 07:05 JohnPreston

I too would greatly appreciate sidecar support. My use case is getting secrets from an external provider (HashiCorp Vault in this case) and passing them to my application. Outside of ECS, I do this with a sidecar Vault container which simply fetches the secrets, puts them in a shared volume, and then dies. Inside of ECS with docker-compose, it seems like this just isn't possible.

It's possible with the x-aws-cloudformation overlay (https://docs.docker.com/cloud/ecs-integration/#tuning-the-cloudformation-template); however, once you start going down this path, you would probably be better off just writing regular CloudFormation templates.

andrekrosby92 avatar May 30 '22 13:05 andrekrosby92

@andrekrosby92 I tried this but concluded it wasn't really practical, because you can't use an overlay to append to the ContainerDefinitions array of an existing task definition, so you'd have to manually re-implement all of the ContainerDefinitions spat out by docker compose instead. That's what this issue is about: https://github.com/docker/compose-cli/issues/2160

FraserThompson avatar May 30 '22 21:05 FraserThompson

+1 for sidecar support in compose-cli; we would greatly benefit from it too. As with some other use cases above, ours involves using Nginx as a reverse proxy (mainly), while being able to roll out updates without the downtime that comes with the current capability and without having to interfere too much with the template generated by docker compose.

D-Simona-G avatar Feb 02 '23 09:02 D-Simona-G

Using docker compose with ECS, apparently the only way to get rolling updates to work with an nginx reverse proxy is creating your own sidecar. From a practical standpoint, I'm not aware of many apps that are not fronted by a reverse proxy; this puts us in a position where we lose the simplicity of docker compose and start maintaining CloudFormation.

Here is a very messy workaround: https://stackoverflow.com/questions/70532751/nginx-container-on-ecs-not-following-rolling-update

artband avatar Mar 03 '23 13:03 artband

+1 on this -- still trying to find a workaround, but my current situation:

I have a Java gRPC service, and a mobile application can only access the gRPC service through an Envoy proxy. I would like to run my Envoy as a sidecar, so it can forward traffic using 127.0.0.1:XXXX. This can only be done if the Envoy proxy is running as a sidecar within the same task.

Deeg-Kim avatar Mar 06 '23 06:03 Deeg-Kim

Would love to see built-in support for this as well.

As a workaround for https://github.com/docker/compose-cli/issues/2160, I'm using yq as part of my deploy script to generate a compose yaml config with sidecar container definitions injected.

# deploy.sh
yq ea 'explode(.) as $item ireduce ({}; . *+ $item) | {"x-aws-cloudformation": .}' \
  <(docker -c aws compose -f docker-compose.yml convert) \
  <(cat cfn-sidecars.yml) \
  > docker-compose.deploy.yml

docker -c aws compose -f docker-compose.yml -f docker-compose.deploy.yml up

In my case, I wanted to add a Datadog sidecar container to some of my services:

# cfn-sidecars.yml
Resources:
  WebTaskDefinition: &aws-task-def
    Properties:
      ContainerDefinitions:
        - Name: datadog-agent
          Image: 'public.ecr.aws/datadog/agent:latest'
          Environment:
            ...
          PortMappings:
            ...

  WorkerTaskDefinition:
    <<: *aws-task-def

I'm still just starting out using the compose ECS integration, so I'm not sure if there are any edge cases to worry about when merging the yaml like this.

mumumumu avatar Mar 08 '23 00:03 mumumumu

Is this whole project even maintained anymore? I have the slight feeling that this docker compose integration approach, while being super attractive at first, is just another dead end :( Go use CDK (or Terraform or Pulumi or whatever) instead.

mfittko avatar Mar 15 '23 10:03 mfittko