
`awsx.ecs.FargateService` resources don't update in parallel, even on separate clusters

Open markalexander opened this issue 11 months ago • 3 comments

What happened?

I have a number of ECS clusters with Fargate services and tasks. There are no dependencies between them that I can see, yet they never update in parallel; instead, each waits in turn for the previous one to finish. This happens both for services on the same cluster and for services on different clusters.

Example

Here's a minimal example with:

  • Two clusters: Cluster A and Cluster B.
  • Cluster A has two services: Service 1 and Service 2.
  • Cluster B has one service: Service 1.

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
import * as awsx from "@pulumi/awsx";

const image = "docker.io/memcached:1.6.26";

const clusterA = new aws.ecs.Cluster("cluster-a", {});

new awsx.ecs.FargateService("cluster-a-service-1", {
  cluster: clusterA.arn,
  assignPublicIp: true,
  desiredCount: 1,
  taskDefinitionArgs: {
    container: {
      name: "cluster-a-service-1",
      image,
      cpu: 128,
      memory: 512,
      essential: true,
    },
  },
  forceNewDeployment: true,
});

new awsx.ecs.FargateService("cluster-a-service-2", {
  cluster: clusterA.arn,
  assignPublicIp: true,
  desiredCount: 1,
  taskDefinitionArgs: {
    container: {
      name: "cluster-a-service-2",
      image,
      cpu: 128,
      memory: 512,
      essential: true,
    },
  },
  forceNewDeployment: true,
});


const clusterB = new aws.ecs.Cluster("cluster-b", {});

new awsx.ecs.FargateService("cluster-b-service-1", {
  cluster: clusterB.arn,
  assignPublicIp: true,
  desiredCount: 1,
  taskDefinitionArgs: {
    container: {
      name: "cluster-b-service-1",
      image,
      cpu: 128,
      memory: 512,
      essential: true,
    },
  },
  forceNewDeployment: true,
});

Note that `forceNewDeployment: true` doesn't seem to work (at least for me, per #1249), so you may have to e.g. change the tag in `image` to make the services update on subsequent runs.
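One way to vary the tag between runs is to read it from stack config (a sketch; the `tag` config key is an assumption for illustration, not part of the original example):

```typescript
import * as pulumi from "@pulumi/pulumi";

// Hypothetical: take the image tag from stack config, so running
// `pulumi config set tag 1.6.27` before `pulumi up` changes the task
// definition and forces the services to redeploy.
const config = new pulumi.Config();
const tag = config.get("tag") ?? "1.6.26";
const image = `docker.io/memcached:${tag}`;
```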

With this, run `pulumi up`. Each service is then updated in series, e.g. waiting here:


     Type                                  Name                 Status               Info
     pulumi:pulumi:Stack                   pulumi_ecs_test-dev  running..            
     └─ awsx:ecs:FargateService            cluster-a-service-1                       
        ├─ awsx:ecs:FargateTaskDefinition  cluster-a-service-1                       
 +-     │  └─ aws:ecs:TaskDefinition       cluster-a-service-1  replaced (0.00s)     [diff: ~containerDefinitions,family]
 ~      └─ aws:ecs:Service                 cluster-a-service-1  updating (42s)...    [diff: ~taskDefinition]

Note that the services do not run in parallel either within Cluster A or across Cluster A and Cluster B.

Output of pulumi about

docker  3.6.1
nodejs  unknown

Host     
OS       darwin
Version  14.1.2
Arch     arm64

This project is written in nodejs: executable='/Users/[...]/.asdf/shims/node' version='v16.15.1'

[...]

Found no pending operations associated with dev

Backend        
Name           pulumi.com
URL            https://app.pulumi.com/[...]
User           [...]
Organizations  [...]
Token type     personal

Dependencies:
NAME            VERSION
@pulumi/aws     6.28.1
@pulumi/awsx    2.6.0
@pulumi/pulumi  3.112.0
@types/node     18.19.26

Pulumi locates its logs in /var/folders/q3/mk2vg8h90lxdvblf8xs__w9c0000gq/T/ by default

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

markalexander avatar Mar 29 '24 17:03 markalexander

I found a few threads about this on the Pulumi Slack but couldn't find a good workaround or solution:

https://pulumi-community.slack.com/archives/CRH5ENVDX/p1695337458861349

https://pulumi-community.slack.com/archives/CRH5ENVDX/p1710520106520799

One suggestion was to use `waitForSteadyState: false`, which technically does cause them to run in parallel, but in a fire-and-forget way: Pulumi starts the deployments without checking whether they converge. You'd need to set up additional monitoring to check whether the fired deployments succeeded, perform rollbacks, etc. For our use case this is unworkable, sadly.
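For reference, the workaround applied to one of the services from the example would look roughly like this (a sketch; whether `waitForSteadyState` is passed through at this level of `FargateServiceArgs` should be verified against the awsx version in use):

```typescript
import * as aws from "@pulumi/aws";
import * as awsx from "@pulumi/awsx";

const clusterA = new aws.ecs.Cluster("cluster-a", {});

new awsx.ecs.FargateService("cluster-a-service-1", {
  cluster: clusterA.arn,
  assignPublicIp: true,
  desiredCount: 1,
  taskDefinitionArgs: {
    container: {
      name: "cluster-a-service-1",
      image: "docker.io/memcached:1.6.26",
      cpu: 128,
      memory: 512,
      essential: true,
    },
  },
  // Fire-and-forget: pulumi up no longer blocks until the ECS deployment
  // reaches a steady state, so services update concurrently -- but
  // deployment failures won't surface in the update itself.
  waitForSteadyState: false,
});
```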

markalexander avatar Mar 29 '24 17:03 markalexander

Thanks for reporting this @markalexander - I can see how having a lot of services and tasks would introduce unacceptable slowdown if all these operations are required to run sequentially. I'm not yet aware of what is causing this; we will have to take a close look. Marking with impact/performance.

t0yv0 avatar Apr 01 '24 14:04 t0yv0

Blocked here on a platform feature: https://github.com/pulumi/pulumi/issues/7629

t0yv0 avatar Apr 10 '24 15:04 t0yv0

Now that the platform feature https://github.com/pulumi/pulumi/issues/7629 is merged in I gave it a try and built a provider binary that includes that enhancement.

I tested with this example here https://github.com/pulumi/pulumi-awsx/pull/1334. After the change the updates to components happen in parallel, dramatically speeding up deployments that manage a lot of components.

Once the enhancement https://github.com/pulumi/pulumi/issues/7629 is released, we will upgrade pulumi-awsx.

flostadler avatar Jun 27 '24 10:06 flostadler

glory to the maintainers, finally.

smsunarto avatar Jun 28 '24 20:06 smsunarto

The enhancement https://github.com/pulumi/pulumi/issues/7629 is released now. I updated pulumi-awsx to that version (https://github.com/pulumi/pulumi-awsx/pull/1338) and will kick off a release after the build succeeds.

flostadler avatar Jul 02 '24 10:07 flostadler

This is now fixed in v2.13.0.

flostadler avatar Jul 02 '24 14:07 flostadler

This issue has been addressed in PR #1338 and shipped in release v2.13.0.

pulumi-bot avatar Jul 02 '24 15:07 pulumi-bot