pulumi-awsx
`awsx.ecs.FargateService` resources don't update in parallel, even on separate clusters
What happened?
I have a number of ECS clusters with Fargate services and tasks. I can't see any dependencies between them, but they never update in parallel; instead, each waits in turn for the previous one to finish. This happens both for services on the same cluster and for services on different clusters.
Example
Here's a minimal example with:
- Two clusters: Cluster A and Cluster B.
- Cluster A has two services: Service 1 and Service 2.
- Cluster B has one service: Service 1.
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
import * as awsx from "@pulumi/awsx";

const image = "docker.io/memcached:1.6.26";

const clusterA = new aws.ecs.Cluster("cluster-a", {});

new awsx.ecs.FargateService("cluster-a-service-1", {
    cluster: clusterA.arn,
    assignPublicIp: true,
    desiredCount: 1,
    taskDefinitionArgs: {
        container: {
            name: "cluster-a-service-1",
            image,
            cpu: 128,
            memory: 512,
            essential: true,
        },
    },
    forceNewDeployment: true,
});

new awsx.ecs.FargateService("cluster-a-service-2", {
    cluster: clusterA.arn,
    assignPublicIp: true,
    desiredCount: 1,
    taskDefinitionArgs: {
        container: {
            name: "cluster-a-service-2",
            image,
            cpu: 128,
            memory: 512,
            essential: true,
        },
    },
    forceNewDeployment: true,
});

const clusterB = new aws.ecs.Cluster("cluster-b", {});

new awsx.ecs.FargateService("cluster-b-service-1", {
    cluster: clusterB.arn,
    assignPublicIp: true,
    desiredCount: 1,
    taskDefinitionArgs: {
        container: {
            name: "cluster-b-service-1",
            image,
            cpu: 128,
            memory: 512,
            essential: true,
        },
    },
    forceNewDeployment: true,
});
Note that `forceNewDeployment: true` doesn't seem to work (at least for me, per #1249), so you may have to e.g. change the tag in `image` to make it update the services on subsequent runs.
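For example, here is a minimal sketch of that tag-change workaround, using a hypothetical `imageTag` config key (not part of the original example) to vary the image between runs:

```typescript
import * as pulumi from "@pulumi/pulumi";

// Hypothetical config key, for illustration only: bump it between runs
// (e.g. `pulumi config set imageTag 1.6.27`) to force the task definition
// to change and the services to redeploy.
const config = new pulumi.Config();
const imageTag = config.get("imageTag") ?? "1.6.26";
const image = `docker.io/memcached:${imageTag}`;
```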
With this, run `pulumi up`. This results in each service being updated in series, e.g. waiting here:
     Type                                  Name                 Status             Info
     pulumi:pulumi:Stack                   pulumi_ecs_test-dev  running..
     └─ awsx:ecs:FargateService            cluster-a-service-1
        ├─ awsx:ecs:FargateTaskDefinition  cluster-a-service-1
+-      │  └─ aws:ecs:TaskDefinition       cluster-a-service-1  replaced (0.00s)   [diff: ~containerDefinitions,family]
~       └─ aws:ecs:Service                 cluster-a-service-1  updating (42s)...  [diff: ~taskDefinition]
Note that the services within Cluster A don't run in parallel with each other, nor do the services in Cluster A run in parallel with those in Cluster B.
Output of pulumi about
docker  3.6.1
nodejs  unknown

Host
OS       darwin
Version  14.1.2
Arch     arm64

This project is written in nodejs: executable='/Users/[...]/.asdf/shims/node' version='v16.15.1'

[...]

Found no pending operations associated with dev

Backend
Name           pulumi.com
URL            https://app.pulumi.com/[...]
User           [...]
Organizations  [...]
Token type     personal

Dependencies:
NAME            VERSION
@pulumi/aws     6.28.1
@pulumi/awsx    2.6.0
@pulumi/pulumi  3.112.0
@types/node     18.19.26

Pulumi locates its logs in /var/folders/q3/mk2vg8h90lxdvblf8xs__w9c0000gq/T/ by default
Additional context
No response
I found a few threads about this on the Pulumi Slack but couldn't find a good workaround or solution:
https://pulumi-community.slack.com/archives/CRH5ENVDX/p1695337458861349
https://pulumi-community.slack.com/archives/CRH5ENVDX/p1710520106520799
One suggestion was to use `waitForSteadyState: false`, which technically does cause them to run in parallel, but only in a fire-and-forget way: Pulumi starts the deployments without checking whether they succeed. You'd need to set up additional monitoring to verify each deployment, possibly trigger rollbacks, and so on. For our use case that's unworkable, sadly.
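For reference, here is a sketch of what that suggestion looks like applied to the first service from the example above, assuming `waitForSteadyState` is passed through to the underlying `aws.ecs.Service`; the caveats just described still apply:

```typescript
new awsx.ecs.FargateService("cluster-a-service-1", {
    cluster: clusterA.arn,
    assignPublicIp: true,
    desiredCount: 1,
    taskDefinitionArgs: {
        container: {
            name: "cluster-a-service-1",
            image,
            cpu: 128,
            memory: 512,
            essential: true,
        },
    },
    forceNewDeployment: true,
    // Fire-and-forget: Pulumi starts the deployment but does not wait for
    // the service to reach a steady state, so a failed rollout is not
    // surfaced by `pulumi up`.
    waitForSteadyState: false,
});
```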
Thanks for reporting this @markalexander - I can see how having a lot of services and tasks would introduce an unacceptable slowdown if all these operations are required to run sequentially. I'm not yet aware of what is causing this; we will have to take a close look. Marking with impact/performance.
Blocked here on a platform feature: https://github.com/pulumi/pulumi/issues/7629
Now that the platform feature https://github.com/pulumi/pulumi/issues/7629 has been merged, I gave it a try and built a provider binary that includes the enhancement.
I tested with the example in https://github.com/pulumi/pulumi-awsx/pull/1334. After the change, updates to components happen in parallel, dramatically speeding up deployments that manage many components.
Once the enhancement https://github.com/pulumi/pulumi/issues/7629 is released, we will upgrade pulumi-awsx.
glory to the maintainers, finally.
The enhancement https://github.com/pulumi/pulumi/issues/7629 has now been released. I updated pulumi-awsx to that version (https://github.com/pulumi/pulumi-awsx/pull/1338) and will kick off a release once the build succeeds.
This is now fixed in v2.13.0.
This issue has been addressed in PR #1338 and shipped in release v2.13.0.