aws-cdk aws-step-function-tasks: Allow removing launchType from RunEcsEc2Task

trafficstars

capacity providers allow using ECS with dynamic capacity. https://aws.amazon.com/tw/about-aws/whats-new/2019/12/amazon-ecs-capacity-providers-now-available/ , yet CloudFormation and CDK do not yet support this feature https://github.com/aws/aws-cdk/issues/5471.

I propose as a stop gap solution to allow users to submit EC2 tasks without launchType: https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_RunTask.html#ECS-RunTask-request-launchType

if launchType is provided, there is no way to get the task to use the capacity providers that are created outside CDK.

Use Case

I want to create capacity providers using boto3 and use them when launching EC2 tasks using step functions.

Proposed Solution

Provider a way to make launchType optional here: https://github.com/aws/aws-cdk/blob/master/packages/%40aws-cdk/aws-stepfunctions-tasks/lib/ecs/run-ecs-ec2-task.ts#L62

Or to change it after construction.

Other

[x] :wave: I may be able to implement this feature request
[ ] :warning: This feature might incur a breaking change

This is a :rocket: Feature Request

May 13 '20 21:05 mb-dev

since RunEcsEc2Task and RunEcsFargateTask are going to be marked as @deprecated, this feature should be introduced in RunTask

Jun 30 '20 01:06 shivlaks

@mb-dev PR incoming once I merge in #8451 to make launchType optional. Step Functions does not support specifying capacityProviderStrategy as a parameter.

As per documentation you've referenced

If a capacityProviderStrategy is specified, the launchType parameter must be omitted. If no capacityProviderStrategy or launchType is specified, the defaultCapacityProviderStrategy for the cluster is used.

Although this unlocks some usage, it's still incomplete until users can supply a capacity provider strategy. Since we're limited I'm inclined to agree with your suggested stop-gap. It will still be compatible when capacityProviderStrategy can be supplied.

edit: added following question:

What happens if a cluster does not have a default capacity provider strategy? I believe that's optional too?

Jun 30 '20 06:06 shivlaks

per @mb-dev in comment

@shivlaks what alarmed me is that the PR makes the definition depend more on launch type than before. launchTarget sounds like launchType, and it's tied to either EC2/Fargate. I am not sure if it will be clear how to express a undefined launchType while still specifying parameters to be used at creation.

Currently no launchType might look like:

const runTask = new tasks.EcsRunTask(stack, 'Run', {
    integrationPattern: sfn.IntegrationPattern.RUN_JOB,
    cluster,
    taskDefinition,
    launchTarget: new tasks.EcsEc2LaunchTarget({
      launchType: undefined,
    }),
  });

If launchTarget would be named something like clusterConfiguration it might be easier to parse.

Jun 30 '20 07:06 shivlaks

@mb-dev launchType was never really exposed as a property (with the old classes or with the new implementation) The thinking was to associate launchTarget closely to launchType.

I was thinking of introducing something like IEcsCapacityProvider to add the capability to specify capacity providers. Users would specify either capacity providers or a launch target (which would become optional).

const runTask = new tasks.EcsRunTask(stack, 'Run', {
    integrationPattern: sfn.IntegrationPattern.RUN_JOB,
    cluster,
    taskDefinition,
    capacityProviders: [new tasks.EcsEc2CapacityProvider({
      ...
    })],
  });

and adding types for EcsFargateCapacityProvider and EcsEc2CapacityProvider classes to help populate that list of capacity providers.

what do you think?

Jun 30 '20 07:06 shivlaks

@shivlaks thanks for moving the discussion here. The capacity providers are a property of ECS cluster, while the task level property is choosing one of the providers specified in the cluster definition. Given that users might create ECS cluster or capacity providers outside of CDK, and that there's an option for default capacity provider (as this stop gap suggests)

there are actually 3 scenarios for using runTask in step functions:

users specifying launchType
users specifying capacityProvider
users specifying neither capacity provider or launchtype - in this case the default capacity provider is used. See: https://docs.aws.amazon.com/AmazonECS/latest/userguide/cluster-capacity-providers.html

Default capacity provider strategy
A default capacity provider strategy is associated with each Amazon ECS cluster. This determines the capacity provider strategy the cluster will use if no other capacity provider strategy or launch type is specified when running a task or creating a service.

One idea is to expose DEFAULT_CAPACITY_PROVIDER constant that can be assigned to launchType or launchTarget that will make it clear we want default capacity provider

Jun 30 '20 15:06 mb-dev

Capacity Providers were added to Cloudformation: https://github.com/aws/aws-cdk/commit/4ce27f4195c70bd9e365ec0e0df5c0ede863bc8a#diff-4a4e15a081904ee16b4d84e2f5cf5aee

So now it's even more important to be able to omit launchType.

Related: https://github.com/aws/aws-cdk/issues/5471

Jul 01 '20 21:07 mb-dev

@mb-dev

The recommended workaround for coverage that isn't quite supported in the aws-stepfunctions-tasks module is to supply the needed ASL as a custom state. You can create your task, and supply the .toStateJson() as input (you'd need to adjust for any parameters you want to add/remove/modify)

We're still discussing options, but are not convinced that the stop-gap needs to be introduced directly into the task or folded into the EcsEc2LaunchTarget / EcsFargateLaunchTarget classes yet.

launchtarget as a whole should likely be something we can omit. Can you help me better understand the EC2 / Fargate specific parameters that would need to be supplied (with the omission of launch target). I don't mind the constant idea but will want to better understand what the ASL looks like for these use cases.

Jul 02 '20 01:07 shivlaks

Ooo, i totally missed the escape hatch in the changelog. I will try that tomorrow and maybe that will be enough until full capacity providers support is added.

My pre-refactor EC2 integration is straightforward (in Python):

sft.RunEcsEc2Task(
            cluster=cluster,
            task_definition=task_definition,
            integration_pattern=sfn.ServiceIntegrationPattern.SYNC,
            container_overrides={...},
        )

So I personally won't need launchTarget, just not launchType.

Jul 02 '20 03:07 mb-dev

@mb-dev let me know if I can help with checking out the implementation of the custom state or if it would help for me to whip up an example!

Jul 07 '20 14:07 shivlaks

Hello! I have been using the CustomState to avoid this problem but now I am facing the problem of losing ResultPath during the "translation" (cf. this issue #8754).

Not really sure there is an easy way to avoid the problem (I can wrap my step in a parallel step) but annoying...

Feb 03 '21 23:02 PierreKiwi

is there any progress here? what does the roadmap look like? One year later, I still can't get the task to run on the correct capacity provider.

Jun 27 '22 12:06 trobert2

is there any progress here? what does the roadmap look like? One year later, I still can't get the task to run on the correct capacity provider.

You can use FARGATE_SPOT with Step Functions, as shown in my tests. You just need to omit the LaunchType field in your state machine definition.

Remove the field "LaunchType": "FARGATE" from the parameters.

{
  "Comment": "State machine integrated with ECS",
  "StartAt": "Initial",
  "States": {
    "Initial": {
      "Type": "Task",
      "Resource": "arn:aws:states:::ecs:runTask.sync",
      "Parameters": {
        "Cluster": "${ECS_CLUSTER_ARN}",
        "PlatformVersion": "LATEST",
        "TaskDefinition": "${ECS_TASK_DEFINITION_ARN}",
        "NetworkConfiguration": {
          "AwsvpcConfiguration": {
            "Subnets": ${SUBNETS},
            "AssignPublicIp": "ENABLED"
          }
        },
        "Overrides": {
          "ContainerOverrides": ${ECS_CONTAINER_OVERRIDES}
        }
      },
      "TimeoutSeconds": 3600,
      "End": true
    }
  }
}

May 29 '23 18:05 ferrarijefferson

Just bumped into this as well. My use case is spawning long-running tasks on (expensive) GPU instances that are scaled to zero most of the time. When the StepFunction workflow is at EcsRunTask, the expectation is that the cluster spins up the GPU instance to do the work. But that only works if the default capacity provider is used; currently, the execution always fails with

No Container Instances were found in your cluster. (Service: AmazonECS; Status Code: 400; Error Code: InvalidParameterException; Request ID: 5652eb66-58cf-4720-a4d5-40bedf7ddf22; Proxy: null)

Aug 18 '23 09:08 kevinbader

I created a class which implements IEcsLaunchTarget so this is a drop-in workaround:

export class EcsFargateSpotLaunchTarget implements sfnTasks.IEcsLaunchTarget {
    /**
     * Launch the ECS task using Fargate Spot.
     * 
     * This functionality is not built into CDK yet.
     */
    constructor(private readonly cluster: ecs.Cluster, private readonly options?: sfnTasks.EcsFargateLaunchTargetOptions) { }

    /**
     * Called when the Fargate launch type configured on RunTask
     */
    public bind(_task: sfnTasks.EcsRunTask, launchTargetOptions: sfnTasks.LaunchTargetBindOptions): sfnTasks.EcsLaunchTargetConfig {
        this.cluster.enableFargateCapacityProviders();

        if (!launchTargetOptions.taskDefinition.isFargateCompatible) {
            throw new Error('Supplied TaskDefinition is not compatible with Fargate');
        }

        return {
            parameters: {
                PlatformVersion: this.options?.platformVersion,
                "CapacityProviderStrategy": [
                    {
                        "CapacityProvider": "FARGATE_SPOT",
                        "Weight": 1
                    }
                ],
            },
        };
    }
}

const task = new sfnTasks.EcsRunTask(this, 'RunTask', {
    ...
    launchTarget: new EcsFargateSpotLaunchTarget(ecsCluster)
});

Nov 23 '23 22:11 ghost

I like @danw-mpl solution. I went for a slightly different approach and removed the LaunchType property entirely, allowing the ECS task to take the default capacity provider settings from the cluster.

C# example:

    public class EcsRunTaskWithoutLaunchType : EcsRunTask
    {
        public EcsRunTaskWithoutLaunchType(Construct scope, string id, IEcsRunTaskProps props) : base(scope, id, props)
        {
        }

        protected EcsRunTaskWithoutLaunchType(ByRefValue reference) : base(reference)
        {
        }

        protected EcsRunTaskWithoutLaunchType(DeputyProps props) : base(props)
        {
        }

        public override JObject ToStateJson()
        {
            var stateJson = base.ToStateJson();

            ((JObject)stateJson["Parameters"]!).Remove("LaunchType");

            return stateJson;
        }
    }

Nov 24 '23 08:11 Muppets

@Muppets I tried that initially and ran into an error from Step Functions - something like 'no container instances found in your cluster'.

I'd assume that's a problem with my cluster config though. Even though I had fargate capacity providers enabled.

Nov 24 '23 08:11 ghost

I've also run into this issue. Here is an implementation for Python that can be used, and is compatible with clusters that contain a default capacity provider and those that do not:

class EcsRunTakWithoutLaunchType(EcsRunTask):
    """Workaround for https://github.com/aws/aws-cdk/issues/7967"""

    def __init__(self, *args, **kwargs):
        self.cluster = kwargs.get('cluster')
        if not self.cluster:
            raise ValueError('cluster is required')
        super().__init__(*args, **kwargs)

    def to_state_json(self) -> typing.Mapping[typing.Any, typing.Any]:
        # if the cluster has a default capacity provider, we pop the LaunchType parameter
        # to use it by default (rather than attempting to use EC2 directly)
        state_json = super().to_state_json()

        if self.cluster.default_capacity_provider_strategy:
            state_json['Parameters'].pop('LaunchType')
            return state_json

        return state_json

The advantage here is that we can keep the EcsEc2LaunchTarget if the cluster has no default capacity provider defined.

Jun 27 '24 14:06 pwrmiller

aws-cdk aws-cdk copied to clipboard

aws-step-function-tasks: Allow removing launchType from RunEcsEc2Task

Use Case

Proposed Solution

Other

aws-cdk
aws-cdk copied to clipboard