
Add AWS ECS Capacity Provider Strategy Support

gregharvey opened this issue 4 years ago · 8 comments

Summary

Migrated from https://github.com/ansible/ansible/issues/67997 (it doesn't seem to have made it over automatically)

Add support for AWS ECS Cluster Capacity Provider Strategy configuration.

An additional note: I noticed this because I was creating a cluster to use with the GitLab CI Fargate driver and started getting "The platform version must be null when specifying an EC2 launch type." when trying to launch a job. It worked with a manually created cluster and task definition, so I compared the two and found the difference: the manually created cluster had two capacity providers, while the Ansible-created one had none, nor can you manually add them afterwards. It's clearly something the AWS console takes care of at creation time, and it can be done with the API (see additional info), but this module currently does not support it. It means you can't really use it to set up a Fargate cluster at all.

Issue Type

Feature Idea

Component Name

ecs_cluster

Additional Information

Enable configuration of ECS cluster capacity providers and strategies thereof.

ecs_cluster:
  ...
  capacity_providers:
    - "FARGATE"
    - "FARGATE_SPOT"
  capacity_provider_strategy:
    - capacity_provider: "FARGATE"
      base: 1
      weight: 1
    - capacity_provider: "FARGATE_SPOT"
      weight: 100 

  • hashicorp/terraform-provider-aws#11150
  • https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cluster-capacity-providers.html
  • https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ecs.html#ECS.Client.put_cluster_capacity_providers
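
Until the module supports this natively, a possible stopgap is to push the providers onto the cluster with the AWS CLI from a task. A minimal sketch, assuming the AWS CLI is available on the controller; my-fargate-cluster is a placeholder name, and the task is not idempotent:

- name: attach Fargate capacity providers and a default strategy (stopgap)
  # my-fargate-cluster is a placeholder; this calls the same API the console uses
  command: >
    aws ecs put-cluster-capacity-providers
    --cluster my-fargate-cluster
    --capacity-providers FARGATE FARGATE_SPOT
    --default-capacity-provider-strategy
    capacityProvider=FARGATE,base=1,weight=1
    capacityProvider=FARGATE_SPOT,weight=100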

Code of Conduct

  • [X] I agree to follow the Ansible Code of Conduct

gregharvey avatar Oct 19 '21 16:10 gregharvey

Files identified in the description:

  • [plugins/modules/ecs_cluster.py](https://github.com/ansible-collections/community.aws/blob/main/plugins/modules/ecs_cluster.py)

If these files are inaccurate, please update the component name section of the description or use the !component bot command.

click here for bot help

ansibullbot avatar Oct 19 '21 16:10 ansibullbot

cc @Java1Guy @jillr @markuman @s-hertel @tremble @wimnat click here for bot help

ansibullbot avatar Oct 19 '21 16:10 ansibullbot

> It means you can't really use it to set up a Fargate cluster at all.

Hm, I'm not sure about that.
At work we create Fargate ECS clusters with just:

- name: create ecs cluster
  ecs_cluster:
    name: serverless-housekeeping
    state: present

and we can run ECS task definitions with launch_type: FARGATE in that cluster without any problems.

    - name: letsencrypt taskdefinition
      ecs_taskdefinition:
        family: letsencrypt
        cpu: "256"
        memory: "512"
        state: present
        network_mode: awsvpc
        launch_type: FARGATE
        execution_role_arn: "arn:aws:iam::{{ caller_facts.account }}:role/ecsTaskExecutionRole"
        task_role_arn: "arn:aws:iam::{{ caller_facts.account }}:role/letsencryptECSTask"
        region: eu-central-1
        containers:
          - name: letsencrypt
            environment:
              - name: KMS
                value: "{{ kms.ssm }}"
            essential: true
            image: "{{ caller_facts.account }}.dkr.ecr.eu-central-1.amazonaws.com/letsencrypt:latest"
            logConfiguration:
              logDriver: awslogs
              options:
                awslogs-group: /ecs/letsencrypt
                awslogs-region: eu-central-1
                awslogs-stream-prefix: ecs
      register: letsTD

markuman avatar Oct 20 '21 05:10 markuman

> It means you can't really use it to set up a Fargate cluster at all.

> Hm, I'm not sure about that.

...which does not mean that the parameters should not be supported by community.aws.ecs_cluster.
I think it's not that hard to implement.

markuman avatar Oct 20 '21 06:10 markuman

Hi @markuman, thanks for the reply. That's interesting; I wonder what I'm doing wrong then? It's off-topic for this issue, but for some reason my Ansible-created cluster can't launch Fargate task definitions while my manually created one can ... and I can't see any other difference. I'll keep digging though; if it works for you, then at least I know it's possible! :-)

gregharvey avatar Oct 20 '21 07:10 gregharvey

@gregharvey

---
- hosts: localhost
  connection: local
  tags:
    - example

  vars:
    region: eu-central-1
    subnet: subnet-d8309db2
    security_group: sg-f32f0196
    ecs_trusted_relationship: |
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
              "Service": "ecs-tasks.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
          }
        ]
      }


  tasks:
    - name: Get the current caller identity facts
      aws_caller_info:
      register: caller_facts

    - name: create ecsTaskExecution role
      iam_role:
        name: ecsTaskExecutionRole
        description: ecsTaskExecutionRole with too many permissions
        state: present
        purge_policies: yes
        managed_policy:
          - arn:aws:iam::aws:policy/CloudWatchLogsFullAccess
          - arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role
          - arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceRole
          - arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceEventsRole
          - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        assume_role_policy_document: "{{ ecs_trusted_relationship }}"

    - name: create ecs cluster
      ecs_cluster:
        name: sometcluster
        state: present
        region: "{{ region }}"

    - name: create cloudwatch log group
      cloudwatchlogs_log_group:
        log_group_name: /ecs/fargate-test
        retention: 1
        region: "{{ region }}"

    - name: some fargate task definition
      ecs_taskdefinition:
        family: something
        cpu: "256"
        memory: "512"
        state: present
        network_mode: awsvpc
        launch_type: FARGATE
        execution_role_arn: ecsTaskExecutionRole
        task_role_arn: ecsTaskExecutionRole
        region: "{{ region }}"
        containers:
          - name: something
            command:
              - uptime
            essential: true
            image: "alpine:latest"
            logConfiguration:
              logDriver: awslogs
              options:
                awslogs-group: /ecs/fargate-test
                awslogs-region: "{{ region }}"
                awslogs-stream-prefix: ecs
      register: td_output

    - name: Run task
      community.aws.ecs_task:
        operation: run
        cluster: sometcluster
        task_definition: something
        count: 1
        started_by: ansible_user
        launch_type: FARGATE
        network_configuration:
          subnets:
            - "{{ subnet }}"
          security_groups:
            - "{{ security_group }}"
      register: task_output

    - debug:
        var: task_output

- hosts: localhost
  connection: local
  tags:
    - cleanup

  vars:
    region: eu-central-1
    subnet: subnet-d8309db2
    security_group: sg-f32f0196
    ecs_trusted_relationship: |
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
              "Service": "ecs-tasks.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
          }
        ]
      }

  tasks:
    - name: remove iam role
      iam_role:
        name: ecsTaskExecutionRole
        description: ecsTaskExecutionRole with too many permissions
        state: absent
        purge_policies: yes
        managed_policy:
          - arn:aws:iam::aws:policy/CloudWatchLogsFullAccess
          - arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role
          - arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceRole
          - arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceEventsRole
          - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        assume_role_policy_document: "{{ ecs_trusted_relationship }}"

    - name: remove ecs cluster
      ecs_cluster:
        name: sometcluster
        state: absent
        region: "{{ region }}"

    - name: remove cloudwatch log group
      cloudwatchlogs_log_group:
        log_group_name: /ecs/fargate-test
        retention: 1
        region: "{{ region }}"
        state: absent

Adjust just the vars, then run:

AWS_PROFILE=yourprofile ansible-playbook 770.yml --tags example
and AWS_PROFILE=yourprofile ansible-playbook 770.yml --tags cleanup to remove the resources.

Only the image is failing to pull (no idea why at the moment):

Stopped reason:
CannotPullContainerError: inspect image has been retried 5 time(s): failed to resolve ref "docker.io/library/alpine:latest": failed to do request: Head https://registry-1.docker.io/v2/library/alpine/manifests/latest: dial tcp 52.204.76.244:443: i/o tim...

But at least it confirms that running Fargate containers in a cluster created by the ecs_cluster module works without any issue.
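
If it helps: CannotPullContainerError on Fargate usually just means the task ENI has no route to Docker Hub. Assuming the subnet is a public one, asking for a public IP via ecs_task's network_configuration should fix the pull — a sketch reusing the names from the playbook above:

    - name: Run task (with a public IP so the image pull can reach Docker Hub)
      community.aws.ecs_task:
        operation: run
        cluster: sometcluster
        task_definition: something
        count: 1
        started_by: ansible_user
        launch_type: FARGATE
        network_configuration:
          # requires a public subnet; in a private subnet use a NAT gateway instead
          assign_public_ip: yes
          subnets:
            - "{{ subnet }}"
          security_groups:
            - "{{ security_group }}"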

markuman avatar Oct 20 '21 08:10 markuman

Thank you so much, I'll give it a go! :+1:

gregharvey avatar Oct 21 '21 06:10 gregharvey

Just to follow up here, in case someone has a similar problem: the code above works perfectly, so you clearly can create a cluster and run a task. However, my GitLab Fargate custom executor still wasn't working. I reviewed the docs to try to understand what was different, and for reasons I don't know there are steps 7 and 8 there to add a default capacity provider strategy:

  • https://docs.gitlab.com/runner/configuration/runner_autoscale_aws_fargate/#step-5-create-an-ecs-fargate-cluster

Without those steps it doesn't work. I presume the driver does not set the launch_type when it runs a task, and that this defaults to EC2 unless you either specify FARGATE when you launch the task or tell your cluster to favour FARGATE. In fairness this is really a bug in the Fargate driver for GitLab Runner, but I could work around it if Ansible let me set that default capacity provider strategy. So it would be handy. :-)
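
That would track with how the RunTask API behaves: when a task is started with neither a launch_type nor an explicit capacity provider strategy, ECS falls back to the cluster's default capacity provider strategy, and a cluster created by ecs_cluster has none. Once the cluster carries a FARGATE default strategy, a task run without launch_type (as the GitLab driver seems to do) lands on Fargate. A sketch with hypothetical names:

- name: Run a task with no launch_type, relying on the cluster default strategy
  community.aws.ecs_task:
    operation: run
    cluster: gitlab-fargate-cluster  # hypothetical cluster with a FARGATE default strategy
    task_definition: runner-job      # hypothetical task definition
    count: 1
    network_configuration:
      subnets:
        - "{{ subnet }}"
      security_groups:
        - "{{ security_group }}"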

gregharvey avatar Oct 21 '21 13:10 gregharvey