terraform-aws-ecs

Maximum two tasks are running on one instance

Open KrystianJanas opened this issue 1 year ago • 3 comments

Description

Please provide a clear and concise description of the issue you are encountering, and a reproduction of your configuration (see the examples/* directory for references that you can copy+paste and tailor to match your configs if you are unable to copy your exact configuration). The reproduction MUST be executable by running terraform init && terraform apply without any further changes.

If your request is for a new feature, please use the Feature request template.

  • [x] ✋ I have searched the open/closed issues and my issue is not listed.

⚠️ Note

Before you submit an issue, please perform the following first:

  1. Remove the local .terraform directory (! ONLY if state is stored remotely, which hopefully you are following that best practice!): rm -rf .terraform/
  2. Re-initialize the project root to pull down modules: terraform init
  3. Re-attempt your terraform plan or apply and check if the issue still persists

Versions

  • Module version [Required]: 5.11.2

  • Terraform version: ~> 1.6.3

  • Provider version(s): hashicorp/aws: ~> 5.31

Reproduction Code [Required]

Steps to reproduce the behavior:

I'm not using Terraform workspaces. I cleared the local cache.

Expected behavior

Run more than two tasks on one instance (type: t3a.medium, but I also tried running them on, for example, m6a.large and hit the same issue).

Actual behavior

I'm running, for example, 4 services in ECS. Each of them has a dedicated 512 CPU units and 512 MiB of memory. The t3a.medium instance type has 2048 CPU units and 3883 MiB of memory. I also tried changing these services to 256 CPU and 512 memory, but it still doesn't work as expected. ECS automatically places two tasks on one instance and no more, and I don't know why.
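For reference, the CPU and memory figures above can be checked with a small Terraform sketch. The local names are hypothetical, and the values are the ones quoted for a t3a.medium and a single 512/512 task. It only looks at CPU and memory and ignores any other per-instance limit.

```hcl
# Rough capacity check using the figures quoted in this issue.
# All local names are illustrative and not part of the ecs module.
locals {
  instance_cpu_units  = 2048 # t3a.medium, as reported above
  instance_memory_mib = 3883
  task_cpu_units      = 512
  task_memory_mib     = 512

  # How many tasks fit if only one resource is considered at a time.
  tasks_by_cpu    = floor(local.instance_cpu_units / local.task_cpu_units)   # 4
  tasks_by_memory = floor(local.instance_memory_mib / local.task_memory_mib) # 7

  # The tighter of the two bounds decides how many tasks fit by CPU/memory alone.
  tasks_by_resources = min(local.tasks_by_cpu, local.tasks_by_memory) # 4
}

output "tasks_by_resources" {
  value = local.tasks_by_resources
}
```

By CPU and memory alone, four such tasks should fit on one instance, so a limit of two has to come from something other than these two resources.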

Terminal Output Screenshot(s)

Additional context

ecs.tf:

module "ecs" {
  count = var.tags.Environment == "prod" ? 1 : 0

  source = "terraform-aws-modules/ecs/aws"

  cluster_name = local.ECS_CLUSTER_NAME
  tags         = local.tags

  cluster_configuration = {
    execute_command_configuration = {
      logging = "OVERRIDE"
      log_configuration = {
        cloud_watch_log_group_name = "aws/ecs/aws-ec2/COMPANY_NAME-${local.project_name}"
      }
    }
  }

  default_capacity_provider_use_fargate = false

  task_exec_secret_arns = [
      .......... protected ..............
  ]

  autoscaling_capacity_providers = {
    rit-1-app = {
      auto_scaling_group_arn         = module.autoscaling-apps[0].autoscaling_group_arn
      managed_termination_protection = "DISABLED"

      managed_scaling = {
        maximum_scaling_step_size = 2
        minimum_scaling_step_size = 1
        status                    = "ENABLED"
        target_capacity           = 70
      }
    }
  }

    (local.apps.pdf-printer-prod.name) = {
      subnet_ids = data.terraform_remote_state.vpc.outputs.vpc-config.private_subnets

      requires_compatibilities = ["EC2"]
      cpu                      = 512
      memory                   = 512

      create_security_group = true
      security_group_rules = {
        alb_ingress = {
          type                     = "ingress"
          from_port                = local.apps.pdf-printer-prod.container_port
          to_port                  = local.apps.pdf-printer-prod.container_port
          protocol                 = "tcp"
          description              = "Service port"
          source_security_group_id = aws_security_group.alb_sg[0].id
        }
        egress_all = {
          type        = "egress"
          from_port   = 0
          to_port     = 0
          protocol    = "-1"
          cidr_blocks = ["0.0.0.0/0"]
        }
      }

      capacity_provider_strategy = {
        rit-1-app = {
          capacity_provider = module.ecs[0].autoscaling_capacity_providers["rit-1-app"].name
          base              = 1
          weight            = 1
        }
      }

      load_balancer = {
        service = {
          target_group_arn = aws_lb_target_group.alb_target_group[local.apps.pdf-printer-prod.name].arn
          container_name   = local.apps.pdf-printer-prod.name
          container_port   = local.apps.pdf-printer-prod.container_port
        }
      }

      task_exec_iam_statements = [
        {
          actions   = ["logs:CreateLogGroup"]
          effect    = "Allow"
          resources = ["*"]
          sid       = "CreateLogGroup"
        },
      ]

      container_definitions = {
        (local.apps.pdf-printer-prod.name) = {

          cpu                = 512
          memory             = 512
          memory_reservation = 100

          essential = true
          image     = local.apps.pdf-printer-prod.image
          port_mappings = [
            {
              name          = local.apps.pdf-printer-prod.name
              containerPort = local.apps.pdf-printer-prod.container_port
              protocol      = "tcp"
            }
          ]

          readonly_root_filesystem = false

          enable_cloudwatch_logging = true
          log_configuration = {
            logDriver = "awslogs"
            options = {
              awslogs-create-group  = "true"
              awslogs-group         = "/aws/ecs/${local.apps.pdf-printer-prod.name}/logs"
              awslogs-region        = local.DEFAULT_AWS_REGION
              awslogs-stream-prefix = "api"
            }
          }

        }
      }
    }
}

One more file, autoscaling.tf:

module "autoscaling-apps" {
  count = var.tags.Environment == "prod" ? 1 : 0

  source  = "terraform-aws-modules/autoscaling/aws"
  version = "7.3.1"

  name = "${local.project_name}-autoscaling-apps-instances"

  image_id      = jsondecode(data.aws_ssm_parameter.ecs_optimized_ami.value)["image_id"]
  instance_type = local.apps_instance_type

  user_data = base64encode(
    <<-EOT
        #!/bin/bash
        cat <<'EOF' >> /etc/ecs/ecs.config
        ECS_CLUSTER=${local.ECS_CLUSTER_NAME}
        ECS_LOGLEVEL=debug
        ECS_CONTAINER_INSTANCE_TAGS=${jsonencode(local.tags)}
        ECS_ENABLE_TASK_IAM_ROLE=true
        EOF
      EOT
  )

  security_groups = [module.autoscaling_sg[0].security_group_id]

  create_iam_instance_profile = true
  iam_role_name               = local.project_name
  iam_role_description        = "IAM role for ${local.project_name} - autoscaling"
  iam_role_policies = {
    AmazonEC2ContainerServiceforEC2Role = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
    AmazonSSMManagedInstanceCore        = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
  }

  metadata_options = {
    http_endpoint               = "enabled"
    http_tokens                 = "required"
    http_put_response_hop_limit = 1
  }

  vpc_zone_identifier = data.terraform_remote_state.vpc.outputs.vpc-config.private_subnets
  health_check_type   = "EC2"
  min_size            = 3
  max_size            = 8
  desired_capacity    = 8

  protect_from_scale_in = false

  autoscaling_group_tags = {
    AmazonECSManaged = true
  }

  use_mixed_instances_policy = false

  enabled_metrics = [
    "GroupAndWarmPoolDesiredCapacity",
    "GroupAndWarmPoolTotalCapacity",
    "GroupDesiredCapacity",
    "GroupInServiceCapacity",
    "GroupInServiceInstances",
    "GroupMaxSize",
    "GroupMinSize",
    "GroupPendingCapacity",
    "GroupPendingInstances",
    "GroupStandbyCapacity",
    "GroupStandbyInstances",
    "GroupTerminatingCapacity",
    "GroupTerminatingInstances",
    "GroupTotalCapacity",
    "GroupTotalInstances",
    "WarmPoolDesiredCapacity",
    "WarmPoolMinSize",
    "WarmPoolPendingCapacity",
    "WarmPoolTerminatingCapacity",
    "WarmPoolTotalCapacity",
    "WarmPoolWarmedCapacity",
  ]

  tags = local.tags
}

KrystianJanas, Jun 11 '24 08:06

Example: image image

And global view of infrastructure: image

As you can see, there are a lot of unused resources that could be allocated on other instances, but there is a limit of 2 running tasks per instance.

KrystianJanas, Jun 11 '24 08:06

Is your task using awsvpc as the network mode? If so, it will be creating an elastic network interface (ENI) per task, and there's a limit per instance.
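One way to answer this from Terraform is to read the registered task definition back. This is a minimal sketch that assumes the task definition family is named pdf-printer-prod; the real name comes from local.apps, which isn't shown in the issue.

```hcl
# Look up the registered task definition and expose its network mode.
# "pdf-printer-prod" is a placeholder family name; substitute the real one.
data "aws_ecs_task_definition" "check" {
  task_definition = "pdf-printer-prod"
}

output "task_network_mode" {
  # Expected to be one of "awsvpc", "bridge", "host", or "none".
  value = data.aws_ecs_task_definition.check.network_mode
}
```

The configuration in the issue doesn't set network_mode on the service. As far as I know the module's service submodule falls back to awsvpc in that case, but that is worth confirming against module version 5.11.2.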

Thumbscrew, Jun 19 '24 15:06

There are 3 options that I know of:

  1. Network mode awsvpc: each task gets its own ENI, so the maximum is two tasks on an instance type that only has room for two task ENIs, like a c7i.large.
  2. Network mode awsvpc with ECS ENI trunking enabled: an instance type that takes two tasks could now take 10 tasks. The limit differs from one instance type to another based on its networking capabilities. Use: aws ecs put-account-setting --name awsvpcTrunking --value enabled --principal-arn arn:aws:iam::999999999:role/ecsInstanceRole --region us-west-1
  3. Network mode bridge, the default for ECS: an instance can take a large number of tasks by using dynamic port mapping. You will need to open up the necessary port ranges in the security group (a load balancer health check failure in this setup is usually due to not opening up the ports).
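If account settings are managed in Terraform as well, option 2 could be sketched with the AWS provider's aws_ecs_account_setting_default resource. This is an assumption about how you'd want to apply it: it sets the default for the whole account in the provider's region, rather than the per-role setting used in the command above.

```hcl
# Opt the account (in the provider's region) into ENI trunking by default.
# The resource label "awsvpc_trunking" is an arbitrary, illustrative name.
resource "aws_ecs_account_setting_default" "awsvpc_trunking" {
  name  = "awsvpcTrunking"
  value = "enabled"
}
```

Container instances launched before the setting was enabled keep their old limits, so existing instances would have to be replaced, and the instance type has to be one that supports trunking.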

My suggestion: use bridge mode. If you later get strict, fine-tuned security requirements that you cannot meet with bridge mode, you can then reconfigure to use trunking.
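A minimal sketch of what the bridge-mode suggestion could look like with the configuration from this issue. It assumes the module accepts a network_mode key on each service and passes hostPort through in port_mappings; the 32768-65535 port range is also an assumption and should be checked against the ephemeral port range on the instances. Other inputs from the original (capacity provider strategy, load balancer, logging) are left out for brevity.

```hcl
module "ecs" {
  source = "terraform-aws-modules/ecs/aws"

  cluster_name = local.ECS_CLUSTER_NAME

  services = {
    (local.apps.pdf-printer-prod.name) = {
      requires_compatibilities = ["EC2"]
      network_mode             = "bridge" # assumption: supported service input

      container_definitions = {
        (local.apps.pdf-printer-prod.name) = {
          image = local.apps.pdf-printer-prod.image
          port_mappings = [
            {
              containerPort = local.apps.pdf-printer-prod.container_port
              hostPort      = 0 # 0 asks Docker to pick a dynamic host port
              protocol      = "tcp"
            }
          ]
        }
      }
    }
  }
}

# Let the load balancer reach the dynamically assigned host ports on the
# container instances. The port range is an assumed ephemeral range.
resource "aws_security_group_rule" "alb_to_dynamic_ports" {
  type                     = "ingress"
  from_port                = 32768
  to_port                  = 65535
  protocol                 = "tcp"
  description              = "ALB to dynamic host ports"
  security_group_id        = module.autoscaling_sg[0].security_group_id
  source_security_group_id = aws_security_group.alb_sg[0].id
}
```

With bridge mode the tasks are registered with the load balancer by instance and port, so the target group would need target type "instance" rather than "ip".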

ziadrida, Jun 21 '24 07:06

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

github-actions[bot], Jul 22 '24 00:07

This issue was automatically closed because of stale in 10 days

github-actions[bot], Aug 01 '24 00:08

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot], Sep 03 '24 02:09