terraform-aws-emr

EMR Cluster Service Role not able to assume the EMR Cluster Autoscaling Role

Open Andrea-Gallicchio opened this issue 3 months ago • 2 comments

Description

I tried to deploy an EMR cluster with a custom autoscaling policy. The cluster is created successfully, but the custom automatic scaling policy fails.

To debug this, I started by looking at the EMR events. These two were the most meaningful: (image: EMR-Events)

Then I looked into the CloudTrail logs and found that the EMR Cluster Service Role was not able to assume the EMR Cluster Autoscaling Role. The error message was: Unable to assume IAM role: arn:aws:iam::aws-account-id:role/Spark-ETL-autoscaling

After that, I checked the trust relationship of the Autoscaling Role, which looked like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "EMRAssumeRole",
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "elasticmapreduce.amazonaws.com",
                    "application-autoscaling.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "123456"
                },
                "ArnLike": {
                    "aws:SourceArn": "arn:aws:elasticmapreduce:eu-central-1:123456:*"
                }
            }
        }
    ]
}

I also checked the AWS doc here for the trust relationship that the autoscaling role for EMR must have:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "application-autoscaling.amazonaws.com"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "<account-id>"
                },
                "ArnLike": {
                    "aws:SourceArn": "arn:aws:application-autoscaling:<region>:<account-id>:scalable-target/*"
                }
            }
        }
    ]
}

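Comparing the two, the "aws:SourceArn" value "arn:aws:application-autoscaling:<region>:<account-id>:scalable-target/*" is missing from the condition in the module here. Because the module only allows source ARNs under arn:aws:elasticmapreduce, a request whose source is an Application Auto Scaling scalable target does not satisfy the ArnLike condition.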

To solve the issue, I had to change the trust relationship of the autoscaling role for EMR as follows:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "EMRAssumeRole",
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "elasticmapreduce.amazonaws.com",
                    "application-autoscaling.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "123456"
                },
                "ArnLike": {
                    "aws:SourceArn": [
                        "arn:aws:elasticmapreduce:eu-central-1:123456:*",
                        "arn:aws:application-autoscaling:eu-central-1:123456:scalable-target/*"
                    ]
                }
            }
        }
    ]
}
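For reference, the same trust relationship could be expressed in Terraform roughly as below. This is only a sketch of the workaround, not the module's actual code: the names `emr_autoscaling_assume` and `emr_autoscaling` are made up for illustration, and the account ID and region are looked up through data sources instead of being hard-coded as in the JSON.

```hcl
# Look up the current account and region instead of hard-coding them.
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

locals {
  account_id = data.aws_caller_identity.current.account_id
  region     = data.aws_region.current.name
}

# Trust policy equivalent to the JSON workaround.
# "emr_autoscaling_assume" is a hypothetical name.
data "aws_iam_policy_document" "emr_autoscaling_assume" {
  statement {
    sid     = "EMRAssumeRole"
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type = "Service"
      identifiers = [
        "elasticmapreduce.amazonaws.com",
        "application-autoscaling.amazonaws.com",
      ]
    }

    condition {
      test     = "StringEquals"
      variable = "aws:SourceAccount"
      values   = [local.account_id]
    }

    # Several values under one condition key are OR-ed together,
    # so either source ARN pattern satisfies this condition.
    condition {
      test     = "ArnLike"
      variable = "aws:SourceArn"
      values = [
        "arn:aws:elasticmapreduce:${local.region}:${local.account_id}:*",
        "arn:aws:application-autoscaling:${local.region}:${local.account_id}:scalable-target/*",
      ]
    }
  }
}

# Hypothetical standalone role using the trust policy above.
resource "aws_iam_role" "emr_autoscaling" {
  name               = "Spark-ETL-autoscaling"
  assume_role_policy = data.aws_iam_policy_document.emr_autoscaling_assume.json
}
```

The two condition blocks are AND-ed, matching the StringEquals and ArnLike operators in the JSON. A role defined this way would still need the usual autoscaling permissions policy attached.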
  • [X] ✋ I have searched the open/closed issues and my issue is not listed.

Versions

  • Module version [Required]: v2.0.0

  • Terraform version: v1.5.5

  • Provider version(s): provider registry.terraform.io/hashicorp/aws v5.44.0

Reproduction Code [Required]

module "emr" {
  source        = "terraform-aws-modules/emr/aws"
  version       = "v2.0.0"
  name          = var.cluster_name
  release_label = var.release_label
  applications  = var.applications

  bootstrap_action = var.bootstrap_action

  vpc_id = data.terraform_remote_state.vpc.outputs.vpc_id
  log_uri = var.log_uri
  ebs_root_volume_size = var.ebs_root_volume_size
  step_concurrency_level = var.step_concurrency_level
  termination_protection = var.termination_protection
  ec2_attributes = {
    subnet_id = var.subnet_id
    key_name  = "airflow"
  }
  configurations_json = var.configurations_json
  iam_role_use_name_prefix = false
  iam_instance_profile_policies = {
    AmazonElasticMapReduceforEC2Role = "arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role"
    AWSGlueConsoleFullAccess = "arn:aws:iam::aws:policy/AWSGlueConsoleFullAccess"
    SecretManagerProductionReadWrite = aws_iam_policy.secret_manager_read_write.arn
    AppFlow = aws_iam_policy.app_flow.arn
  }

  # Master Group
  master_instance_group = {
    name           = "Master - 1"
    instance_count = var.master_instance_count
    instance_type  = var.master_instance_type
  }

  # Core Group
  core_instance_group = {
    name               = "Core - 1"
    instance_count     = var.core_instance_count
    instance_type      = var.core_instance_type
    autoscaling_policy = jsonencode({
    "Constraints" : {
      "MinCapacity" : 2,
      "MaxCapacity" : 8
    },
    "Rules" : [
      {
        "Action" : {
          "SimpleScalingPolicyConfiguration" : {
            "ScalingAdjustment" : 1,
            "CoolDown" : 1200,
            "AdjustmentType" : "CHANGE_IN_CAPACITY"
          }
        },
        "Trigger" : {
          "CloudWatchAlarmDefinition" : {
            "MetricName" : "ContainerPending",
            "ComparisonOperator" : "GREATER_THAN_OR_EQUAL",
            "Statistic" : "AVERAGE",
            "Period" : 300,
            "EvaluationPeriods" : 3,
            "Unit" : "COUNT",
            "Namespace" : "AWS/ElasticMapReduce",
            "Threshold" : 6
          }
        },
        "Name" : "prod_emr_core_scale_out"
      },
      {
        "Action" : {
          "SimpleScalingPolicyConfiguration" : {
            "ScalingAdjustment" : -1,
            "CoolDown" : 600,
            "AdjustmentType" : "CHANGE_IN_CAPACITY"
          }
        },
        "Trigger" : {
          "CloudWatchAlarmDefinition" : {
            "MetricName" : "ContainerPending",
            "ComparisonOperator" : "LESS_THAN_OR_EQUAL",
            "Statistic" : "AVERAGE",
            "Period" : 300,
            "EvaluationPeriods" : 8,
            "Unit" : "COUNT",
            "Namespace" : "AWS/ElasticMapReduce",
            "Threshold" : 5
          }
        },
        "Name" : "prod_emr_core_scale_in"
      }
    ]
    })
  }

  # Security Groups
  managed_security_group_use_name_prefix = false
  master_security_group_rules = [ ... ]
  slave_security_group_rules = [ ... ]
}

Steps to reproduce the behavior:

  1. Create an EMR cluster using the above code (add a variables.tf with some values)
  2. Note that the custom automatic scaling policy has the failed status

Expected behavior

The Service Role for EMR is able to assume the Autoscaling Role and there is no Terraform drift.

Actual behavior

The Service Role for EMR is not able to assume the Autoscaling Role because the trust relationship of the Autoscaling Role is misconfigured. I also have Terraform drift, since I had to change the trust relationship manually in the AWS Console.
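One way the drift might be avoided until the module is fixed is to manage the autoscaling role outside the module and pass its ARN in. The sketch below assumes the module exposes `create_autoscaling_iam_role` and `autoscaling_iam_role_arn` inputs; I have not verified this against v2.0.0, so please check the module's inputs first. The variable `autoscaling_role_arn` is a hypothetical name.

```hcl
# Hypothetical variable holding the ARN of a role managed outside the module,
# created with the corrected trust relationship.
variable "autoscaling_role_arn" {
  description = "ARN of an externally managed EMR autoscaling role"
  type        = string
}

module "emr" {
  source  = "terraform-aws-modules/emr/aws"
  version = "v2.0.0"

  # Remaining arguments as in the reproduction code.

  # Assumed module inputs: skip the module-managed autoscaling role
  # and use the externally managed one instead.
  create_autoscaling_iam_role = false
  autoscaling_iam_role_arn    = var.autoscaling_role_arn
}
```

With the role owned by my own configuration, the trust relationship would no longer be changed by hand in the console, so the plan should stay clean.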

Andrea-Gallicchio, Apr 16 '24 13:04