terraform-provider-aws

Unable to invoke Lambda with environment variables due to KMS AccessDeniedException

Open jveldboom opened this issue 6 years ago • 12 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

  • Terraform v0.11.10
  • provider.aws v1.42.0

Affected Resource(s)

  • aws_lambda_function

Terraform Configuration Files

resource "aws_iam_role" "myrole" {
  name = "terraform-kms-test"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

resource "aws_iam_role_policy_attachment" "basic_exec" {
  role       = "${aws_iam_role.myrole.name}"
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

resource "aws_lambda_function" "myfunction" {
  filename         = "build.zip"
  function_name    = "terraform-kms-test"
  role             = "${aws_iam_role.myrole.arn}"
  handler          = "index.handler"
  source_code_hash = "${base64sha256(file("build.zip"))}"
  runtime          = "nodejs8.10"

  environment {
    variables {
      MY_CONFIG = "config value"
    }
  }
}

Expected Behavior

A function using environment variables should remain invocable after its role name is changed.

Actual Behavior

  • On initial deployment, the function is able to be invoked without any errors
  • But if you change the IAM role name and rerun terraform apply, invoking the function returns the following error:

Calling the invoke API action failed with this message: Lambda was unable to decrypt the environment variables because KMS access was denied. Please check the function's KMS key settings. KMS Exception: AccessDeniedExceptionKMS Message: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access.

Steps to Reproduce

  1. terraform apply
  2. Change aws_iam_role name to something different
  3. terraform apply

References

  • Seems to be a related issue #4633

jveldboom avatar Nov 04 '18 05:11 jveldboom

I'm also experiencing this issue. Looking in CloudTrail, I do see KMS CreateGrant calls being made. It's not clear why these calls are made, since in this case no KMS keys are specified for encrypting environment variables.

Part of the issue is that it appears to be accessing a KMS key that no longer exists as that key is also removed by Terraform.

Here is the example:

{
    "requestParameters": {
      "operations": [
        "Decrypt",
        "RetireGrant"
      ],
      "granteePrincipal": "arn:aws:sts::ACCOUNT:assumed-role/LAMBDA-IAM-ROLE/NAME-OF-THE-LAMBDA-FUNCTION",
      "keyId": "arn:aws:kms:REGION:ACCOUNT:key/SOME-KEY-ID",
      "constraints": {
        "encryptionContextEquals": {
          "aws:lambda:FunctionArn": "arn:aws:lambda:REGION:ACCOUNT:function:NAME-OF-THE-LAMBDA-FUNCTION"
        }
      },
      "retiringPrincipal": "arn:aws:sts::ACCOUNT:assumed-role/LAMBDA-IAM-ROLE/NAME-OF-THE-LAMBDA-FUNCTION"
    },
    "eventType": "AwsApiCall",
    "responseElements": {
      "grantId": "SOME-GRANT-ID"
    },
    "awsRegion": "REGION",
    "eventName": "CreateGrant",
    "readOnly": false,
    "eventSource": "kms",
    "userAgent": "lambda.amazonaws.com",
    "sourceIPAddress": "lambda.amazonaws.com",
    "resources": [
      {
        "type": "AWS::KMS::Key",
        "ARN": "arn:aws:kms:REGION:ACCOUNT:key/SOME-KEY-ID",
        "accountId": "ACCOUNT"
      }
    ],
    "recipientAccountId": "ACCOUNT"
}

mikegrima avatar Nov 22 '18 18:11 mikegrima

Any news about this issue? If no one has looked at it I could probably try my luck at fixing it.

nekonyuu avatar Mar 05 '19 10:03 nekonyuu

I got this same problem on my production app. According to this similar issue, a redeploy should be enough. I'm going to try this.

mateusfccp avatar May 02 '19 14:05 mateusfccp

I just hit this issue also. Had to do a terraform destroy followed by a terraform apply to resolve.

gawbul avatar Aug 22 '19 13:08 gawbul

Hi folks 👋 Sorry you are running into this strange behavior.

The maintainers are not sure what the right action should be here given the vastly different experiences folks are having. Is documenting the potential for this odd behavior in the role argument for the aws_lambda_function resource documentation enough? We could also automatically trigger a code publish if role is updated. The caveat there is that we could only publish the function again if the practitioner enabled the publish argument.

Suggestions welcome, thanks!

bflad avatar Nov 12 '19 20:11 bflad

Thank you for taking the time to consider this.

I think updating the documentation to explicitly mention this issue might be good. Would it also be possible to recommend including the IAM role name in the source_code_hash? I'm not sure if a function update is enough to fix the issue though, but something like this. Is that what you mean by publish?

source_code_hash = "${base64sha256(file("build.zip"))}-${aws_iam_role.myrole.name}"

jveldboom avatar Nov 15 '19 13:11 jveldboom

Seems to be the same issue:

https://github.com/serverless/examples/issues/279

I also just ran into it. I'm using TF 0.13.1. Destroying and applying doesn't solve it.

That worked: https://github.com/terraform-providers/terraform-provider-aws/issues/6352#issuecomment-554359665

This is the code that causes the issue when testing the lambda. The problem started after introducing the Logging-policies and associating them with the Lambda's role.

terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

provider "aws" {
  profile = "default"
  region  = "us-west-2"
}

resource "aws_s3_bucket" "project_project_bucket" {
  bucket = "project-project-bucket-g3dg6gf4fddk"
  acl    = "private"
}

resource "aws_iam_role" "lambda_project_etl" {
  name = "iam_role_project_etl"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}

# See also the following AWS managed policy: AWSLambdaBasicExecutionRole
resource "aws_iam_policy" "lambda_project_etl" {
  name        = "lambda_logging"
  path        = "/"
  description = "IAM policy for logging from a lambda"

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*",
      "Effect": "Allow"
    }
  ]
}
EOF
}

resource "aws_iam_role_policy_attachment" "lambda_logs" {
  role       = aws_iam_role.lambda_project_etl.name
  policy_arn = aws_iam_policy.lambda_project_etl.arn
}

resource "aws_lambda_layer_version" "lambda_layer_project_etl" {
  filename   = "dist/lambda_layer_project_etl.zip"
  layer_name = "lambda_layer_project_etl"

  compatible_runtimes = ["python3.8"]
}

resource "aws_lambda_function" "lambda_project_etl" {
  filename      = "dist/lambda_project_etl.zip"
  function_name = "lambda_import_main_stories_by_day"
  role          = aws_iam_role.lambda_project_etl.arn
  handler       = "lambda_import_main_stories_by_day.main"

  source_code_hash = filebase64sha256("dist/lambda_project_etl.zip")

  runtime = "python3.8"

  layers = [aws_lambda_layer_version.lambda_layer_project_etl.arn]

  environment {
    variables = {
      foo = "bar"
    }
  }
}

joyofdata avatar Aug 27 '20 14:08 joyofdata

Seeing something similar when trying to execute an AWS Lambda function:

"Calling the invoke API action failed with this message: Lambda was unable to decrypt the environment variables because KMS access was denied. Please check the function's KMS key settings. KMS Exception: UnrecognizedClientExceptionKMS Message: The security token included in the request is invalid."

rajinder-yadav avatar Feb 14 '21 21:02 rajinder-yadav

Would this not be as simple as adding the role for the lambda to the depends_on attribute in the Lambda, so you make sure that the Role is created before the lambda?

LorneCurrie avatar Jul 01 '21 12:07 LorneCurrie

Had the same issue here. Tainting the problem Lambda resource and recreating it via apply seems to have solved the issue.

It was difficult to find that this was the problem initially...

I had updated an unrelated resource, which caused the Lambda-related policies to be replaced/recreated.

When retesting after that update, I got the following CloudFront error:

500 {"message":null}
'X-Cache': 'Error from cloudfront'

The Lambda function was showing errors, but I didn't get any log output that would help debug the issue.

monkut avatar Nov 12 '21 04:11 monkut

Hi, there is a second workaround: change the Lambda role to a different one and then change it back to the original role (I guess AWS updates something behind the scenes). More info: https://github.com/serverless/examples/issues/279#issuecomment-420387109
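
As a rough illustration of that role swap, the sketch below reuses the resource names from the original report but is written in Terraform 0.12+ syntax. The `use_temporary_role` variable, the `temporary` role, and the `nodejs20.x` runtime (replacing the retired `nodejs8.10`) are assumptions made for this example and do not appear anywhere in the thread.

```hcl
# Hypothetical toggle: not part of the original configuration.
variable "use_temporary_role" {
  type    = bool
  default = false
}

# Hypothetical second role that exists only so the function can be
# pointed away from the recreated role and back again.
resource "aws_iam_role" "temporary" {
  name               = "terraform-kms-test-temporary"
  assume_role_policy = aws_iam_role.myrole.assume_role_policy
}

resource "aws_lambda_function" "myfunction" {
  filename         = "build.zip"
  function_name    = "terraform-kms-test"
  handler          = "index.handler"
  source_code_hash = filebase64sha256("build.zip")
  runtime          = "nodejs20.x" # assumption: any currently supported runtime

  # Switching the role and switching it back is the workaround described above.
  role = var.use_temporary_role ? aws_iam_role.temporary.arn : aws_iam_role.myrole.arn

  environment {
    variables = {
      MY_CONFIG = "config value"
    }
  }
}
```

With something like this in place, running `terraform apply -var use_temporary_role=true` followed by a plain `terraform apply` would perform the swap and the swap back. Whether two role updates are always enough to refresh the grant is not confirmed in this thread.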

quercusilvam avatar Nov 12 '21 08:11 quercusilvam

It has been over a year since anyone commented on this issue. I will be working to repro this and closing it if AWS and/or AWS provider changes have fixed the problem in the interim. Please let us know if you continue to face problems with this!

This is a great explanation from Paul Allen on the problem:

When you provide environment variables to a Lambda function, they're encrypted using a KMS key. Either a customer-managed key that you provide or an AWS managed default key (with the alias aws/lambda). When environment variables are first defined, if the default key is used then Lambda creates a grant on that key letting the execution role use it for decrypting the environment variables.

But, if that role is deleted and then re-created, the grant is no longer valid! This is the same as other resource-based policies when the principal is removed but this is special because we never actually explicitly created that grant ourselves. This means the function will start failing for no apparent reason.

YakDriver avatar Feb 23 '23 16:02 YakDriver

I was able to reproduce this problem with this configuration:

data "aws_partition" "current" {}

resource "aws_iam_role" "test" {
  name                = "roletna"
  managed_policy_arns = ["arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"]

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole",
      Principal = {
        Service = "lambda.${data.aws_partition.current.dns_suffix}",
      }
      Effect = "Allow"
    }]
  })
}

data "archive_file" "lambdazip" {
  type                    = "zip"
  output_path             = "lambda.zip"
  source_content          = "def handler(event, context):\n\tpass\n"
  source_content_filename = "lambda.zip"
}

resource "aws_lambda_function" "test" {
  function_name = "dicvojid"
  role          = aws_iam_role.test.arn
  handler       = "index.handler"
  runtime       = "python3.9"
  filename      = data.archive_file.lambdazip.output_path
  environment {
    variables = {
      foo = "bar"
    }
  }
}

Check the function, then delete and recreate the role:

% aws lambda invoke \
> --function-name dicvojid \
> outfile
{
    "StatusCode": 200,
    "FunctionError": "Unhandled",
    "ExecutedVersion": "$LATEST"
}
% aws iam detach-role-policy \
--role-name roletna \
--policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
% aws iam delete-role --role-name roletna
% terraform apply
% aws lambda invoke \
--function-name dicvojid \
outfile

An error occurred (KMSAccessDeniedException) when calling the Invoke operation (reached max retries: 2): Lambda was unable to decrypt the environment variables because KMS access was denied. Please check the function's KMS key settings. KMS Exception: AccessDeniedExceptionKMS Message: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access.

YakDriver avatar Mar 07 '23 23:03 YakDriver

We will not fix this issue except with documentation updates. We won't fix this with provider code changes for these reasons:

  1. The resources are acting as expected, managing what they are supposed to manage.
  2. AWS Lambda manages a grant on the KMS key to the function's IAM role that Terraform does not directly manage.
  3. The invocation error arises in the aws_lambda_invocation resource or data source but seamlessly fixing the problem would require performing management on a Lambda function, which should take place in the aws_lambda_function resource.
  4. This should not typically be an on-going issue practitioners run into as a normal part of operations but something that occurs when the IAM role is inadvertently or mistakenly recreated.
  5. There are painless fixes to the problem: reassigning the function's role to another role and back to the recreated role, or tainting and recreating the function.
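
For practitioners who would rather have Terraform recreate the function automatically, one possible configuration-side sketch (not a provider change, and not something the maintainers prescribe in this thread) is the `replace_triggered_by` lifecycle argument, which assumes Terraform 1.2 or later. It reuses the reproduction configuration above and ties the function's replacement to the role's `unique_id`, which changes when the role is recreated.

```hcl
resource "aws_lambda_function" "test" {
  function_name = "dicvojid"
  role          = aws_iam_role.test.arn
  handler       = "index.handler"
  runtime       = "python3.9"
  filename      = data.archive_file.lambdazip.output_path

  environment {
    variables = {
      foo = "bar"
    }
  }

  lifecycle {
    # Assumption: Terraform >= 1.2. Replace the function whenever the role's
    # unique_id is planned to change, which happens when Terraform replaces
    # the role (for example after a name change).
    replace_triggered_by = [aws_iam_role.test.unique_id]
  }
}
```

This should cover the case in the original report, where Terraform itself replaces the role. It may not trigger when the role is deleted outside Terraform and merely created again on the next apply; in that situation the manual fixes listed above still apply.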

Thank you for your time and input on this! We apologize for the delay in clearing this up. Look for documentation additions.

YakDriver avatar Mar 08 '23 00:03 YakDriver

This functionality has been released in v4.58.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

github-actions[bot] avatar Mar 10 '23 12:03 github-actions[bot]

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot] avatar Apr 11 '23 02:04 github-actions[bot]