
Lambda was unable to decrypt the environment variables because KMS access was denied

Open mohitkale opened this issue 5 years ago • 57 comments

Dear Author,

For some strange reason, only the GET SINGLE TODO ITEM request is failing, while all the other APIs (LIST, CREATE, UPDATE, and DELETE) work fine.

I am getting the error below in the API Gateway console.

Reference Example: https://github.com/serverless/examples/tree/master/aws-node-rest-api-with-dynamodb

Endpoint response body before transformations: {"Message":"Lambda was unable to decrypt the environment variables because KMS access was denied. Please check the function's KMS key settings. KMS Exception: AccessDeniedExceptionKMS Message: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access.","Type":null}

I am using the same item ID for both the GET and DELETE methods. DELETE works, but GET throws an Internal Server Error (the error shown above).

Please suggest.

mohitkale, Jul 13 '18

I came across the same error today in my own project. Like you, it seems only one of my functions is affected, and I'm not sure why.

tremby, Aug 29 '18

I'm having the same issue. Has anyone figured out a workaround?

andarilhoz, Sep 06 '18

I had this issue and, after some head-banging, found out it was caused by deleting an IAM policy and recreating it with the same name. Simply changing the Lambda's IAM role to something else, saving, and then changing it back fixed it.

liampauling, Sep 11 '18
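The role-switch workaround above can be scripted. This is a minimal Python sketch, assuming a boto3 Lambda client (or any object with the same get_function_configuration / update_function_configuration methods); the function name reassign_execution_role, the temporary role ARN, and the wait_until_updated hook are hypothetical parameters of this sketch, not part of any tool discussed here.

```python
from typing import Any, Callable, Optional


def reassign_execution_role(
    lambda_client: Any,
    function_name: str,
    temporary_role_arn: str,
    wait_until_updated: Optional[Callable[[str], None]] = None,
) -> str:
    """Switch a function to a temporary role, then back to its original role.

    Re-assigning the role is what AWS documents as the way to refresh the
    role's grant on the default KMS key. Returns the original role ARN.
    """
    config = lambda_client.get_function_configuration(FunctionName=function_name)
    original_role_arn = config["Role"]

    for role_arn in (temporary_role_arn, original_role_arn):
        lambda_client.update_function_configuration(
            FunctionName=function_name, Role=role_arn
        )
        # Lambda may reject a second update while the first is still in
        # progress, so optionally block until the function has settled.
        if wait_until_updated is not None:
            wait_until_updated(function_name)

    return original_role_arn
```

With boto3 you might pass client = boto3.client("lambda") and wait_until_updated=lambda name: client.get_waiter("function_updated").wait(FunctionName=name). The temporary role must be assumable by Lambda, and the caller needs iam:PassRole for both roles.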

I ran the command to remove everything serverless had deployed, then deployed again, and for some reason it was then OK. 😕

tremby, Sep 11 '18

I had the same issue. I had to remove all the Lambda functions and then deploy them again.

Lasim, Sep 12 '18

Same issue happened to me today with my own project using sls version 1.32.0. It's an unfortunate workaround, since removing and deploying results in brand new endpoints, which would be a problem for me in production.

jaybarts, Dec 05 '18

I've never seen this. Can this be reproduced reliably? If so, could you provide me with your serverless.yml so I can debug this?

dschep, Dec 05 '18


@dschep I was able to reproduce it quite a few times today, but it seemed to take a few tries (of deploys & removes) before I got the same exact error. I created a repo with the serverless.yml as well as instructions on how to reproduce. I think it's related to a serverless deployment failing midway, which in my case was due to a duplicate name for a CloudWatch Event Rule. I'm sure any name conflict error would also cause the issue, but I included this particular case since it did the trick for reproducing the issue.

Link to the Repo: https://github.com/jaybarts/sls-kms-issue

Thank you for offering to take a look at this issue. Please let me know if you need anything else.

jaybarts, Dec 06 '18

Thanks for the details @jaybarts!! I'll take a look at this tomorrow or early next week.

dschep, Dec 06 '18

I had the same issue today; it is related to deleting and re-deploying. I've had some cases where I want to do a clean test of the entire stack.

PvanHengel, Dec 31 '18

I was developing and deploying fine on one computer. I'm about to travel, so I set up a new laptop: same code, just a new Serverless setup on a different computer. There I get this error and can't get past it. The Lambda it complained about was configured to use the default encryption. When I went back to the other PC and deployed the same code, there was no problem. So I have two computers: one that can deploy, and one (possibly running a newer version of Serverless and other tools) that cannot.

huangenyang, Jan 17 '19

I ran into this issue just now.

  • Nested stack (Api/Log)
  • Initial stack deployment failed due to hitting the rate limit on Lambda creation
  • Redeployed and succeeded
  • A single Lambda out of the 34 in that package has this issue

sverraest, Mar 17 '19

I had the same issue and figured out the problem. The AWS docs say:

AWS Lambda authorizes your function to use the default KMS key through a user grant, which it adds when you assign the role to the function. If you delete the role and create a new role with the same name, you need to refresh the role's grant. Refresh the grant by re-assigning the role to the function

So I just re-deployed the function and it worked well.

hard-coders, May 02 '19

Experienced the same issue as well. Had to delete the lambda function manually and recreate using terraform to resolve it.

ctippur, May 07 '19

This happens to me quite frequently, more so as the number of functions in my serverless service grows. Removing and subsequently re-deploying has an almost 50% chance of having this error pop up when I try to test my deployment now.

GCCreemars, May 09 '19

My problem had two causes:

  1. I changed the user's key that is used when building new instances (the first key placed on the instance to enable SSH connections) without updating the corresponding KMS key policy in AWS.
  2. I also had a few orphaned account IDs in the key policy. I read somewhere that these can also cause failures.

When I added my AWS user's ARN to the principals allowed the decrypt action in the key policy, and removed the orphaned account IDs (orphaned because we had deleted an AWS user, but that user's ID persisted in the policy), the problems disappeared.

Fornacula, Jul 24 '19
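The orphaned-principal cleanup described above can be sketched in Python. When an IAM user or role named in a key policy is deleted, AWS typically shows its unique ID (a string such as AIDA... or AROA...) in place of the ARN. The heuristic below treats any AWS principal that is not an ARN, a 12-digit account ID, or "*" as orphaned; that heuristic, and the helper names, are assumptions of this sketch, so review the output before applying it.

```python
import copy
import re
from typing import Any, Dict, List

# Live principals: an ARN, a 12-digit account ID, or the wildcard "*".
_ACCOUNT_ID = re.compile(r"^\d{12}$")


def is_orphaned_principal(principal: str) -> bool:
    """Heuristic: deleted IAM users/roles show up as unique IDs such as 'AIDA...'."""
    return not (
        principal == "*"
        or principal.startswith("arn:")
        or _ACCOUNT_ID.match(principal) is not None
    )


def remove_orphaned_principals(policy: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a key policy with orphaned AWS principals removed."""
    cleaned = copy.deepcopy(policy)
    statements = cleaned.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]

    kept: List[Dict[str, Any]] = []
    for statement in statements:
        principal = statement.get("Principal")
        if isinstance(principal, dict) and "AWS" in principal:
            aws = principal["AWS"]
            candidates = [aws] if isinstance(aws, str) else list(aws)
            live = [p for p in candidates if not is_orphaned_principal(p)]
            if live:
                principal["AWS"] = live[0] if len(live) == 1 else live
            else:
                del principal["AWS"]
            if not principal:
                # No principals left, so the statement no longer applies to anyone.
                continue
        kept.append(statement)

    cleaned["Statement"] = kept
    return cleaned
```

With boto3, the current policy is available as a JSON string from kms.get_key_policy(KeyId=..., PolicyName="default")["Policy"], and a cleaned version can be written back with kms.put_key_policy(KeyId=..., PolicyName="default", Policy=json.dumps(cleaned)). Be careful: a broken key policy can lock everyone out of the key.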

Go to the Lambda console > Encryption configuration and re-save the configuration. For example, change it to a customer master key and save, then change it back to the default and save again. This solved my problem.

Al-Jp, Oct 30 '19

I deployed my Lambdas with the Serverless Framework and got this error for only one function, not the others. All functions use the same role. Manually changing the role of the affected function in AWS to some other random role, and then back to the original role, fixed the problem. In case it helps: the function that was not working was triggered by HTTP GET, and the one that worked was triggered by HTTP POST.

adimoraret, Nov 22 '19

I hit the same problem after switching from one custom KMS key to another. After changing the custom KMS key on the Lambda, I tried to update the Lambda configuration with this AWS CLI command:

aws lambda update-function-configuration --function-name notifications-status-update-emitter --runtime nodejs10.x --handler handler.handler --timeout 60 --memory-size 256 --environment Variables={ENVIRONMENT=staging}

I got the following output:

An error occurred (AccessDeniedException) when calling the UpdateFunctionConfiguration operation: Lambda was unable to configure access to your environment variables because KMS returned Access Denied. Please check your KMS permissions. KMS Exception: AccessDeniedException KMS Message: User: arn:aws:iam::xxxxxx:user/deploy is not authorized to perform: kms:CreateGrant on resource: arn:aws:kms:us-west-2:xxxxxx:key/xxxxxxxxxxxxxxxxxx

That was very strange, because I had already granted the deploy user permission to update the Lambda configuration. After searching for a solution a few times, I fixed it as follows:

  1. Change the encryption configuration to the default key, (default) aws/lambda.
  2. Run the Lambda configuration update again.
  3. Re-enable the encryption configuration with my custom KMS key.
  4. Run the Lambda configuration update once more; it should work again.

Maybe this is an AWS bug?

cmardonespino, Dec 03 '19
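The key toggle described in the last two comments (switch to another key, save, switch back, save) can be sketched against the Lambda UpdateFunctionConfiguration API. This is a Python sketch under stated assumptions: it takes a boto3-style Lambda client, it assumes that passing KMSKeyArn="" reverts the function to the default aws/lambda key, and cycle_env_encryption_key and temporary_key_arn are hypothetical names.

```python
from typing import Any, Callable, Optional


def cycle_env_encryption_key(
    lambda_client: Any,
    function_name: str,
    temporary_key_arn: Optional[str] = None,
    wait_until_updated: Optional[Callable[[str], None]] = None,
) -> str:
    """Re-save the env-var encryption setting by switching keys and back.

    Assumption: KMSKeyArn="" selects the default aws/lambda key.
    Returns the original KMSKeyArn ("" meaning the default key).
    """
    config = lambda_client.get_function_configuration(FunctionName=function_name)
    original_key = config.get("KMSKeyArn") or ""  # absent when the default key is used

    if original_key:
        temporary_key = ""  # custom key in use: go via the default key
    elif temporary_key_arn:
        temporary_key = temporary_key_arn  # default key in use: go via a custom key
    else:
        raise ValueError("temporary_key_arn is required when the default key is in use")

    for key_arn in (temporary_key, original_key):
        lambda_client.update_function_configuration(
            FunctionName=function_name, KMSKeyArn=key_arn
        )
        # Optionally wait so the second update is not rejected as a conflict.
        if wait_until_updated is not None:
            wait_until_updated(function_name)
    return original_key
```

Note that the CLI error above says the deploy user lacked kms:CreateGrant on the custom key, so whoever runs this against a custom key presumably needs that permission as well.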

I tested this on the AWS side and was able to create the Lambda functions without any issue:

f018982d4db3:testlambda wafaas$ sls deploy -s dev
Serverless: Packaging service...
Serverless: Excluding development dependencies...
Serverless: Uploading CloudFormation file to S3...
Serverless: Uploading artifacts...
Serverless: Uploading service service-name.zip file to S3 (1.84 KB)...
Serverless: Validating template...
Serverless: Updating Stack...
Serverless: Checking Stack update progress...
.................................
Serverless: Stack update finished...
Service Information
service: service-name
stage: dev
region: eu-west-1
stack: service-name-dev
resources: 9
api keys:
  None
endpoints:
  None
functions:
  lambda1: service-name-dev-lambda1
  lambda2: service-name-dev-lambda2
  lambda3: service-name-dev-lambda3
layers:
  None
Serverless: Run the "serverless" command to setup monitoring, troubleshooting and testing.
f018982d4db3:testlambda wafaas$

So it looks like the issue is on the Serverless side.

Here is a sample of my template:

functions:
  lambda1: # Do Not Change This Lambda Name Without Update The manna-serverless-plugin !!!
    handler: handler.hello
    description: testing function
    integration: lambda
    resultTtlInSeconds: 0
    type: request
    tags:
      LambdaName: lambda1
    environment:
      test: testdata
      ENVIRONMENT: lambda

wafaaSultan, Jan 09 '20

@dschep Is there any update on this? This happens to us fairly consistently when doing a remove followed shortly by a re-deploy. The issue usually resolves itself within 5-10 minutes. Is there anything we can add to our deployment to speed that up?

tjcobb, Jan 14 '20

I found a workaround for this issue: add a role directly to the function in your serverless.yml template, using a role with full Lambda access, as follows:

functions:
  lambda1: # Do Not Change This Lambda Name Without Update The manna-serverless-plugin !!!
    handler: handler.hello
    description: testing function
    integration: lambda
    role: arn:aws:iam::xxxxxxxxxxxx:role/Lambda
    resultTtlInSeconds: 0
    type: request
    tags:
      LambdaName: lambda1
    environment:
      test: testdata
      ENVIRONMENT: lambda

I have tested it on my side and it works.

wafaaSultan, Jan 17 '20

Replying to @liampauling's comment above (the issue came from deleting an IAM policy and recreating it with the same name, and switching the Lambda's role to something else and back fixed it):

I believe this happens because the Lambda function references the IAM role's unique identifier rather than its ARN. Read more about identifiers here: https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_identifiers.html#identifiers-unique-ids

ajoga, Feb 22 '20
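If the unique-identifier explanation above is right, a deployment script could detect a recreated role by recording each role's RoleId before a deploy and comparing afterwards. This is a Python sketch assuming a boto3-style IAM client with get_role; snapshot_role_ids and recreated_roles are hypothetical helper names.

```python
from typing import Any, Dict, Iterable, List


def snapshot_role_ids(iam_client: Any, role_names: Iterable[str]) -> Dict[str, str]:
    """Map each role name to its current unique ID (the RoleId, e.g. 'AROA...')."""
    return {
        name: iam_client.get_role(RoleName=name)["Role"]["RoleId"]
        for name in role_names
    }


def recreated_roles(iam_client: Any, snapshot: Dict[str, str]) -> List[str]:
    """Return the names whose RoleId changed since the snapshot, i.e. roles
    that were deleted and recreated under the same name."""
    current = snapshot_role_ids(iam_client, snapshot.keys())
    return sorted(name for name, role_id in snapshot.items() if current[name] != role_id)
```

Functions whose role shows up in the result would then be candidates for re-assigning the role. If a role was deleted and not recreated, get_role raises an error instead, which this sketch does not handle.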

AWS, multi-billion dollar company, compute in the cloud, remove the need for servers. -- have you tried turning it off and on again?

sh4des, Apr 27 '20

Still seeing this issue intermittently. Redeployment does appear to resolve it, but that is far from an ideal solution. Worse, when invoked, Lambda still appears to return HTTP status code 200 or 202 (depending on whether the invocation is synchronous or asynchronous), which makes this error rather hard to detect programmatically.

campellcl, Jun 08 '20
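Since the status code alone may look successful, one option is to inspect the response body. This is a minimal Python sketch, assuming the error text reaches the caller in the body as in the API Gateway output at the top of this issue; the marker strings are copied from that message, and classify_invocation is a hypothetical helper.

```python
from typing import Optional, Union

# Substrings copied from the error message reported in this issue.
KMS_ENV_ERROR_MARKERS = (
    "Lambda was unable to decrypt the environment variables",
    "KMS access was denied",
)


def classify_invocation(
    status_code: int,
    payload: Union[str, bytes],
    function_error: Optional[str] = None,
) -> str:
    """Return 'kms_env_error', 'error', or 'ok' for one invocation result."""
    text = payload.decode("utf-8", errors="replace") if isinstance(payload, bytes) else payload
    # Check the body first: the status code alone may look successful.
    if any(marker in text for marker in KMS_ENV_ERROR_MARKERS):
        return "kms_env_error"
    if function_error or status_code >= 400:
        return "error"
    return "ok"
```

With boto3 you might call classify_invocation(resp["StatusCode"], resp["Payload"].read(), resp.get("FunctionError")) on the result of client.invoke(...). Depending on how the function is invoked, the failure may instead surface as an exception; boto3's Lambda client defines KMSAccessDeniedException for the Invoke API.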

Immediate redeployment (sls deploy right after sls remove) does not help, I usually wait at least a minute or two. And then it still might not help! At this point I have to scrap and restart every second dev deployment. Even though it irritates me quite a bit during development, to my surprise, our prod deployments have not been affected by it yet, probably because we don't deploy to prod as often as I deploy on dev to test things out.

c4lm, Jul 17 '20

I just had the same problem and, as people mention here, it is related to redeploying with the same role name.

I solved it via: IAM -> Roles -> $YourRoleNameHere -> Revoke Sessions -> Revoke active sessions

I hope it helps.

mogaal, Aug 06 '20

Seems to be the same issue:

https://github.com/terraform-providers/terraform-provider-aws/issues/6352

I also just ran into it. Revoking active sessions didn't solve it.

joyofdata, Aug 27 '20

I just went to Lambda console -> my Lambda -> Environment variables section -> Edit -> make NO changes -> click 'Save'. And it started working!

ramgrandhi, Sep 16 '20
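The "edit and save without changes" fix above can be approximated through the API by writing the current environment variables back unchanged. This Python sketch assumes a boto3-style Lambda client, and it assumes (not confirmed in this thread) that the API call has the same effect as the console's Save; resave_environment is a hypothetical name.

```python
from typing import Any, Callable, Dict, Optional


def resave_environment(
    lambda_client: Any,
    function_name: str,
    wait_until_updated: Optional[Callable[[str], None]] = None,
) -> Dict[str, str]:
    """Write the function's current environment variables back unchanged."""
    config = lambda_client.get_function_configuration(FunctionName=function_name)
    environment = config.get("Environment") or {}
    if "Error" in environment:
        # Don't write back variables we could not read; that would wipe them.
        raise RuntimeError(f"cannot read current variables: {environment['Error']}")
    variables = environment.get("Variables", {})
    lambda_client.update_function_configuration(
        FunctionName=function_name, Environment={"Variables": variables}
    )
    if wait_until_updated is not None:
        wait_until_updated(function_name)
    return variables
```

The guard matters because this call replaces the whole variable set: if the current values could not be read, writing back an empty dict would delete them.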

Replying to @adimoraret's comment above (manually changing the affected function's role to another role and then back to the original fixed the problem):

This solved my problem! Thanks

Yongshuai-Liu, Sep 16 '20

We are facing the same issue in one of our applications and wonder why it happens to only one Lambda. Has anyone fixed the issue recently?

prashanthtiramareddi, Oct 31 '20

I think this happens to me when I have the AWS Lambda GUI open in a browser tab on one of the Lambdas in the service when I redeploy. The error seems to occur less frequently when closing all open Lambda tabs before redeploying.

GCCreemars, Nov 02 '20

@liampauling your suggestion is still working!!! thanks

marcelomanchester, Nov 17 '20

Deleting an IAM role and recreating it causes this KMS issue when the Lambda runs.

Resolution: do not delete the IAM role when redeploying. Instead, delete all the policies under the role and recreate them.

I did this in my Azure DevOps AWS CLI script to resolve the issue.

gurunathchoukekar9, Dec 03 '20
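The "keep the role, recreate its policies" approach above can be sketched in Python with a boto3-style IAM client. It only handles inline policies (managed policies would need attach_role_policy / detach_role_policy instead); replace_inline_policies and inline_policy_names are hypothetical names, and the desired policy documents are whatever your deployment defines.

```python
import json
from typing import Any, Dict, List


def inline_policy_names(iam_client: Any, role_name: str) -> List[str]:
    """List a role's inline policy names, following IAM's Marker pagination."""
    names: List[str] = []
    kwargs: Dict[str, Any] = {"RoleName": role_name}
    while True:
        page = iam_client.list_role_policies(**kwargs)
        names.extend(page["PolicyNames"])
        if not page.get("IsTruncated"):
            return names
        kwargs["Marker"] = page["Marker"]


def replace_inline_policies(
    iam_client: Any, role_name: str, policies: Dict[str, Dict[str, Any]]
) -> None:
    """Make the role's inline policies match `policies` without deleting the
    role itself, so the role keeps its unique ID."""
    for name in inline_policy_names(iam_client, role_name):
        if name not in policies:
            iam_client.delete_role_policy(RoleName=role_name, PolicyName=name)
    for name, document in policies.items():
        # put_role_policy creates the policy, or overwrites one with the same name.
        iam_client.put_role_policy(
            RoleName=role_name, PolicyName=name, PolicyDocument=json.dumps(document)
        )
```

Policies that already exist are overwritten in place rather than deleted first, which should leave the role in the same end state as deleting and recreating every policy.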

Quick fix provided by @ramgrandhi above (go to Lambda UI -> edit Lambda config (with no tweaks whatsoever) -> save) solves the issue for me.

Any idea why and when it occurs? I am not able to reproduce it. Duh.

tomaszdudek7, Dec 22 '20

We had this issue when our role's name stayed the same between deployments and we ran serverless remove && serverless deploy. We solved it by removing the name from the role in serverless.yml. With the name omitted, CloudFormation generates a unique name for each deployment.

ExecutionRole:
    Type: AWS::IAM::Role
    Properties:
        RoleName: my-execution-role-name        # Remove this line
        AssumeRolePolicyDocument:
        ...

david-mcqueen, Jan 26 '21

Replying to @Lasim's comment above (remove all the Lambda functions and deploy them again):

Well... after redeployment my function worked well but another one failed with this error...

parencik, Feb 10 '21

Seeing the same every now and then. A real heart-breaker, and creates a huge mess when you're working on something that is not really ideal to "rip and replace".

mikaelvesavuori, Feb 13 '21

This is frustrating, I've been working with a CloudFormation deployment all day with one Lambda function, building and tearing down repeatedly to do some testing, and now all of a sudden I get this error message. Redeploying the Lambda solved the issue, which is troubling, but at least I'm back in business. This is far from ideal for a robust CI/CD process, but I'm not dealing with a production-ready system at the moment, so for my situation this solution is fine for now.

dspenard, Feb 16 '21

Yup, re-deploying fixes the problem. It's that simple.

createdbykartik, Mar 01 '21

It may sound simple, but having your CI/CD randomly fail now and then (well, even worse than fail: deploy something that does not work) is awful. So is telling your teammates, "Well, this rock-solid framework can sometimes render your deployment unusable. Just try deploying again when it does!"

I'd love for the sls team to track down and fix this bug.

tomaszdudek7, Mar 01 '21

Having to write custom post-IaC-deployment tests that automatically redeploy portions of the stack to get around this bug really sucks.

sambonator1, Mar 13 '21
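A post-deployment check of the kind described above could be a smoke test that remediates and retries when the KMS error appears. This is a Python sketch: invoke and remediate are hypothetical hooks you would supply (for example, a call to a test endpoint, and a role re-assignment or serverless deploy function -f with the function's name), and the 30-second default delay is an arbitrary assumption, loosely based on reports in this thread that the error can clear on its own after a few minutes.

```python
import time
from typing import Callable

KMS_ENV_ERROR = "Lambda was unable to decrypt the environment variables"


def smoke_test_with_remediation(
    invoke: Callable[[], str],
    remediate: Callable[[], None],
    max_attempts: int = 3,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Invoke after a deploy; on the KMS env-var error, remediate and retry.

    Returns the attempt number that succeeded; raises if the error persists.
    """
    for attempt in range(1, max_attempts + 1):
        body = invoke()
        if KMS_ENV_ERROR not in body:
            return attempt
        if attempt < max_attempts:
            remediate()
            sleep(delay_seconds)  # give the refreshed role/grant time to apply
    raise RuntimeError(f"KMS env-var error persisted after {max_attempts} attempts")
```

Failing the pipeline on the final RuntimeError means a broken deployment is reported instead of silently shipped.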

Almost 3 years later...

sverraest, Mar 13 '21

Took me 3 days to track this issue down!! As a mitigation step, perhaps the CloudFormation template could be analyzed for the renaming changes that cause this issue and, if any are present, a redeployment of the APIs could be performed. This could be exposed through serverless.yml to control when it is triggered. I'll hash out a draft PR for this when I have a few cycles.

drexler, Mar 17 '21

Replying to @hard-coders' comment above (per the AWS docs, refresh the role's KMS grant by re-assigning the role, e.g. by re-deploying the function):

This worked for me, except I am using AWS Amplify. Thanks!

fkunecke, Mar 20 '21

Was also able to fix this by just changing the execution role of the lambda function in the Configure tab to anything else, and then back to the role it needs. Seems to re-apply the role to the lambda and it runs as expected.

Re-deploying the entire lambda itself also works, but I found this to be an easier and quicker solution :-)

jweilhammer, Apr 29 '21

We saw these errors recently too: "Lambda was unable to decrypt the environment variables because KMS access was denied. Please check the function's KMS key settings. KMS Exception: AccessDeniedExceptionKMS Message: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access." I resolved this error ^ by assigning our Lambda function to a different Execution role and then reassigning it to the correct Execution role.

nathant727, Aug 12 '21

After hitting this again, I believe this error is due to the IAM role session duration. I think that if the role is changed and the Lambda executes again within the old session's maximum duration, this error occurs.

Waiting for the old role's session to expire might also fix it, and that would explain why switching the role fixes it (the Lambda retrieves a new session with the updated role).

jweilhammer, Aug 13 '21

We ran into this issue a couple of days ago. Our Lambdas are deployed with Terraform and are meant to be triggered by EventBridge events, but they were not receiving the events because EventBridge had not been added as a trigger. I suspect this was because the Terraform scripts for the events ran before the Lambdas were deployed. Once the triggers were set (we had to edit the rules and save them manually), we got the error below when we tried to test the Lambdas.

"Lambda was unable to decrypt the environment variables because KMS access was denied. Please check the function's KMS key settings. KMS Exception: AccessDeniedExceptionKMS Message: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access."

Setting the IAM role to a different one, saving, setting it back to the original and saving it again got the lambdas to work.

dithos211, Sep 02 '21

Thanks @dithos211 , those steps worked for me perfectly. Ta very

steven-hunt-devopsgroup, Jan 19 '22

Manually changing the Lambda's role to something else in the web console and then back to the original role fixed it.

DenysVyskrebetsTR, Apr 07 '22