aws-cdk icon indicating copy to clipboard operation
aws-cdk copied to clipboard

feat(lambda): allow to delete log group of a lambda function when it is removed

Open tmokmss opened this issue 2 years ago • 5 comments

closes #21804

Now we can set autoDeleteLogGroup: true to delete an associated Lambda log automatically.

new lambda.Function(stack, 'AutoDeleteLogGroup', {
  code: new lambda.InlineCode('exports.handler = (event) => console.log(JSON.stringify(event));'),
  handler: 'index.handler',
  runtime: lambda.Runtime.NODEJS_14_X,
  autoDeleteLogGroup: true,
});

Also I fixed an issue of logRetention when there are multiple LogRetention resources. To be short the backing Lambda function is shared with all the LogRetention (similar to SingletonLambda) so we need to add an IAM policy every time when a LogRetention with RemovalPolicy=destory is created.

Because without this fix the integration tests added in this PR won't work I'm packing those two fixes in this PR. Please let me know if I should submit two separate PRs :)

Consideration on property removal

If a user remove autoDeleteLogGroup property as below, the logRetention resource will be removed from a synthesized template, and it will result in the removal of log group itself. This implicit behavior should be clearly warned in the doc.

new lambda.Function(stack, 'AutoDeleteLogGroup', {
  code: new lambda.InlineCode('exports.handler = (event) => console.log(JSON.stringify(event));'),
  handler: 'index.handler',
  runtime: lambda.Runtime.NODEJS_14_X,
-  autoDeleteLogGroup: true,
});

There was an issue regarding s3 Bucket's autoDeleteObjects feature: #16603. We might need to implement a similar approach. It might be difficult since we are not directly managing LogGroup in CFn though.


All Submissions:

Adding new Unconventional Dependencies:

  • [ ] This PR adds new unconventional dependencies following the process described here

New Features

  • [X] Have you added the new feature to an integration test?
    • [X] Did you use yarn integ to deploy the infrastructure and generate the snapshot (i.e. yarn integ without --dry-run)?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

tmokmss avatar Aug 30 '22 04:08 tmokmss

gitpod-io[bot] avatar Aug 30 '22 04:08 gitpod-io[bot]

Hi @TheRealAmazonKendra I really appreciate your review and effort you expended on my PR! So let me summarize the discussion so far.

I assume there are currently three possible options to be considered:

// #1
// Pros: Less typing. Directly expressing developer's intent.
// Cons: Due to more abstraction, the API becomes different from LogRetention construct. 
new lambda.Function(stack, 'AutoDeleteLogGroup', {
    code: new lambda.InlineCode('exports.handler = (event) => console.log(JSON.stringify(event));'),
    handler: 'index.handler',
    runtime: lambda.Runtime.NODEJS_14_X,
    autoDeleteLogGroup: true,
});

// #2
// Pros: Similar API to LogRetention construct.
// Cons: * Requires more typing. We have to import aws-cdk-lib.RemovalPolicy, and the property name seems a little redundant.
//       * We can also pass RemovalPolicy.SNAPSHOT although it means nothing here, which is unnecessarily confusing.
//       * It may look like removal policy of Lambda function itself, which is also misleading.
new lambda.Function(stack, 'AutoDeleteLogGroup', {
    code: new lambda.InlineCode('exports.handler = (event) => console.log(JSON.stringify(event));'),
    handler: 'index.handler',
    runtime: lambda.Runtime.NODEJS_14_X,
    logRetentionRemovalPolicy: RemovalPolicy.DESTROY,
});

// #3
// If we go with this approach we need much refactor work. I guess we can ignore this approach for now.
new lambda.Function(stack, 'AutoDeleteLogGroup', {
    code: new lambda.InlineCode('exports.handler = (event) => console.log(JSON.stringify(event));'),
    handler: 'index.handler',
    runtime: lambda.Runtime.NODEJS_14_X,
    logRetentionProps: {
         // also move other logRetention* props here
        removalPolicy: RemovalPolicy.DESTROY,
    },
});

I agree with API consistency is important, and I somewhat agree with that we need to be consistent with props of internal constructs. But LogRetentionProps and FunctionPros are in different abstraction layers. Developers do not necessarily have to know about implementation detail of LogRetention when they want to automatically delete log groups. To me using RemovalPolicy here seems kind of leaky abstraction where we can achieve simpler API.

What we also need to be care about is the consistency between constructs that are using LogRetention. I did some study about this and found there are now five constructs using LogRetention: AppSync, ChatBot, DocumentDB, RDS, and Lambda. None of them currently have implemented log removal feature yet, so I believe we don’t have to care about consistency in terms of this point, and we can design the API from scratch. Also as you can see, they do not currently have consistent API with each other at all anyway:(

To be honest after the release of this feature I’ll be specifying this option every time I create a Lambda function. And that's why I’m inclined to keep this API as simple as possible... I'd like to avoid to specify cdk.RemovalPolicy repetitively where a single boolean is enough. As a side note, I asked my colleagues about this today and most of them prefer #1 over #2 due to its simplicity.

That being said I'm willing to follow your final decision. Considering all of the above, how do you think about #1 and #2? And sorry for writing such a long message with my poor English. Let me know if there are any sentences that does not make sense! Thanks.

tmokmss avatar Sep 01 '22 14:09 tmokmss

@TheRealAmazonKendra According to the design guidelines, flat props structure is prefferred over nested structure. (doc). I guess it partially explains why most of the log properties are currently designed as such instead of #3 approach.

If we are to fully replace the existing contract design, and if the contract consistency is a top priority, how about adding a new interface to set log retention?

e.g.

interface ILoggable {
    setLogRetention(props: LogRetentionProps): LogRetention;
    logGroup: ILogGroup;
}

class Function implements ILoggable {
    setLogRetention(props: LogRetentionProps) {
        // ...
    }
}

const func = new lambda.Function(stack, 'AutoDeleteLogGroup', {
    code: new lambda.InlineCode('exports.handler = (event) => console.log(JSON.stringify(event));'),
    handler: 'index.handler',
    runtime: lambda.Runtime.NODEJS_14_X,
});

func.setLogRetention({ removalPolicy: RemovalPolicy.DESTROY })

As long as they implement this interface, the API will be ensured to be consistent between constructs.

However I'm not sure if the above approach will work well with some constructs such as AppSync.GraphqlApi, which seemingly want to provide optimized experience by defining their own props (e.g. LogConfig). I guess we also need to discuss with people who are developing the constructs?

Also I think we should, ideally, define a dedicated props interface for setLogRetention instead of re-using the existing LogRetentionProps, because setting logGroupName here does not make sense at all even if logGroupName is optional. The log group name should always be computed from the resource properties (e.g. function name.) So we need something like Omit<LogRetentionProps, 'logGroupName'> instead (of course we cannot use generics with jsii so it should be manually defined.)

tmokmss avatar Sep 05 '22 05:09 tmokmss

Hi @TheRealAmazonKendra As I noted on the PR description, I found there will be an issue similar to #16603 in LogRetention construct (i.e. users may remove a log group unintentionally.) Is it something we must resolve before adding this PR's feature? If so I'll work on it before this PR.

tmokmss avatar Sep 07 '22 14:09 tmokmss

This PR has been in the CHANGES REQUESTED state for 3 weeks, and looks abandoned. To keep this PR from being closed, please continue work on it. If not, it will automatically be closed in a week.

aws-cdk-automation avatar Sep 24 '22 00:09 aws-cdk-automation

@tmokmss Please could you advise on the current state of this PR (e.g. active, blocked or abandoned)? It looks very useful to what my team is currently trying to achieve if it can be progressed/merged.

stec00 avatar Oct 12 '22 12:10 stec00

Hi @stec00 From my perspective there are still two remaining problems for this issue:

  1. what API will look like? (ref)
  2. a log group can be accidentally deleted, as described in the PR body; do we need to handle this issue?

I'm waiting comments from @TheRealAmazonKendra, and your comments on API design is also welcome.

tmokmss avatar Oct 12 '22 12:10 tmokmss

Thanks @tmokmss . Not sure I'm qualified to comment on all the detail here as it looks like it requires knowledge of patterns elsewhere that I don't currently have. But am very keen for it to be progressed as we currently have a lot of old log groups remaining from Lambdas whose CloudFormation stacks have long since been deleted. (We can of course clean these up manually but the problem will keep resurfacing). @TheRealAmazonKendra are you able to help steer this please - or if not is there someone else who could re-review?

stec00 avatar Oct 13 '22 13:10 stec00

This seems to be stagnating now. Would be a shame for the work done so far to go to waste, as it's providing useful missing functionality - in my case I have a stack that's in production and need to change the template so that future Lambda log groups get auto-deleted when their stacks are deleted. CC: @TheRealAmazonKendra @tmokmss

stec00 avatar Oct 20 '22 08:10 stec00

@stec00 A workaround would be something like this (not ideal of course):

// In bin/xxxx.ts
const app = new cdk.App();
new YourStack(app, "YourStack");

class LogRemover implements IAspect {
  public visit(construct: IConstruct): void {
    if (construct.node.id == 'LogRetentionaae0aa3c5b4d4f87b02d85b201efdd8a') {
      const role = construct.node.findChild('ServiceRole') as Role;
      // Add required IAM policy to the custom resource handler for log retention
      role.addToPolicy(
        new PolicyStatement({
          actions: ['logs:DeleteLogGroup'],
          resources: ['*'],
        }),
      );
    } else if (construct instanceof LogRetention) {
      // set removalPolicy to all of the log retention resources
      const custom = construct.node.findChild("Resource") as CfnResource;
      custom.addOverride("Properties.RemovalPolicy", "destroy");
    }
  }
}

// register this aspect to your CDK app
Aspects.of(app).add(new LogRemover());

You also need to add log retention to each of your lambda function:

      const handler = new lambda.Function(this, 'Handler', {
        // ...
        // you need this
        logRetention: RetentionDays.INFINITE,
      });

You should be super careful not to remove required log groups though. For example, once you deploy the above and you just removed logRetention: RetentionDays.INFINITE not Lambda function itself, your log group will be deleted as described in the body of this PR.

tmokmss avatar Oct 20 '22 09:10 tmokmss

This PR has been in the MERGE CONFLICTS state for 3 weeks, and looks abandoned. To keep this PR from being closed, please continue work on it. If not, it will automatically be closed in a week.

aws-cdk-automation avatar Nov 02 '22 16:11 aws-cdk-automation

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: fff7bcd468b64ac9348d6d047e44efb41b6ba113
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

aws-cdk-automation avatar Nov 03 '22 03:11 aws-cdk-automation

Ping - We're hoping to avoid implementing a hacky workaround for the most obvious non-working solution.

l0b0 avatar Nov 21 '22 20:11 l0b0

This PR has been in the MERGE CONFLICTS state for 3 weeks, and looks abandoned. To keep this PR from being closed, please continue work on it. If not, it will automatically be closed in a week.

aws-cdk-automation avatar Dec 06 '22 00:12 aws-cdk-automation

This PR has been in the MERGE CONFLICTS state for 3 weeks, and looks abandoned. To keep this PR from being closed, please continue work on it. If not, it will automatically be closed in a week.

aws-cdk-automation avatar Apr 01 '23 00:04 aws-cdk-automation

This PR has been in the MERGE CONFLICTS state for 3 weeks, and looks abandoned. To keep this PR from being closed, please continue work on it. If not, it will automatically be closed in a week.

aws-cdk-automation avatar Apr 26 '23 00:04 aws-cdk-automation

This PR cannot be merged because it has conflicts. Please resolve them. The PR will be considered stale and closed if it remains in an unmergeable state.

aws-cdk-automation avatar May 13 '23 00:05 aws-cdk-automation

@tmokmss Apologies for the delay. Let's wrap up the open questions:

  1. what API will look like? (ref)

The API for logRetention is a huge mess and desperately needs to be refactored. In the meantime #2 is the way to go as per the design docs. It has a flat structure and uses RemovalPolicy. S3 Bucket actually is the only (documented) special case, so it kind of deserves a different property. Although arguably we could have created a RemovalPolicy.DESTROY_WITH_CONTENTS at the time.

  1. a log group can be accidentally deleted, as described in the PR body; do we need to handle this issue?

Yes, but that's already implemented in the CR anyway.

I also have one more thing I'd like to clarify:

If a user remove autoDeleteLogGroup property as below, the logRetention resource will be removed from a synthesized template, and it will result in the removal of log group itself. This implicit behavior should be clearly warned in the doc.

I don't see how this is happening with the code you have on the PR. Is this still the case, if so can you help me understand how?

mrgrain avatar Jul 31 '23 14:07 mrgrain

Hi @mrgrain, thanks for the review!

I don't see how this is happening with the code you have on the PR. Is this still the case, if so can you help me understand how?

Sure. Let's say we have the following template deployed.

new lambda.Function(stack, 'Function', {
  code: new lambda.InlineCode('exports.handler = (event) => console.log(JSON.stringify(event));'),
  handler: 'index.handler',
  runtime: lambda.Runtime.NODEJS_14_X,
  autoDeleteLogGroup: true,
});

The CFn template for this code looks like this:

  "FunctionF74A7E0D": {
   "Type": "AWS::Lambda::Function",
   "Properties": { ...
   }
  },
  "FunctionLogRetentionAC501FFB": {
   "Type": "Custom::LogRetention",
   "Properties": {
    ...
    "RemovalPolicy": "destroy"
   }
  }

Now if we remove the autoDeleteLogGroup property:

new lambda.Function(stack, 'Function', {
  code: new lambda.InlineCode('exports.handler = (event) => console.log(JSON.stringify(event));'),
  handler: 'index.handler',
  runtime: lambda.Runtime.NODEJS_14_X,
-   autoDeleteLogGroup: true,
});

the CFn template will look like this:

  "FunctionF74A7E0D": {
   "Type": "AWS::Lambda::Function",
   "Properties": { ...
   }
  },
-  "FunctionLogRetentionAC501FFB": {
-   "Type": "Custom::LogRetention",
-   "Properties": {
-    ...
-    "RemovalPolicy": "destroy"
-   }
-  }

Note that only the log retention resource is deleted, and the function remains as-is.

Here is where the unexpected behavior comes in. When a Custom::LogRetention with "RemovalPolicy": "destroy" is removed, it will always remove the log group, even though the log group is still used by the function.

https://github.com/aws/aws-cdk/blob/6c588da2c6b1c2f5a53fb9151c84c7298ce2427e/packages/aws-cdk-lib/aws-logs/lib/log-retention-provider/index.ts#L156-L160

Actually there is no way for the custom resource to know if the corresponding function is removed or not. I thought we could query the Lambda service if the function exists or not, but that is also difficult, because of the dependency between a LogRetention and a function. A LogRetention depends on a corresponding function (to get the function name), so it must be deleted before the function.

https://github.com/aws/aws-cdk/blob/7c7697e4d06f1d0be73da50509a4c73545a1ea87/packages/aws-cdk-lib/aws-lambda/lib/function.ts#L868-L869

I still has not found a good solution for this problem, but any suggestion is welcome! :) I think the easiest approach is just to accept this behavior as expected, and document details about this.

tmokmss avatar Jul 31 '23 19:07 tmokmss

@tmokmss Thanks for the explanation. I understand the problem now. I think the issue comes down to creating a false equality between the LogRetention custom resource and the LogGroup that the Lambda function uses. Before #21113 the CR did not concern itself with a the lifecycle of the log group. It was a simple fix to overwrite the retention. Actually https://github.com/aws/aws-cdk/pull/26049 has a similar issue in this regard.

Not quite sure what the solution would be for all of these... 🤔

mrgrain avatar Aug 01 '23 10:08 mrgrain

@mrgrain Another solution would be to always define a LogRetention custom resource regardless of a function's properties. Then even if we remove those properties, the log retention resource and the log group will still be retained.

I hesitated to do this because it would require updates to many many existing integ tests. But If other features are also facing this problem, it might be reasonable to change the current behavior.

tmokmss avatar Aug 01 '23 13:08 tmokmss

Been thinking about this all morning. It's fairly easy to implement a variation of the fix that S3's autoDeleteObjects has. But only if the parent resource (here: a Lambda Function) is known. It get's complicated for the general case because each resource uses a slightly different tagging interface. 😕

mrgrain avatar Aug 01 '23 14:08 mrgrain

@tmokmss Thanks again for this PR. I'm drafting this for now. We have to solve the deletion issue first. Same as here: https://github.com/aws/aws-cdk/pull/26049

I have some ideas for it though.

mrgrain avatar Aug 02 '23 13:08 mrgrain

@mrgrain ok, thanks! I'm looking forward to that:)

tmokmss avatar Aug 02 '23 14:08 tmokmss

I'm planning to create a generic DependentLogGroup (or similar) that actually supports all log group properties. To avoid the deletion of the log group, we need to tag the parent (e.g. Lambda Function) with a special tag. The only blocker is that it's not really possible to read tags from generic parent resource. I think we will have to configure them manually.

mrgrain avatar Aug 02 '23 14:08 mrgrain

Nice! I just found issue #16756 and the discussions there appear to cover most related considerations. After reading them, I'm now convinced that tagging a parent resource is (maybe only) a viable solution. (I didn't know the special ordering of tags.)

Looking up the current deployed template (reverted in d220b949) is interesting but I guess it will also requires special handling for each parent resource type anyway.

tmokmss avatar Aug 02 '23 15:08 tmokmss

This PR has been in the MERGE CONFLICTS state for 3 weeks, and looks abandoned. To keep this PR from being closed, please continue work on it. If not, it will automatically be closed in a week.

aws-cdk-automation avatar Aug 04 '23 00:08 aws-cdk-automation

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: 5d74c13115c31ccde9f7d291f4fe8d43f3628a82
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

aws-cdk-automation avatar Aug 04 '23 18:08 aws-cdk-automation