cdk-eks-blueprints

Allow lambda loggroup retention settings

Open muckelba opened this issue 11 months ago • 1 comments

Describe the feature

I recently noticed that my AWS account has a lot of log groups belonging to Lambda functions from old, deleted blueprints clusters. All of these log groups have their retention set to Never expire. It would be nice to have a blueprints option for configuring the Lambda settings, including log retention.

I'm not sure whether that's already possible, but I was not able to find an option for it.

Use Case

To prevent AWS accounts from being flooded with old, unneeded CloudWatch log groups.

Proposed Solution

No response

Other Information

No response

Acknowledgements

  • [ ] I may be able to implement this feature request

CDK version used

2.147.3 (build 32f0fdb)

EKS Blueprints Version

1.16.3

Node.js Version

v20.12.2

Environment details (OS name and version, etc.)

Ubuntu 22.04

muckelba avatar Jan 07 '25 12:01 muckelba

@muckelba I assume you mean the log groups for the Lambda functions that support CDK custom resources, such as the ones handling EKS cluster creation or Helm chart installation. I looked around in the past and could not find any specific controls in CDK to make it happen. Let me take another look now. Worst case, I will have to create an issue against CDK and reference it here.

shapirov103 avatar Jan 07 '25 20:01 shapirov103

This issue has been automatically marked as stale because it has been open 60 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

github-actions[bot] avatar Apr 08 '25 00:04 github-actions[bot]

Hey @shapirov103 any updates? :)

muckelba avatar Apr 10 '25 07:04 muckelba

@muckelba here are the options. I'm discussing them in the open to get feedback from customers before making a decision:

TLDR: please upvote https://github.com/aws/aws-cdk/issues/33196 (it has some pointers to the solution)

Long version:

At present, the custom resources and Lambda functions created by the CDK don't specify an explicit log group. When that happens, Lambda by default uses a log group under /aws/lambda/, which is created automatically with retention set to never expire. The exception is the /aws/vendedlogs/states/waiter-state-machine-<name> log group, which is actually referenced in the synthesized template. I can adjust the log groups that are produced in the blueprints template with an aspect, but that will only affect the state-machine logs, and I am not sure those are the major issue, since that log group is set to expire by default (731 days).

In order to address the CDK custom resources, we can create a single log group for all functions, set its retention, and use an aspect to modify every Function produced by the blueprints to use that log group. You will then get messages from multiple blueprint-specific functions in the same log group. Is that an acceptable solution? Please note this will be a blueprint-specific mechanism, as I don't see a clear path to solving it in CDK.
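To make the shared log group idea concrete, here is a minimal sketch of the template change such an aspect would need to produce. It is not CDK code and not the blueprints implementation: it works on a plain dict shaped like a synthesized CloudFormation template, and the logical ID `BlueprintsSharedLambdaLogGroup` and the function name `use_shared_log_group` are made up for illustration. It relies on the `LoggingConfig.LogGroup` property of `AWS::Lambda::Function` and on `Ref` of an `AWS::Logs::LogGroup` resolving to the log group name.

```python
import copy
from typing import Any

# Hypothetical logical ID for the shared log group (an assumption, not a blueprints name).
SHARED_LOG_GROUP_ID = "BlueprintsSharedLambdaLogGroup"


def use_shared_log_group(template: dict[str, Any], retention_days: int = 7) -> dict[str, Any]:
    """Return a copy of `template` in which every Lambda function logs to one shared log group."""
    result = copy.deepcopy(template)
    resources = result.setdefault("Resources", {})

    # Collect the function logical IDs before adding the new log group resource.
    function_ids = [
        logical_id
        for logical_id, resource in resources.items()
        if resource.get("Type") == "AWS::Lambda::Function"
    ]

    # One log group with an explicit retention period for all functions.
    resources[SHARED_LOG_GROUP_ID] = {
        "Type": "AWS::Logs::LogGroup",
        "Properties": {"RetentionInDays": retention_days},
    }

    # Point each function at the shared log group via LoggingConfig.LogGroup.
    for logical_id in function_ids:
        properties = resources[logical_id].setdefault("Properties", {})
        logging_config = properties.setdefault("LoggingConfig", {})
        logging_config["LogGroup"] = {"Ref": SHARED_LOG_GROUP_ID}

    return result
```

In a real aspect the same effect would presumably be achieved with property overrides on each function resource. The trade-off mentioned above still applies: log streams from all functions end up in the same group.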

As an interim measure, I can provide a script to run against all /aws/lambda log groups in an account (or those matching a provided regex) and set retention to X. While an ugly solution overall, it can address the issue for existing blueprints. As an option, this script could instead just delete the log groups matching the provided pattern.
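A rough sketch of what such an interim script could look like is below. It is not the script referred to above, only an illustration under stated assumptions: the function names are hypothetical, the client is expected to be a boto3 CloudWatch Logs client (`boto3.client("logs")`), and only log groups that currently have no retention set are touched. It uses the `describe_log_groups` paginator and `put_retention_policy`, and defaults to a dry run.

```python
import re
from typing import Any, Iterator


def log_groups_without_retention(
    logs_client: Any, pattern: str, prefix: str = "/aws/lambda/"
) -> Iterator[str]:
    """Yield names of log groups under `prefix` that match `pattern` and never expire."""
    regex = re.compile(pattern)
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate(logGroupNamePrefix=prefix):
        for group in page.get("logGroups", []):
            name = group["logGroupName"]
            # "retentionInDays" is absent when the log group is set to never expire.
            if "retentionInDays" not in group and regex.search(name):
                yield name


def set_retention(
    logs_client: Any, pattern: str, retention_days: int, dry_run: bool = True
) -> list[str]:
    """Set retention on matching log groups and return the affected names."""
    affected = []
    for name in log_groups_without_retention(logs_client, pattern):
        if not dry_run:
            logs_client.put_retention_policy(
                logGroupName=name, retentionInDays=retention_days
            )
        affected.append(name)
    return affected


# Example usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   print(set_retention(boto3.client("logs"), r"KubectlProvider", 7, dry_run=True))
```

Running it with `dry_run=True` first lists the log groups that would change. A delete variant would call `delete_log_group` instead of `put_retention_policy`.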

Happy to hear your thoughts on this.

shapirov103 avatar Apr 14 '25 20:04 shapirov103

@shapirov103 @muckelba I have been using the solution I mentioned here for a while and it's been working perfectly so far.

nilroy avatar Apr 27 '25 01:04 nilroy

@nilroy thanks for sharing the solution. I am thinking either we can incorporate it into the blueprints as an opt-in or it will be in the patterns repo as an example.

shapirov103 avatar Apr 27 '25 17:04 shapirov103

@shapirov103 I have further improved the solution. Maybe you can point me to where I can put it to help everyone else.

nilroy avatar May 29 '25 23:05 nilroy

@nilroy the code that you shared is in python and this repo is typescript only atm. You mentioned you have improved it since, can you share the repo? If it is available as an npm module that would open it up for consumption directly. Otherwise we will have to port it over.

shapirov103 avatar May 30 '25 13:05 shapirov103

@shapirov103 Sorry for the delayed response. I only have it in Python in a private repo, so I will share the relevant pieces in a comment here. However, with CDK 2.200.0 the CDK-provided Lambda functions get default log groups injected, and that breaks my approach, although 2.200.1 was released to disable that behaviour by default. I will try to see whether we can live without the custom logic to create the log groups and rely on CDK to do that for us. Maybe you can also take a look and share how it went?

nilroy avatar Jun 04 '25 11:06 nilroy

@shapirov103 below is the aspect

from aws_cdk import (
    IAspect,
    RemovalPolicy,
    aws_logs,
    CfnResource,
)
from constructs import IConstruct
from enum import Enum, unique, auto
from typing import Optional
import jsii
import re
import hashlib
import base64


@unique
class CustomResourceLambdaType(Enum):
    AWSCDKOpenIdConnectProvider = auto()
    AWSCDKCfnUtilsProvider = auto()
    VpcRestrictDefaultSG = auto()
    S3AutoDeleteObjects = auto()
    KubectlProvider = auto()


@jsii.implements(IAspect)
class CreateLogGroupForCDKProvisionedLambdas:
    """
    Workaround for the CDK issue where no log group is created by default for the custom resource lambdas
    injected by CDK. See bug report https://github.com/aws/aws-cdk/issues/33196
    Once that bug is fixed, the log group settings can be applied with the following code:

    aws_cdk.custom_resources.CustomResourceConfig.of(self).add_log_retention_lifetime(logs.RetentionDays.ONE_WEEK)
    aws_cdk.custom_resources.CustomResourceConfig.of(self).add_removal_policy(RemovalPolicy.DESTROY)

    However, creating the tags might still be an issue, and the dependencies might not be created properly.
    """

    def __init__(
        self,
        lambda_log_group_removal_policy: RemovalPolicy,
        lambda_log_group_retention_days: aws_logs.RetentionDays,
        tags: Optional[dict] = None,
    ) -> None:
        # The key is the cdk metadata path (as synthesized in the cloudformation template) of the IAM role used by the custom resource lambda function
        # The value is the cdk metadata path of the custom resource lambda function

        self.iam_role_lambda_function_path_map = {
            "Custom::AWSCDKOpenIdConnectProviderCustomResourceProvider/Role": "Custom::AWSCDKOpenIdConnectProviderCustomResourceProvider/Handler",
            "AWSCDKCfnUtilsProviderCustomResourceProvider/Role": "AWSCDKCfnUtilsProviderCustomResourceProvider/Handler",
            "Custom::VpcRestrictDefaultSGCustomResourceProvider/Role": "Custom::VpcRestrictDefaultSGCustomResourceProvider/Handler",
            "Custom::S3AutoDeleteObjectsCustomResourceProvider/Role": "Custom::S3AutoDeleteObjectsCustomResourceProvider/Handler",
            "KubectlProvider/Handler/ServiceRole/Resource": "KubectlProvider/Handler/Resource",
            "KubectlProvider/Provider/framework-onEvent/ServiceRole/Resource": "KubectlProvider/Provider/framework-onEvent/Resource",
        }

        self.known_paths_for_custom_resource_lambda_role = []
        self.known_paths_for_custom_resource_lambda = []

        for key, value in self.iam_role_lambda_function_path_map.items():
            self.known_paths_for_custom_resource_lambda_role.append(f"/{key}$")
            self.known_paths_for_custom_resource_lambda.append(f"/{value}$")

        self.known_paths = self.known_paths_for_custom_resource_lambda + self.known_paths_for_custom_resource_lambda_role
        self.tags = []
        self.custom_resource_lambda_log_group_removal_policy = lambda_log_group_removal_policy
        self.custom_resource_lambda_log_group_retention_days = lambda_log_group_retention_days

        if tags:
            for key, value in tags.items():
                self.tags.append(
                    {
                        "Key": key,
                        "Value": value,
                    }
                )

    def generate_consistent_hash(
        self,
        input: str,
        length: int = 12,
    ) -> str:
        hash_bytes = hashlib.sha256(input.encode()).digest()  # Generate SHA-256 hash

        hash_base64 = base64.urlsafe_b64encode(hash_bytes).decode()  # Base64 encoding (URL-safe)

        # Define regex pattern to allow only alphanumeric characters
        pattern = r"[A-Za-z0-9]"

        # Filter using regex
        filtered_hash = "".join(re.findall(pattern, hash_base64))

        return filtered_hash[:length]  # Ensure required length

    def generate_function_name(
        self,
        path: str,
    ) -> str:
        parsed_path = path.split("/")[1].split("::")[-1]
        name_prefix: Optional[str] = None

        cr_lambda_type_fields = [e.name for e in CustomResourceLambdaType]

        for field in cr_lambda_type_fields:
            if field in parsed_path:
                name_prefix = field
                break

        if not name_prefix:
            name_prefix = parsed_path

        return f"{name_prefix}{self.generate_consistent_hash(input=path)}"

    def generate_lambda_function_path(self, path: str) -> str:
        lambda_function_path: str = ""
        for role_path in self.iam_role_lambda_function_path_map.keys():
            if path.endswith(role_path):
                # If the path ends with the role path, return the corresponding lambda function path
                prefix = path.split(role_path)[0]
                lambda_function_path = f"{prefix}{self.iam_role_lambda_function_path_map[role_path]}"
                break

        if not lambda_function_path:
            raise ValueError(f"Path {path} does not match any known role path")

        return lambda_function_path

    def add_tags(
        self,
        node: CfnResource,
        tags: list[dict[str, str]],
    ):
        node.add_property_override(
            "Tags",
            tags,
        )

    def match_known_paths(
        self,
        path: str,
    ) -> bool:
        for known_path in self.known_paths:
            if re.search(known_path, path):
                return True
        return False

    def visit(self, node: IConstruct) -> None:
        """
        The custom resource node tree is visited in the following order:
        1. Role
        2. Lambda function
        So, for a given parent node, the role is always visited before the Lambda function.
        """
        if (self.match_known_paths(path=node.node.path)) and isinstance(node, CfnResource):
            if node.cfn_resource_type == "AWS::IAM::Role":
                corresponding_lambda_function_path = self.generate_lambda_function_path(
                    path=node.node.path,
                )

                function_name = self.generate_function_name(
                    path=corresponding_lambda_function_path,
                )

                log_group_name = f"/aws/lambda/{function_name}"
                log_group = aws_logs.LogGroup(
                    node,
                    id="LogGroup",
                    log_group_name=log_group_name,
                    removal_policy=self.custom_resource_lambda_log_group_removal_policy,
                    retention=self.custom_resource_lambda_log_group_retention_days,
                )

                # This ensures that the log group is created first and deleted last leaving no chance for orphaned log groups
                node.node.add_dependency(log_group)

                # Add tags to the IAM Role

                tags = self.tags.copy()
                tags.append(
                    {
                        "Key": "LogGroupName",
                        "Value": log_group.log_group_name,
                    }
                )

                self.add_tags(
                    node=node,
                    tags=tags,
                )

            if node.cfn_resource_type == "AWS::Lambda::Function":
                function_name = self.generate_function_name(node.node.path)

                # Set the function name property of the Lambda function
                # For all CloudFormation properties that a Lambda function supports, refer to
                # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-lambda-function.html
                node.add_property_override(
                    "FunctionName",
                    function_name,
                )

                self.add_tags(
                    node=node,
                    tags=self.tags,
                )


nilroy avatar Jun 05 '25 21:06 nilroy

This issue has been automatically marked as stale because it has been open 60 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

github-actions[bot] avatar Sep 04 '25 00:09 github-actions[bot]

Issue closed due to inactivity.

github-actions[bot] avatar Nov 03 '25 00:11 github-actions[bot]