automated-security-response-on-aws icon indicating copy to clipboard operation
automated-security-response-on-aws copied to clipboard

SO0111-SHARR-sendNotifications lambda (via the layer) returns ResourceAlreadyExistsException when a log stream already exists

Open kpapagno opened this issue 10 months ago • 4 comments
trafficstars

Describe the bug Log streams written with the help of the _create_log_stream in applogger.py in the orchestrator lambda layer. If the log stream name already exists then the lambda fails.

To Reproduce Have the orchestrator stepfunction run more than one time per day

Expected behavior

The Log Group SO0111-SHARR Log stream names are now custom to reflect the date. It appears that if the notify lambda is run multiple times in the same day, the log stream (SHARR-2025-01-09) will already exist. The expectation is that when a log stream already exists, the orchestrator lambdas will not raise an exception. Instead it should get the next sequence token from the log stream which will be used in subsequent put_log_events calls.

Please complete the following information about the solution:

  • [x] Version: v2.2.0
  • [x] Region: us-east-1
  • [x] Was the solution modified from the version published on this repository? Yes
  • [x] If the answer to the previous question was yes, are the changes available on GitHub? Yes
  • [x] Have you checked your service quotas for the sevices this solution uses? Yes, N/A
  • [x] Were there any errors in the CloudWatch Logs? Troubleshooting FIRST RUN. (Logs of /aws/lambda/SO0111-SHARR-sendNotifications):
2025-01-09T03:46:57.384Z | INIT_START Runtime Version: python:3.11.v46 Runtime Version ARN: arn:aws:lambda:us-east-1::runtime:84bc5fe9641102b252ebbfd80d16f2791f7bcca59e26ce68d95066317adf4503
-- | --
  | 2025-01-09T03:46:57.797Z | INFO successfully patched module sqlite3
  | 2025-01-09T03:46:57.850Z | INFO successfully patched module requests
  | 2025-01-09T03:46:57.855Z | INFO successfully patched module botocore
  | 2025-01-09T03:46:57.860Z | START RequestId: aae012ed-edc1-49db-8299-9a7e16dfd34b Version: $LATEST
  | 2025-01-09T03:46:57.920Z | INFO Found credentials in environment variables.
  | 2025-01-09T03:46:59.453Z | Sending metrics data <snipped for bug report....>
  | 2025-01-09T03:47:00.975Z | Notification message ID 7ae791c3-f712-5231-918a-ce3abfe6a8c0 sent.
  | 2025-01-09T03:47:00.975Z | Creating log stream SHARR-2025-01-09
  | 2025-01-09T03:47:01.328Z | END RequestId: aae012ed-edc1-49db-8299-9a7e16dfd34b

SECOND RUN (Logs of /aws/lambda/SO0111-SHARR-sendNotifications):

2025-01-09T03:46:57.384Z | INIT_START Runtime Version: python:3.11.v46 Runtime Version ARN: arn:aws:lambda:us-east-1::runtime:84bc5fe9641102b252ebbfd80d16f2791f7bcca59e26ce68d95066317adf4503
-- | --
  | 2025-01-09T03:46:57.810Z | INFO successfully patched module sqlite3
  | 2025-01-09T03:46:57.864Z | INFO successfully patched module requests
  | 2025-01-09T03:46:57.870Z | INFO successfully patched module botocore
  | 2025-01-09T03:46:57.875Z | START RequestId: e241b664-f7e9-42e5-aa77-526a9742de15 Version: $LATEST
  | 2025-01-09T03:46:57.912Z | INFO Found credentials in environment variables.
  | 2025-01-09T03:46:59.503Z | Sending metrics data  <snipped for bug report....>
  | 2025-01-09T03:47:00.997Z | Notification message ID 07238755-5443-5ad3-a6ef-d8ac1cb734c9 sent.
  | 2025-01-09T03:47:00.997Z | Creating log stream SHARR-2025-01-09
  | 2025-01-09T03:47:01.409Z | LAMBDA_WARNING: Unhandled exception. The most likely cause is an issue in the function code. However, in rare cases, a Lambda runtime update can cause unexpected function behavior. For functions using managed runtimes, runtime updates can be triggered by a function change, or can be applied automatically. To determine if the runtime has been updated, check the runtime version in the INIT_START log entry. If this error correlates with a change in the runtime version, you may be able to mitigate this error by temporarily rolling back to the previous runtime version. For more information, see https://docs.aws.amazon.com/lambda/latest/dg/runtimes-update.html
  | 2025-01-09T03:47:01.410Z | [ERROR] ResourceAlreadyExistsException: An error occurred (ResourceAlreadyExistsException) when calling the CreateLogStream operation: The specified log stream already exists Traceback (most recent call last):   File "/opt/python/lib/python3.11/site-packages/aws_lambda_powertools/tracing/tracer.py", line 324, in decorate     response = lambda_handler(event, context, **kwargs)   File "/var/task/send_notifications.py", line 171, in lambda_handler     build_and_send_notification(   File "/var/task/send_notifications.py", line 223, in build_and_send_notification     notification.notify()   File "/opt/python/layer/sechub_findings.py", line 323, in notify     self.applogger.flush()   File "/opt/python/layer/applogger.py", line 109, in flush     log_stream = self._create_log_stream(log_stream=self.stream_name)   File "/opt/python/layer/applogger.py", line 73, in _create_log_stream     get_logs_connection(self.apiclient).create_log_stream(   File "/opt/python/lib/python3.11/site-packages/botocore/client.py", line 569, in _api_call     return self._make_api_call(operation_name, kwargs)   File "/opt/python/lib/python3.11/site-packages/wrapt/wrappers.py", line 669, in __call__     return self._self_wrapper(self.__wrapped__, self._self_instance,   File "/opt/python/lib/python3.11/site-packages/aws_xray_sdk/ext/botocore/patch.py", line 38, in _xray_traced_botocore     return xray_recorder.record_subsegment(   File "/opt/python/lib/python3.11/site-packages/aws_xray_sdk/core/recorder.py", line 456, in record_subsegment     return_value = wrapped(*args, **kwargs)   File "/opt/python/lib/python3.11/site-packages/botocore/client.py", line 1023, in _make_api_call     raise error_class(parsed_response, operation_name)
  | 2025-01-09T03:47:01.412Z | END RequestId: e241b664-f7e9-42e5-aa77-526a9742de15
  | 2025-01-09T03:47:01.412Z | REPORT RequestId: e241b664-f7e9-42e5-aa77-526a9742de15 Duration: 3537.20 ms Billed Duration: 3538 ms Memory Size: 256 MB Max Memory Used: 103 MB Init Duration: 489.41 ms XRAY TraceId: 1-677f46a9-d153f7c536933a65a3a4aba9 SegmentId: 6245c5fe72c92874 Sampled: true

Screenshots If applicable, add screenshots to help explain your problem (please DO NOT include sensitive information).

Additional context I made my own change to get "by" this issue, but not confident enough for submitting a pull request at this time.

Updated code in layer applogger.py

   def _create_log_stream(self, log_stream):
        """Create a new log stream"""
        # append today's date to stream name
        log_stream = log_stream + "-" + str(date.today())
        try:
            print(("Creating log stream {}".format(log_stream)))
            get_logs_connection(self.apiclient).create_log_stream(
                logGroupName=self.log_group, logStreamName=log_stream
            )
            self._stream_token = "0"
        except Exception as e:
            # if the stream was created in between the call ignore the error
            if type(e).__name__ == "ResourceAlreadyExistsException":
                print("Log Stream already exists")
+                # Get the sequence token for the existing stream
+               response = get_logs_connection(self.apiclient).describe_log_streams(
+                    logGroupName=self.log_group, logStreamNamePrefix=log_stream, limit=1
+                )
+                if response["logStreams"]:
+                    self._stream_token = response["logStreams"][0].get(
+                        "uploadSequenceToken", "0"
+                    )
            elif type(e).__name__ == "ResourceNotFoundException":
                if self._create_log_group():
                    get_logs_connection(self.apiclient).create_log_stream(
                        logGroupName=self.log_group, logStreamName=log_stream
                    )
                else:
                    raise FailedToCreateLogGroup
            else:
                raise e
        return log_stream

When catching ResourceAlreadyExistsException, we now fetch the current sequence token for the existing stream using describe_log_streams. This sequence token is stored in self._stream_token which will be used in subsequent put_log_events calls. This modification ensures that when you encounter an existing stream, you'll get the correct sequence token right away, which should prevent the error and allow for proper log writing to existing streams. The sequence token is important because CloudWatch Logs uses it to maintain the order of log events and prevent duplicate entries. When a stream already exists, we need to know the current sequence token to append new logs correctly.

Additionally, I added the permission logs:DescribeLogStreams to all the policies which work with writing logs as the layer may use the common code.

kpapagno avatar Jan 09 '25 16:01 kpapagno