
High Memory Usage Leading to OOM Killed State

Open huteshmahajan opened this issue 1 year ago • 4 comments

Acknowledgements

  • [X] I have searched (https://github.com/aws/aws-sdk/issues?q=is%3Aissue) for past instances of this issue
  • [X] I have verified all of my SDK modules are up-to-date (you can perform a bulk update with go get -u github.com/aws/aws-sdk-go-v2/...)

Describe the bug

We are encountering an Out Of Memory (OOM) issue in our application when attempting to retrieve data using AWS Lambda. Upon investigation, we've identified that the aws/aws-sdk-go-v2 library is consuming a significant amount of memory, leading to the pod being terminated in an OOM killed state.

Observations

  • Memory profiling using Pyroscope indicates that certain components of the aws/aws-sdk-go-v2 library are consuming an unusually high amount of memory.
  • The issue seems persistent across multiple runs, suggesting a potential memory leak or inefficient memory usage within the library.

Screenshots

Attached are the relevant Pyroscope screenshots showing inuse-space memory consumption.


Expected Behavior

The aws/aws-sdk-go-v2 library should manage memory efficiently, preventing the application from reaching an OOM state.

Current Behavior

The application consistently enters an OOM killed state when using the aws/aws-sdk-go-v2 library, indicating potential issues with memory management within the library.

Reproduction Steps

1. Deploy an application that uses aws/aws-sdk-go-v2 with a high volume of data in S3 (a minimal sketch of such a workload follows below).
2. Monitor memory usage with a profiling tool (e.g., Pyroscope).
3. Observe that memory consumption spikes significantly during data retrieval, eventually leading to the pod being killed due to OOM.
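
To make the first step concrete, here is a minimal, hypothetical sketch (the bucket and object names are placeholders, not taken from the original report) of a workload that repeatedly downloads large objects with the v2 SDK's S3 transfer manager while a profiler watches inuse-space:

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/feature/s3/manager"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	downloader := manager.NewDownloader(s3.NewFromConfig(cfg))

	// Repeatedly download a large object while a profiler (e.g. Pyroscope
	// or pprof) records inuse-space across iterations.
	for i := 0; i < 100; i++ {
		buf := manager.NewWriteAtBuffer(nil)
		n, err := downloader.Download(context.TODO(), buf, &s3.GetObjectInput{
			Bucket: aws.String("example-bucket"), // placeholder
			Key:    aws.String("large-object"),   // placeholder
		})
		if err != nil {
			log.Fatal(err)
		}
		log.Printf("iteration %d: downloaded %d bytes", i, n)
	}
}
```

Running a loop like this against a sufficiently large object should show whether retained allocations keep growing across iterations or are released between downloads.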

Possible Solution

No response

Additional Information/Context

No response

AWS Go SDK V2 Module Versions Used

github.com/aws/aws-sdk-go-v2 v1.24.1
github.com/aws/aws-sdk-go-v2/config v1.26.6
github.com/aws/aws-sdk-go-v2/feature/s3/manager v1.15.15
github.com/aws/aws-sdk-go-v2/service/lambda v1.49.7
github.com/aws/aws-sdk-go-v2/service/s3 v1.48.1
github.com/aws/smithy-go v1.19.0

Compiler and Version used

go version go1.22.3 darwin/arm64

Operating System and version

macOS 14.1.1 (23B81)

huteshmahajan avatar Aug 22 '24 05:08 huteshmahajan

This issue is related to #2706

huteshmahajan avatar Aug 22 '24 11:08 huteshmahajan

Hi @huteshmahajan ,

Deploy an application that uses aws/aws-sdk-go-v2 with a high volume of data in S3

These reproduction conditions are too broad for us to take action on. Can you please provide the code you used to test this?

Thanks, Ran~

RanVaknin avatar Aug 22 '24 17:08 RanVaknin

Hello @RanVaknin. This is our production code, so I cannot share the actual snippet, but I'd like to provide some insight into the issue we're facing. We invoke multiple AWS Lambda functions concurrently from our application. In some cases a Lambda response exceeds the AWS Lambda service's standard 6 MB limit, and after retries this leads to OOM; a hypothetical, sanitized sketch of the invocation pattern follows.
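
For illustration only, a minimal sketch of the invocation pattern described above (the function name, concurrency, and payload sizes are placeholders, not our production code):

```go
package main

import (
	"context"
	"log"
	"sync"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/lambda"
)

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	client := lambda.NewFromConfig(cfg)

	var wg sync.WaitGroup
	for i := 0; i < 50; i++ { // placeholder concurrency
		wg.Add(1)
		go func() {
			defer wg.Done()
			out, err := client.Invoke(context.TODO(), &lambda.InvokeInput{
				FunctionName: aws.String("example-function"), // placeholder
			})
			if err != nil {
				log.Println("invoke failed:", err)
				return
			}
			// out.Payload holds the entire synchronous response in memory;
			// with responses near the 6 MB limit, dozens of concurrent
			// invocations plus retries can keep hundreds of MB live at once.
			log.Println("payload bytes:", len(out.Payload))
		}()
	}
	wg.Wait()
}
```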

huteshmahajan avatar Aug 27 '24 06:08 huteshmahajan

Hi @huteshmahajan ,

I'm not after your business logic. All I'm after is a reproducible code snippet that can reliably raise the error you are seeing. In the issue you linked, the problem was described with S3 and was unrelated to Lambda.

We invoke multiple AWS Lambda functions concurrently from our application. In some cases a Lambda response exceeds the AWS Lambda service's standard 6 MB limit, and after retries this leads to OOM.

This is still too broad a description. Is your current application running on Lambda itself? What do you mean by "lambda response exceeds the AWS Lambda Service's standard limit of 6MB"? The response from which Lambda?

What we are after is a code snippet / example repository that can reliably demonstrate this issue so we may reproduce it ourselves and investigate the root cause.

Thanks, Ran~

RanVaknin avatar Aug 27 '24 17:08 RanVaknin

This issue has not received a response in 1 week. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled.

github-actions[bot] avatar Sep 07 '24 00:09 github-actions[bot]

Did you ever find a solution to this?

shayneoneill avatar Jun 12 '25 04:06 shayneoneill

@shayneoneill
I recently discovered a similar memory issue in rclone. The cause is that rclone caches some metadata strings returned by the SDK. Although this metadata is expected to consume only a small amount of memory, holding onto those strings keeps the memory allocated during the SDK's underlying XML parsing from being released, resulting in significantly higher memory consumption than expected. After modifying rclone to clone such strings, memory consumption returned to its original level (an illustrative sketch follows the PR link below).

https://github.com/rclone/rclone/pull/8684
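
For illustration, a minimal sketch of the retention pattern described above (this is not the SDK's or rclone's actual code; the cache and function names are made up): a small substring of a large decoded response shares that response's backing memory and keeps the whole allocation alive unless it is cloned.

```go
package main

import (
	"fmt"
	"strings"
)

// metadataCache stands in for a long-lived cache of per-object metadata.
var metadataCache []string

// cacheETag stores a small piece of a large response body.
func cacheETag(etag string, clone bool) {
	if clone {
		// strings.Clone (Go 1.18+) copies the bytes, so the large buffer the
		// substring pointed into can be garbage-collected.
		etag = strings.Clone(etag)
	}
	metadataCache = append(metadataCache, etag)
}

func main() {
	// Pretend this is a multi-megabyte XML response body.
	bigResponse := strings.Repeat("x", 8<<20)
	etag := bigResponse[:32] // a tiny slice that shares bigResponse's memory

	cacheETag(etag, false) // without cloning: the full 8 MB stays reachable
	cacheETag(etag, true)  // with cloning: only 32 bytes are retained

	fmt.Println("cached entries:", len(metadataCache))
}
```

In rclone's case the fix in the linked PR was to clone such strings before caching them, which allowed the large parse-time allocations to be collected.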

VVoidV avatar Jul 16 '25 08:07 VVoidV