aws-sdk-pandas Segmentation Fault during Lambda Execution

Describe the bug

Hello,

I work in AWS Support and I am raising a Github issue on behalf of a customer.

The customer uses the Lambda Layer for AWS Data Wrangler in their Lambda function to read JSON files from S3, create pandas DataFrames using awswrangler, process the file data, create a Glue catalog table, and store the flattened data files in S3 in Parquet format. A small number of Lambda invocations out of thousands fail with the error: Runtime exited with error: signal: segmentation fault Runtime.ExitError

After some research, it appears that this type of error is raised by binary dependencies (implemented in C/C++) used by a Python module. Judging from many resources on the web, pandas in particular is known to hit this type of error for a variety of root causes.

I am trying to help the customer troubleshoot the root cause of these segmentation faults, particularly by gathering more useful debug information. Currently, the logs for the failed Lambda invocations end abruptly with the line "Runtime exited with error: signal: segmentation fault Runtime.ExitError", and there is no insight into what is happening in pandas and its binary dependencies.

From what I gather from [1][2], debugging segmentation faults in the binary dependencies of pandas is not entirely straightforward.

Can you please provide guidance on what steps can be taken to output verbose debug logging for the AWS Data Wrangler layer and its binary extensions in the Lambda environment? In particular, it would be great to have steps for collecting debugging and replication data, so that we can come back to this repo with the information needed for troubleshooting. Any other insights you may have would be appreciated.

References:

[1] https://pandas.pydata.org/docs/development/debugging_extensions.html

[2] https://blog.richard.do/2018/03/18/how-to-debug-segmentation-fault-in-python/

How to Reproduce

Unfortunately, we do not have detailed steps on how to reproduce the issue other than the Lambda execution logs and the names of the files that were being processed.

The issue happened intermittently (only for a few invocations out of thousands). The customer noted that the failures only occurred while they were using version 5 of the Lambda Layer and have not recurred since they upgraded to later versions.

At this point we are seeking guidance on how best to gather debugging information to troubleshoot the issues in more detail.

Expected behavior

The error should not occur and the Lambda runtime should not exit.

Your project

No response

Screenshots

No response

OS

Amazon Linux 2, underlying OS for Python Lambda Runtime

Python version

3.9

AWS SDK for pandas version

arn: arn:aws:lambda:us-east-1:336392948345:layer:AWSDataWrangler-Python39:5

Additional context

No response

Aug 31 '22 00:08 Alexander-Ludwig

Hi @Alexander-Ludwig, in terms of debugging, my advice would be for the customer to first enable logging in their Lambda function code. They could add a line for pandas too:

logging.getLogger("pandas").setLevel(logging.DEBUG)

Hopefully this would give them more visibility into the error.
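A minimal sketch of what enabling logging in the function code might look like. The handler name, the event shape, and the "key" field are hypothetical. The faulthandler call is an extra assumption that is not part of the advice above: it is a Python standard-library module that prints the Python-level traceback when the process receives a signal such as SIGSEGV, which may show which call was running when the crash occurred.

```python
import faulthandler
import logging
import sys

# basicConfig() only takes effect if the root logger has no handlers yet,
# so the level is also set explicitly on the root logger.
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

# The line suggested in this comment.
logging.getLogger("pandas").setLevel(logging.DEBUG)

# Assumption (not from this thread): dump Python tracebacks for all threads
# to the process's original stderr stream if a fatal signal such as SIGSEGV
# is received.
faulthandler.enable(file=sys.__stderr__, all_threads=True)


def lambda_handler(event, context):
    # Hypothetical handler: log the object being processed first, so the
    # last log line before a crash identifies the input file.
    key = event.get("key", "<unknown>")
    logger.info("Processing object: %s", key)
    # ... awswrangler read / transform / write calls would go here ...
    return {"status": "ok", "key": key}
```

Setting the root logger to DEBUG also surfaces boto3 and botocore debug output, which can be verbose, so it is probably best limited to the period of investigation.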

It's interesting that they haven't encountered this error since upgrading from version 5 of the layer. It could very well be that pandas introduced a fix in a later version, which we have released as part of the more recent awswrangler layers.

Sep 07 '22 14:09 jaidisido

Is this closed? I've been running into the same error

Jul 11 '23 16:07 Jxmedia

I get the same error when using python3-saml library.

Feb 16 '24 10:02 tomohiro-suzuki-6
