AWSSDKPandas lambda layer does not include pyarrow dependencies to support `encryption_configuration` parameter

Open everlastingcrown opened this issue 6 months ago • 3 comments

Describe the bug

Methods such as `awswrangler.s3.to_parquet()` take an `encryption_configuration` parameter (for encrypting the data), which relies on dependencies from `pyarrow.parquet.encryption`.

However, these dependencies do not exist in the "AWSSDKPandas-Python311" AWS Lambda Layer.

When running `import pyarrow.parquet.encryption` in an AWS Lambda on Python 3.11, the following error occurs:

Runtime.ImportModuleError: Unable to import module 'index': No module named 'pyarrow._parquet_encryption'

How to Reproduce

Deploy an AWS Lambda with the following configuration:

  • Runtime: Python 3.11
  • Layers: [AWSSDKPandas-Python311:17]

The lambda code is:

import pyarrow.parquet.encryption
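
To make the reproduction easier to inspect, the import can be wrapped so that the function reports which modules are missing instead of failing at load time. This is only a minimal sketch: the `module_available` helper is hypothetical, and the `handler` entry point name is an assumption (the error message only shows that the module is called `index`).

```python
import importlib


def module_available(name: str) -> bool:
    """Return True if `name` can be imported in this runtime (hypothetical helper)."""
    try:
        importlib.import_module(name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # compiled extension such as pyarrow._parquet_encryption lands here.
        return False
    return True


def handler(event, context):
    # Assumed entry point name; adjust to the function's configured handler.
    return {
        "pyarrow": module_available("pyarrow"),
        "pyarrow.parquet.encryption": module_available("pyarrow.parquet.encryption"),
    }
```

With the layer described above, the expectation would be that the first entry is true and the second is false, which separates "pyarrow is missing" from "pyarrow was built without Parquet encryption".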

Expected behavior

It is expected that the `pyarrow.parquet.encryption` import succeeds, so that it can then be used to define the `encryption_configuration` parameter.

Your project

No response

Screenshots

No response

OS

Linux

Python version

3.11

AWS SDK for pandas version

17

Additional context

The encryption_configuration feature is a relatively new addition (Feb 24): https://github.com/aws/aws-sdk-pandas/issues/2642. I suspect that part of pyarrow is not included in the layer build to reduce the total size.

everlastingcrown, May 16 '25 03:05

You are right that we have to manage which features are enabled based on the tradeoff between their popularity and their impact on the Lambda layer size: https://github.com/aws/aws-sdk-pandas/issues/3084#issuecomment-2618574065. We can consider enabling the flag for a future release, but again, it would depend on the impact on the layer size limit.

jaidisido, May 16 '25 13:05

> You are right that we have to manage which features are enabled based on the tradeoff between their popularity vs their impact on the Lambda layer size #3084 (comment) We can consider enabling the flag for a future release but again it would depend on the impact on the layer size limit

That's fair enough.

Are there any alternative solutions? I'm considering using a container-based Lambda to overcome the size limitation. However, it is far less convenient than using a standard Lambda with the layer.

everlastingcrown, May 21 '25 03:05

The scripts to build the layers are available here, in case they're helpful.

jaidisido, May 21 '25 12:05

Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 7 days it will automatically be closed.

github-actions[bot], Jul 20 '25 15:07