aws-sdk-pandas
AWSSDKPandas lambda layer does not include pyarrow dependencies to support `encryption_configuration` parameter
Describe the bug
Methods such as `awswrangler.s3.to_parquet()` accept an `encryption_configuration` parameter (for encrypting the data), which relies on dependencies from `pyarrow.parquet.encryption`.
However, these dependencies do not exist in the "AWSSDKPandas-Python311" AWS Lambda Layer.
When running `import pyarrow.parquet.encryption` in an AWS Lambda on Python 3.11, the following error occurs:
Runtime.ImportModuleError: Unable to import module 'index': No module named 'pyarrow._parquet_encryption'
How to Reproduce
Deploy an AWS Lambda with the following configuration:
- Runtime: Python 3.11
- Layers: [AWSSDKPandas-Python311:17]
The lambda code is:
import pyarrow.parquet.encryption
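For anyone who wants to confirm the problem without the function failing at load time, here is a minimal diagnostic sketch. The function names and the response shape are my own assumptions, not part of the layer or of awswrangler. It attempts the same import lazily and reports the result.

```python
import importlib


def parquet_encryption_available() -> bool:
    """Return True if pyarrow.parquet.encryption can be imported."""
    try:
        importlib.import_module("pyarrow.parquet.encryption")
    except ImportError:
        # Covers both a missing pyarrow and a pyarrow build that lacks
        # the _parquet_encryption extension module.
        return False
    return True


def handler(event, context):
    # The import is attempted inside the handler, so a missing module is
    # reported in the response instead of raising at module load time.
    return {"parquet_encryption_available": parquet_encryption_available()}
```

With the layer version listed above, I would expect this to report `False`, since the top-level import in the repro fails.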
Expected behavior
The `pyarrow.parquet.encryption` import should succeed, so that the module can then be used to build the `encryption_configuration` argument.
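To make the intended usage concrete, here is a rough sketch of how the argument could be built once the import works. It rests on several assumptions: the dictionary keys (`crypto_factory`, `kms_connection_config`, `encryption_config`) follow my reading of the awswrangler documentation, `DemoKmsClient` is a hypothetical and deliberately insecure stand-in for a real KMS client, and all key and column names are placeholders.

```python
import base64


def build_encryption_configuration(master_keys, footer_key_name, column_keys):
    """Build a dict to pass as encryption_configuration (illustrative sketch)."""
    # This is the import that fails on the layer described above.
    import pyarrow.parquet.encryption as pe

    class DemoKmsClient(pe.KmsClient):
        """Hypothetical client: base64-encodes keys, provides NO real security."""

        def __init__(self, kms_connection_config):
            pe.KmsClient.__init__(self)
            self._master_keys = kms_connection_config.custom_kms_conf

        def wrap_key(self, key_bytes, master_key_identifier):
            if master_key_identifier not in self._master_keys:
                raise KeyError(master_key_identifier)
            return base64.b64encode(key_bytes)

        def unwrap_key(self, wrapped_key, master_key_identifier):
            if master_key_identifier not in self._master_keys:
                raise KeyError(master_key_identifier)
            return base64.b64decode(wrapped_key)

    return {
        "crypto_factory": pe.CryptoFactory(lambda config: DemoKmsClient(config)),
        # master_keys: dict of key name -> key material (both str), demo only.
        "kms_connection_config": pe.KmsConnectionConfig(custom_kms_conf=master_keys),
        # column_keys: dict of key name -> list of column names to encrypt.
        "encryption_config": pe.EncryptionConfiguration(
            footer_key=footer_key_name,
            column_keys=column_keys,
        ),
    }


def write_encrypted(df, path, encryption_configuration):
    """Write df to S3 as Parquet using the configuration built above."""
    import awswrangler as wr

    return wr.s3.to_parquet(
        df=df,
        path=path,
        encryption_configuration=encryption_configuration,
    )
```

A real deployment would replace `DemoKmsClient` with a client that wraps and unwraps keys through an actual key management service.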
Your project
No response
Screenshots
No response
OS
Linux
Python version
3.11
AWS SDK for pandas version
17
Additional context
The encryption_configuration feature is a relatively new addition (Feb 24): https://github.com/aws/aws-sdk-pandas/issues/2642. I suspect that part of pyarrow is not included in the layer build to reduce the total size.
You are right that we have to manage which features are enabled, based on the tradeoff between their popularity and their impact on the Lambda layer size: https://github.com/aws/aws-sdk-pandas/issues/3084#issuecomment-2618574065. We can consider enabling the flag in a future release, but again it would depend on the impact on the layer size limit.
That's fair enough.
Are there any alternative solutions? I'm considering using a container-based Lambda to overcome the size limitation. However, it is far less convenient than using a standard Lambda with the layer.
The scripts to build the layers are available here, in case they're helpful.
Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 7 days it will automatically be closed.