Not working in AWS Lambda (0.16.2 - 0.17.4) OSError: Generic S3 error

Open TheKnightCoder opened this issue 1 year ago • 0 comments

Environment

Delta-rs version: 0.16.2 - 0.17.4

Binding: Python

Environment:

  • Cloud provider: AWS Lambda
  • OS: Python3.12
  • Other: Python3.12 + AWS SDK for Pandas Layer + Custom layer with Deltalake 0.17.4 and pyarrow_hotfix 0.6

Bug

What happened: The code works in version 0.15.3. Code snippet:

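import pandas as pd
from deltalake import write_deltalake

# BUCKET_NAME must be set to the target S3 bucket name before this runs.
table_path = f's3://{BUCKET_NAME}/bronze/test'

def insert():
    print('writing')
    data = {"id": [1, 2], "b": [2, 2]}
    df = pd.DataFrame(data)
    print(df)
    # Overwrite the table and replace its schema on every call.
    write_deltalake(table_path, df, mode='overwrite', overwrite_schema=True)
    print('written')

insert()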

What you expected to happen: The write should succeed as it does in 0.15.3. Instead, calling write_deltalake or any other operation throws this error:

[ERROR] OSError: Generic S3 error: Error after 10 retries in 2.612435942s, max_retries:10, retry_timeout:180s, source:error sending request for url (https://s3.us-east-1.amazonaws.com/bucketname-bmdcgdma/bronze/test/_delta_log/_last_checkpoint): error trying to connect: invalid peer certificate: BadSignature
Traceback (most recent call last):
  File "/var/task/events/s3-update-nfts.py", line 127, in handler
    insert()
  File "/var/task/events/s3-update-nfts.py", line 35, in insert
    write_deltalake(table_path, df, mode='overwrite', overwrite_schema=True, storage_options=storage_options)
  File "/opt/python/deltalake/writer.py", line 265, in write_deltalake
    table, table_uri = try_get_table_and_table_uri(table_or_uri, storage_options)
  File "/opt/python/deltalake/writer.py", line 688, in try_get_table_and_table_uri
    table = try_get_deltatable(table_or_uri, storage_options)
  File "/opt/python/deltalake/writer.py", line 701, in try_get_deltatable
    return DeltaTable(table_uri, storage_options=storage_options)
  File "/opt/python/deltalake/table.py", line 405, in __init__
    self._table = RawDeltaTable(

How to reproduce it:

  • Use the AWS SDK for pandas managed layer for dependencies (pyarrow, numpy, etc.): https://aws-sdk-pandas.readthedocs.io/en/stable/layers.html

  • Create a custom deltalake Lambda layer with this requirements.txt:

deltalake==0.17.4
pyarrow_hotfix==0.6

build.sh

mkdir -p ./dist/python
pip install -r requirements.txt -t ./dist/python --no-deps --platform manylinux2014_aarch64 

This follows the article at https://delta.io/blog/2023-04-06-deltalake-aws-lambda-wrangler-pandas/ and adds pyarrow_hotfix, since it is a required dependency that is not available in the AWS SDK for pandas layer.

  • Add all S3 permissions to the AWS Lambda execution role
  • Set the env var AWS_S3_ALLOW_UNSAFE_RENAME: 'true'
  • Use the layer to write a Delta table in AWS Lambda; the write throws the error above
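
The call in the traceback passes a storage_options argument that the snippet never defines. Below is a minimal sketch of how such a dict could be built from the Lambda environment. It assumes only the AWS_S3_ALLOW_UNSAFE_RENAME setting from the steps above. The helper name build_storage_options is hypothetical and this is not the reporter's actual code:

```python
import os


def build_storage_options(env=None):
    """Collect deltalake storage options from environment variables.

    Hypothetical helper for illustration. Only the
    AWS_S3_ALLOW_UNSAFE_RENAME key is taken from the report; the key is
    included only when the variable is actually set.
    """
    env = os.environ if env is None else env
    options = {}
    value = env.get("AWS_S3_ALLOW_UNSAFE_RENAME")
    if value is not None:
        options["AWS_S3_ALLOW_UNSAFE_RENAME"] = value
    return options


# Usage inside the handler (needs deltalake and S3 access, so it is
# left as a comment here):
# write_deltalake(table_path, df, mode='overwrite', overwrite_schema=True,
#                 storage_options=build_storage_options())
```

This only makes the failing call reproducible as written in the traceback. It is not expected to change the certificate error.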

More details: The last working version was 0.15.3; it does not work on 0.16.2 - 0.17.4.

Edit: 0.16.1 works too, so the bug was introduced in 0.16.2.
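
To check quickly whether a layer ships a version inside the range reported here, something like the following could be run in the Lambda. It is a sketch that assumes plain X.Y.Z version strings. The function names are hypothetical, and versions newer than 0.17.4 are simply outside what this report covers:

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Bounds taken from this report: first failing and last failing version tested.
FIRST_BROKEN = (0, 16, 2)
LAST_TESTED_BROKEN = (0, 17, 4)


def parse_version(text):
    """Return the leading X.Y.Z of a version string as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def in_reported_range(text):
    """True if the version falls within the range reported as failing."""
    return FIRST_BROKEN <= parse_version(text) <= LAST_TESTED_BROKEN


def installed_deltalake_version():
    """Return the installed deltalake version string, or None if absent."""
    try:
        return version("deltalake")
    except PackageNotFoundError:
        return None


# Usage:
# found = installed_deltalake_version()
# if found is not None:
#     print(found, "in reported range:", in_reported_range(found))
```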

TheKnightCoder, May 14 '24 03:05