
Intermittent NETWORK_CONNECTION Error During s3.read_parquet_table Operation

Open DimitarSirakov opened this issue 1 year ago • 1 comment

Describe the bug

Hi,

I'm encountering an intermittent issue when using the s3.read_parquet_table function in my ETL pipeline. The pipeline, which runs on Modin, Ray, and awswrangler, reads Parquet files from S3 every 5 minutes. Occasionally, I receive the following error:

AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 28, Timeout was reached

How to Reproduce

I am unable to reproduce this error consistently, and it seems to resolve itself after some time.

import awswrangler as wr

df = wr.s3.read_parquet_table(table=table, database=database, partition_filter=partition_filter, filename_suffix=filename_suffix)
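
Since the failure is transient and clears on its own, one possible stopgap is to retry the read with exponential backoff. The following is only a minimal sketch, not part of awswrangler: `retry_with_backoff` and `is_transient_network_error` are hypothetical helpers, and the check assumes the exception message contains the text NETWORK_CONNECTION. Under Ray the error may be wrapped in another exception type, so the predicate may need adjusting.

```python
import time


def is_transient_network_error(exc: BaseException) -> bool:
    """Hypothetical predicate: treat errors mentioning NETWORK_CONNECTION as retryable."""
    return "NETWORK_CONNECTION" in str(exc)


def retry_with_backoff(
    func,
    *,
    attempts=4,
    base_delay=1.0,
    is_retryable=is_transient_network_error,
    sleep=time.sleep,
):
    """Call func(); on a retryable error wait base_delay * 2**attempt, then retry.

    Non-retryable errors, and the error from the final attempt, are re-raised.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:
            last_attempt = attempt == attempts - 1
            if last_attempt or not is_retryable(exc):
                raise
            sleep(base_delay * 2 ** attempt)


# Usage sketch (assumes awswrangler is installed and table/database are defined):
# import awswrangler as wr
# df = retry_with_backoff(
#     lambda: wr.s3.read_parquet_table(table=table, database=database)
# )
```

With the defaults above, the waits between attempts are 1, 2, and 4 seconds. This only masks the symptom and does not address the underlying timeout.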

Expected behavior

No response

Your project

No response

Screenshots

No response

OS

Linux

Python version

3.10.13

AWS SDK for pandas version

3.7.2

Additional context

No response

DimitarSirakov avatar Jun 06 '24 12:06 DimitarSirakov

There is a long-standing issue on this subject at https://github.com/ray-project/ray/issues/43803

jaidisido avatar Jun 06 '24 14:06 jaidisido

Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 7 days it will automatically be closed.

github-actions[bot] avatar Aug 05 '24 15:08 github-actions[bot]