aws-sdk-pandas
Intermittent NETWORK_CONNECTION Error During s3.read_parquet_table Operation
Describe the bug
Hi,
I'm encountering an intermittent issue when using the s3.read_parquet_table function in my ETL pipeline. The pipeline, which uses modin, ray, and awswrangler, reads Parquet files from S3 every 5 minutes. Occasionally, I receive the following error:
AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 28, Timeout was reached
How to Reproduce
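I am unable to reproduce this error consistently, and it seems to resolve itself after some time.
import awswrangler as wr
# Keyword arguments are used so that each value reaches the intended parameter.
df = wr.s3.read_parquet_table(table=table, database=database, partition_filter=partition_filter, filename_suffix=filename_suffix)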
Expected behavior
No response
Your project
No response
Screenshots
No response
OS
Linux
Python version
3.10.13
AWS SDK for pandas version
3.7.2
Additional context
No response
There is a long-standing open issue on this subject at https://github.com/ray-project/ray/issues/43803
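Since the error is intermittent and clears up on its own, one possible stopgap is to retry the read with exponential backoff. The sketch below is only an illustration, not an official awswrangler or Ray feature: the helper name retry_on_network_error and its parameters are hypothetical, and it assumes the timeout surfaces as an exception whose message contains "NETWORK_CONNECTION", as in the error reported above.

```python
import time


def retry_on_network_error(func, attempts=4, base_delay=1.0,
                           marker="NETWORK_CONNECTION", sleep=time.sleep):
    """Call func() and retry with exponential backoff on matching errors.

    Hypothetical helper: an error is treated as transient only when its
    message contains `marker` (an assumption based on the reported error text).
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            # Re-raise unrelated errors, and give up after the final attempt.
            if marker not in str(exc) or attempt == attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))


# Possible usage with the call from the report (not executed here):
# df = retry_on_network_error(
#     lambda: wr.s3.read_parquet_table(table=table, database=database)
# )
```

This only hides the symptom; it does not address the underlying timeout tracked in the Ray issue, and the number of attempts and delays would need tuning for a job that runs every 5 minutes.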
Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 7 days it will automatically be closed.