ibis bug: Unable to connect to athena


What happened?

I followed the instructions at https://ibis-project.org/backends/athena to connect to and use the Amazon Athena backend, as follows:

import os
import ibis
from dotenv import load_dotenv

load_dotenv() # load env variables from .env

# the staging dir env variable below is of the format s3://MY_BUCKET
con = ibis.athena.connect(s3_staging_dir=os.environ['AWS_ATHENA_S3_STAGING_DIR'])

The call to ibis.athena.connect fails with the following error:

Failed to read MY_BUCKET/3d1008dc-55de-490e-b9f9-cd581b6f327f.csv.
Traceback (most recent call last):
  File "/tmp/ibis-test/virtualenv/lib/python3.10/site-packages/pyathena/arrow/result_set.py", line 240, in _read_csv
    self._fs.open_input_stream(f"{bucket}/{key}"),
  File "pyarrow/_fs.pyx", line 829, in pyarrow._fs.FileSystem.open_input_stream
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
OSError: When reading information for key '3d1008dc-55de-490e-b9f9-cd581b6f327f.csv' in bucket 'MY_BUCKET': AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 28, Timeout was reached
Traceback (most recent call last):
  File "/tmp/ibis-test/virtualenv/lib/python3.10/site-packages/pyathena/arrow/result_set.py", line 240, in _read_csv
    self._fs.open_input_stream(f"{bucket}/{key}"),
  File "pyarrow/_fs.pyx", line 829, in pyarrow._fs.FileSystem.open_input_stream
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
OSError: When reading information for key '3d1008dc-55de-490e-b9f9-cd581b6f327f.csv' in bucket 'MY_BUCKET': AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 28, Timeout was reached

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/tmp/ibis-test/ibis_try.py", line 6, in <module>
    con = ibis.athena.connect(s3_staging_dir=os.environ['AWS_ATHENA_S3_STAGING_DIR'])
  File "/tmp/ibis-test/virtualenv/lib/python3.10/site-packages/ibis/__init__.py", line 110, in connect
    return backend.connect(*args, **kwargs)
  File "/tmp/ibis-test/virtualenv/lib/python3.10/site-packages/ibis/backends/__init__.py", line 928, in connect
    new_backend.reconnect()
  File "/tmp/ibis-test/virtualenv/lib/python3.10/site-packages/ibis/backends/__init__.py", line 943, in reconnect
    self.do_connect(*self._con_args, **self._con_kwargs)
  File "/tmp/ibis-test/virtualenv/lib/python3.10/site-packages/ibis/backends/athena/__init__.py", line 366, in do_connect
    self._memtable_catalog = self.current_catalog
  File "/tmp/ibis-test/virtualenv/lib/python3.10/site-packages/ibis/backends/athena/__init__.py", line 46, in current_catalog
    with self._safe_raw_sql(
  File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/tmp/ibis-test/virtualenv/lib/python3.10/site-packages/ibis/backends/athena/__init__.py", line 291, in _safe_raw_sql
    yield cur.execute(query, *args, **kwargs)
  File "/tmp/ibis-test/virtualenv/lib/python3.10/site-packages/pyathena/arrow/cursor.py", line 136, in execute
    self.result_set = AthenaArrowResultSet(
  File "/tmp/ibis-test/virtualenv/lib/python3.10/site-packages/pyathena/arrow/result_set.py", line 79, in __init__
    self._table = self._as_arrow()
  File "/tmp/ibis-test/virtualenv/lib/python3.10/site-packages/pyathena/arrow/result_set.py", line 276, in _as_arrow
    table = self._read_csv()
  File "/tmp/ibis-test/virtualenv/lib/python3.10/site-packages/pyathena/arrow/result_set.py", line 251, in _read_csv
    raise OperationalError(*e.args) from e
pyathena.error.OperationalError: When reading information for key '3d1008dc-55de-490e-b9f9-cd581b6f327f.csv' in bucket 'MY_BUCKET': AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 28, Timeout was reached

I expected the call to succeed without raising an error, so that I could call con.list_tables() next.

Additional details:

I am able to list (ls) and copy (cp) the CSV file referred to in the above stack trace just fine using the AWS CLI:

aws --profile athena_profile s3 ls s3://MY_BUCKET/3d1008dc-55de-490e-b9f9-cd581b6f327f.csv

The file contains:

$ cat 3d1008dc-55de-490e-b9f9-cd581b6f327f.csv
"_col0"
"awsdatacatalog"

I am able to use pyathena perfectly fine, which I believe is what ibis uses behind the scenes:

import os
from pyathena import connect
from dotenv import load_dotenv

load_dotenv()  # load env variables from .env

# All pyathena config is done using env variables, so nothing is passed to connect()
cursor = connect().cursor()

cursor.execute("SELECT * FROM NAMESPACE.TABLE LIMIT 1")
print(cursor.description)
print(cursor.fetchall())

I am behind a corporate proxy. The proxy variables are defined via environment variables (http_proxy, https_proxy and their uppercase forms).
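
For anyone reproducing this, a quick way to confirm which proxy variables the Python process actually sees (for example after load_dotenv() has run) is sketched below. It uses only the standard library; `visible_proxy_settings` is a hypothetical helper name, not part of ibis, pyathena, or pyarrow.

```python
import os

# The four variable names mentioned above (lowercase and uppercase forms).
PROXY_VARS = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY")


def visible_proxy_settings(environ=None):
    """Return the proxy variables that are set and non-empty.

    `environ` defaults to os.environ; passing a plain dict makes the
    helper easy to check without touching the real environment.
    """
    env = os.environ if environ is None else environ
    return {name: env[name] for name in PROXY_VARS if env.get(name)}


if __name__ == "__main__":
    # Prints one NAME=value line per proxy variable that is set.
    for name, value in visible_proxy_settings().items():
        print(f"{name}={value}")
```

Note that this only shows what is present in the environment. Whether a given library honors those variables is a separate question.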

I am quite eager to be able to use ibis. Please let me know if I can provide any other details to help troubleshoot.

What version of ibis are you using?

ibis-framework 10.5.0

What backend(s) are you using, if any?

PyAthena 3.13.0

Relevant log output

Code of Conduct

[x] I agree to follow this project's Code of Conduct

May 07 '25 08:05 yeban

Can you try reading a file using vanilla pyarrow?

May 07 '25 11:05 cpcloud

I tried. It looks like pyarrow doesn't read proxy settings from environment variables.

For example, the following doesn't work. It raises the same error as in the original post:

import pyarrow as pa
from pyarrow import fs

s3 = fs.SubTreeFileSystem("MY_BUCKET", fs.S3FileSystem(region='MY_REGION'))
fl = s3.open_input_file('3d1008dc-55de-490e-b9f9-cd581b6f327f.csv')
print(fl.readall())

Error:

Traceback (most recent call last):
  File "/tmp/ibis-test/pyarrow_test.py", line 6, in <module>
    fl = s3.open_input_file('3d1008dc-55de-490e-b9f9-cd581b6f327f.csv')
  File "pyarrow/_fs.pyx", line 787, in pyarrow._fs.FileSystem.open_input_file
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
OSError: When reading information for key '3d1008dc-55de-490e-b9f9-cd581b6f327f.csv' in bucket 'MY_BUCKET': AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 28, Timeout was reached

But this works:

import os
import pyarrow as pa
from pyarrow import fs

s3 = fs.SubTreeFileSystem("MY_BUCKET", fs.S3FileSystem(region='MY_REGION', proxy_options=os.environ['http_proxy']))
fl = s3.open_input_file('3d1008dc-55de-490e-b9f9-cd581b6f327f.csv')
print(fl.readall())
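
A minimal sketch of deriving the `proxy_options` value from the environment, rather than hard-coding one variable, is shown below. The helper name `proxy_options_from_env` and the lookup order (https_proxy before http_proxy) are assumptions made for illustration. The dict keys follow what the pyarrow documentation lists for `S3FileSystem(proxy_options=...)`: scheme, host, port, and optionally username and password. The helper itself needs only the standard library.

```python
import os
from urllib.parse import urlsplit


def proxy_options_from_env(environ=None):
    """Build a proxy_options dict from proxy environment variables.

    Returns None when no proxy variable is set. The lookup order is an
    assumption of this sketch, not something pyarrow defines.
    """
    env = os.environ if environ is None else environ
    for name in ("https_proxy", "HTTPS_PROXY", "http_proxy", "HTTP_PROXY"):
        url = env.get(name)
        if url:
            break
    else:
        return None

    parts = urlsplit(url)
    # Require an explicit scheme, host, and port instead of guessing defaults.
    if not parts.scheme or not parts.hostname or parts.port is None:
        raise ValueError(f"{name} must look like scheme://host:port, got {url!r}")

    options = {"scheme": parts.scheme, "host": parts.hostname, "port": parts.port}
    if parts.username:
        options["username"] = parts.username
    if parts.password:
        options["password"] = parts.password
    return options


# Usage, assuming pyarrow is installed (not executed here):
#   from pyarrow import fs
#   s3 = fs.S3FileSystem(region="MY_REGION", proxy_options=proxy_options_from_env())
```

This only covers direct use of pyarrow's S3FileSystem, as in the snippet above. It does not by itself change how ibis.athena.connect constructs its filesystem.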

May 07 '25 15:05 yeban
