amazon-redshift-python-driver
Incorrect buffer size calculation leads to failed query execution
Driver version
2.0.908
Redshift version
PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.40677
Client Operating System
ubuntu:20.04
Python version
3.9.12
Table schema
NA
Problem description
Error is raised every time a query lasts more than 5 minutes. Works fine on simpler/faster queries.
- Expected behaviour: No errors when executing long-running queries.
- Actual behaviour: Errors out because of a buffer size miscalculation.
- Error message/stack trace: unpack_from requires a buffer of at least 5 bytes for unpacking 5 bytes at offset 0 (actual buffer size is 0)
- Any other details that can be helpful: The same code runs fine on Windows. So far it only fails when running on Ubuntu inside a Docker container. I have also tried the workaround posted here, without any luck.
Python Driver trace logs
Traceback (most recent call last):
  File "/app/./test_long_query.py", line 43, in <module>
Reproduction code
(Query is redacted for confidentiality)
import pandas as pd
import redshift_connector

credentials = CREDENTIALS_DICTIONARY
query = LARGE_QUERY  # Query that takes over 5 minutes to execute

with redshift_connector.connect(
    host=credentials['host'],
    database=credentials['dbname'],
    user=credentials['user'],
    password=credentials['password'],
    port=credentials['port']
) as conn:
    with conn.cursor() as cursor:
        cursor.execute(query)
        result: pd.DataFrame = cursor.fetch_dataframe()
        print(result.head())
Thanks for reporting this issue. We will look into it and get back to you.
I'm unable to reproduce this issue using redshift_connector==2.0.908 and Redshift server version 1.0.40325. I recommend you look into the Docker container configuration to ensure TCP keepalive is aggressive enough that the connection does not time out while idle.
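For anyone auditing their container's keepalive behaviour: the sketch below shows what tuning TCP keepalive means at the socket level in Python. The interval values are illustrative assumptions, not maintainer recommendations, and the per-probe option names (TCP_KEEPIDLE etc.) are Linux-specific, hence the hasattr guards. redshift_connector manages its own socket internally, so this is only a standalone illustration of the mechanism, not a driver patch.

```python
import socket

def enable_keepalive(sock: socket.socket,
                     idle: int = 60, interval: int = 30, count: int = 5) -> None:
    """Enable TCP keepalive probes on a socket.

    idle:     seconds of inactivity before the first probe is sent
    interval: seconds between subsequent probes
    count:    unanswered probes before the kernel drops the connection
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-probe tuning knobs are Linux-specific; guard for portability.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))  # prints 1 on Linux
sock.close()
```

The idea is that probes must fire more often than whatever idle timeout sits between the client and Redshift (a NAT gateway or load balancer, for instance), otherwise an intermediary silently drops the connection during a long-running query and the driver's next read returns 0 bytes.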
with redshift_connector.connect(
region='xxx',
user='xxx',
host='xxx',
password='xxx',
database='xxx',
) as conn:
with conn.cursor() as cursor:
print('creating udf...')
cursor.execute("""CREATE OR REPLACE FUNCTION janky_sleep (x float) RETURNS bool IMMUTABLE as $$
from time import sleep
sleep(x)
return True
$$ LANGUAGE plpythonu;""")
print('created udf')
cursor.execute("select '6 min sleep', janky_sleep(360.0);") # sleep for 6 minutes
print(cursor.fetchall())
creating udf...
created udf
(['6 min sleep', True],)
redshift_connector version: 2.0.908
Process finished with exit code 0
We are looking into the Docker OS with no luck yet. Will report back in the following days as soon as we have news.
Thanks a lot Brooke. As you pointed out, it seems to be a problem with Docker. We have not been able to fix it, but we don't think it is related to the library, as we are having similar issues with long queries when using other libraries as well.
If we find a way to fix it I'll post it.
Thanks for the update and persistence in investigating this, Javier. I'll keep this issue open for now in case anyone else is hitting something similar.
I'm hitting a similar issue, but I'm not using Docker and my queries are not long-running. Initially the connection works fine and queries go through; then, after about 10 minutes, any query results in:
File "/Users/leo/Library/Caches/pypoetry/virtualenvs/di-api-LPP91Z5g-py3.10/lib/python3.10/site-packages/redshift_connector/core.py", line 1965, in handle_messages
code, data_len = ci_unpack(self._read(5))
│ │ │ └ <built-in method read of _io.BufferedRWPair object at 0x1376cbac0>
│ │ └ <redshift_connector.core.Connection object at 0x13579f2e0>
│ └ <built-in method unpack_from of _struct.Struct object at 0x107e27070>
└ None
struct.error: unpack_from requires a buffer of at least 5 bytes for unpacking 5 bytes at offset 0 (actual buffer size is 0)
All subsequent queries fail with a different error:
File "/Users/leo/Library/Caches/pypoetry/virtualenvs/di-api-LPP91Z5g-py3.10/lib/python3.10/site-packages/redshift_connector/core.py", line 1777, in execute
self._flush()
│ └ <built-in method flush of _io.BufferedRWPair object at 0x1376cbac0>
└ <redshift_connector.core.Connection object at 0x13579f2e0>
File "/Users/leo/.pyenv/versions/3.10.2/lib/python3.10/socket.py", line 723, in write
return self._sock.send(b)
│ │ │ └ <memory at 0x2c531d480>
│ │ └ <function SSLSocket.send at 0x105bdcee0>
│ └ <ssl.SSLSocket fd=19, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 53746)>
└ <socket.SocketIO object at 0x13579f280>
File "/Users/leo/.pyenv/versions/3.10.2/lib/python3.10/ssl.py", line 1205, in send
return self._sslobj.write(data)
│ │ │ └ <memory at 0x2c531d480>
│ │ └ <method 'write' of '_ssl._SSLSocket' objects>
│ └ <_ssl._SSLSocket object at 0x1376b4b30>
└ <ssl.SSLSocket fd=19, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 53746)>
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2384)
Driver 2.0.909, Python 3.10, macOS, Redshift 1.0.41881
The same is also observed from inside a Docker environment.
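Since every query after the first struct.error fails with SSLEOFError, the connection appears to be unusable once the error fires. One client-side workaround, pending a real fix, is to treat both errors as a signal to reconnect and retry. A minimal sketch, assuming a caller-supplied get_connection() factory (a hypothetical helper, not part of redshift_connector):

```python
import ssl
import struct

def execute_with_reconnect(get_connection, query, retries=1):
    """Run `query`, reconnecting if the connection has silently died.

    get_connection: zero-argument callable returning a fresh
                    redshift_connector connection (caller-supplied).
    """
    conn = get_connection()
    for attempt in range(retries + 1):
        try:
            with conn.cursor() as cursor:
                cursor.execute(query)
                return cursor.fetchall()
        except (struct.error, ssl.SSLError, OSError):
            # The connection is unusable; drop it and rebuild before retrying.
            try:
                conn.close()
            except Exception:
                pass  # closing a dead connection may itself fail
            if attempt == retries:
                raise
            conn = get_connection()
```

This only masks the symptom (the retried query still pays the full execution cost again), so keepalive tuning remains the preferable fix where the environment allows it.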
I am having the same issue here using:
redshift_connector==2.0.908, Python 3.8.1, PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.41881
Here is my stack trace for this error, in case it helps find the cause:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/redshift_connector/core.py", line 1631, in execute
    ps = cache["ps"][key]
KeyError: (" QUERY_STRING_HERE ", ())

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/core/core/importers/tenjin/tenjin_datavault.py", line 59, in run
    cursor.execute(query)
  File "/usr/local/lib/python3.8/site-packages/redshift_connector/cursor.py", line 240, in execute
    self._c.execute(self, operation, args)
  File "/usr/local/lib/python3.8/site-packages/redshift_connector/core.py", line 1701, in execute
    self.handle_messages(cursor)
  File "/usr/local/lib/python3.8/site-packages/redshift_connector/core.py", line 1965, in handle_messages
    code, data_len = ci_unpack(self._read(5))
struct.error: unpack_from requires a buffer of at least 5 bytes for unpacking 5 bytes at offset 0 (actual buffer size is 0)