clickhouse-connect
Streaming with query_row_block_stream crashes after a few reads
Describe the bug
The streaming read crashes after some time if there is any processing between reads.
Traceback (most recent call last):
File "/usr/lib/python3.11/http/client.py", line 573, in _get_chunk_left
chunk_left = self._read_next_chunk_size()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/http/client.py", line 540, in _read_next_chunk_size
return int(line, 16)
^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 16: b'\xe9\xe9\x80\xe8\xf8\x8c\x05I"\xe7\x1bh\x9e\x0c\xa6\x06\xe3\x92\xa3?\x7f6:e\xec\x9d\xa8\x01d\xd3\x118{\xe1\xe9\xe60ta\x11T\xeb\xd0\x1d\x91\xed\xbf\x8aT5\xc8\x00\xceN\x1bA{\xe4\xf9\xeb\xd0\xc5\xba\xb
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.11/http/client.py", line 590, in _read_chunked
chunk_left = self._get_chunk_left()
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/http/client.py", line 575, in _get_chunk_left
raise IncompleteRead(b'')
http.client.IncompleteRead: IncompleteRead(0 bytes read)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/progai/.venv/lib/python3.11/site-packages/urllib3/response.py", line 444, in _error_catcher
yield
File "/progai/.venv/lib/python3.11/site-packages/urllib3/response.py", line 567, in read
data = self._fp_read(amt) if not fp_closed else b""
^^^^^^^^^^^^^^^^^^
File "/progai/.venv/lib/python3.11/site-packages/urllib3/response.py", line 533, in _fp_read
return self._fp.read(amt) if amt is not None else self._fp.read()
^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/http/client.py", line 467, in read
return self._read_chunked(amt)
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/http/client.py", line 605, in _read_chunked
raise IncompleteRead(b''.join(value)) from exc
http.client.IncompleteRead: IncompleteRead(134 bytes read)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/progai/pipeline/airflow/dags/shared/female_names_processing.py", line 76, in execute
for i, block in enumerate(stream):
File "/progai/.venv/lib/python3.11/site-packages/clickhouse_connect/driver/common.py", line 201, in __next__
return next(self.gen)
^^^^^^^^^^^^^^
File "/progai/.venv/lib/python3.11/site-packages/clickhouse_connect/driver/query.py", line 296, in _row_block_stream
for block in self._column_block_stream():
File "/progai/.venv/lib/python3.11/site-packages/clickhouse_connect/driver/transform.py", line 75, in gen
next_block = get_block()
^^^^^^^^^^^
File "/progai/.venv/lib/python3.11/site-packages/clickhouse_connect/driver/transform.py", line 50, in get_block
column = col_type.read_column(source, num_rows, context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/progai/.venv/lib/python3.11/site-packages/clickhouse_connect/datatypes/base.py", line 143, in read_column
return self.read_column_data(source, num_rows, ctx)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/progai/.venv/lib/python3.11/site-packages/clickhouse_connect/datatypes/base.py", line 158, in read_column_data
column = self._read_column_binary(source, num_rows, ctx)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/progai/.venv/lib/python3.11/site-packages/clickhouse_connect/datatypes/string.py", line 34, in _read_column_binary
return source.read_str_col(num_rows, self._active_encoding(ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "clickhouse_connect/driverc/buffer.pyx", line 248, in clickhouse_connect.driverc.buffer.ResponseBuffer.read_str_col
File "clickhouse_connect/driverc/buffer.pyx", line 134, in clickhouse_connect.driverc.buffer.ResponseBuffer._read_str_col
File "clickhouse_connect/driverc/buffer.pyx", line 74, in clickhouse_connect.driverc.buffer.ResponseBuffer.read_bytes_c
File "/progai/.venv/lib/python3.11/site-packages/clickhouse_connect/driver/httputil.py", line 200, in decompress
chunk = response.read(chunk_size, decode_content=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/progai/.venv/lib/python3.11/site-packages/urllib3/response.py", line 566, in read
with self._error_catcher():
File "/usr/lib/python3.11/contextlib.py", line 158, in __exit__
self.gen.throw(typ, value, traceback)
File "/progai/.venv/lib/python3.11/site-packages/urllib3/response.py", line 461, in _error_catcher
raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(134 bytes read)', IncompleteRead(134 bytes read))
Steps to reproduce
- Start a streaming read.
- Consume the iterator, doing some work (or sleeping) between blocks.
- It soon crashes with the error above.
Expected behaviour
I expect it to read the whole dataset. If I disable the processing step, the full dataset (40 million records) is read without errors. This leads me to believe the problem is not in the actual data or the response, but in something inside the implementation.
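For reference, this is roughly the shape of the two runs (the connection arguments are the same placeholders as in the code example below, and I have not reduced the failing case to a bare sleep, so the delay here only simulates my real processing):

import time

import clickhouse_connect

# Same placeholder connection details as in the code example below.
client = clickhouse_connect.get_client(host=host, username=login, password=password, port=port)
qry = 'SELECT id, name FROM dev.tbl WHERE NOT empty(name)'

simulate_processing = True  # False: the whole ~40 mln rows stream through fine

with client.query_row_block_stream(qry) as stream:
    for block in stream:
        if simulate_processing:
            # Stand-in for the real per-block work; with the real processing in
            # place the loop dies with the ProtocolError above after a few blocks.
            time.sleep(5)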
Code example
import clickhouse_connect

read_client = clickhouse_connect.get_client(
    host=host, username=login, password=password, port=port,
    connect_timeout=60 * 30, send_receive_timeout=60 * 30, client_name="airflow-read",
    settings={
        "session_timeout": 60 * 20,
    },
)

qry = 'SELECT id, name FROM dev.tbl WHERE NOT empty(name)'

with read_client.query_row_block_stream(qry) as stream:
    for block in stream:
        rows = list(process(block, dataset))
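A workaround direction I am considering (only a sketch, not verified): drain the stream as quickly as possible in the main loop and hand blocks to a worker thread through a queue, so the HTTP response is never left idle while process() runs. process, dataset, read_client and qry are the same placeholders as in the example above.

import queue
import threading

block_queue = queue.Queue(maxsize=64)  # bounded so memory stays under control

def worker():
    while True:
        block = block_queue.get()
        if block is None:  # sentinel: stream is exhausted
            break
        rows = list(process(block, dataset))  # slow per-block work happens off the read path

t = threading.Thread(target=worker, daemon=True)
t.start()

with read_client.query_row_block_stream(qry) as stream:
    for block in stream:
        # The reader only enqueues, so it keeps pulling from the HTTP response.
        # Note: if processing is consistently slower than the read, the queue
        # fills up and the reader blocks again, so this only helps with bursts.
        block_queue.put(block)

block_queue.put(None)  # tell the worker we are done
t.join()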
clickhouse-connect and/or ClickHouse server logs
Configuration
Environment
- clickhouse-connect version: 0.7.19
- Python version: 3.11.10
- Operating system: Ubuntu 20.04
ClickHouse server
- ClickHouse Server version: 24.7.2 revision 54468
- ClickHouse Server non-default settings, if any:
- CREATE TABLE statements for tables involved:
- Sample data for these tables, use clickhouse-obfuscator if necessary