clickhouse-connect icon indicating copy to clipboard operation
clickhouse-connect copied to clipboard

Streaming with query_row_block_stream crashes after few reads

Open keu opened this issue 5 months ago • 1 comments

Describe the bug

The streaming read is crashing after some time if there any processing in between reads.

Traceback (most recent call last):
  File "/usr/lib/python3.11/http/client.py", line 573, in _get_chunk_left
    chunk_left = self._read_next_chunk_size()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/http/client.py", line 540, in _read_next_chunk_size
    return int(line, 16)
           ^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 16: b'\xe9\xe9\x80\xe8\xf8\x8c\x05I"\xe7\x1bh\x9e\x0c\xa6\x06\xe3\x92\xa3?\x7f6:e\xec\x9d\xa8\x01d\xd3\x118{\xe1\xe9\xe60ta\x11T\xeb\xd0\x1d\x91\xed\xbf\x8aT5\xc8\x00\xceN\x1bA{\xe4\xf9\xeb\xd0\xc5\xba\xb
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.11/http/client.py", line 590, in _read_chunked
    chunk_left = self._get_chunk_left()
                 ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/http/client.py", line 575, in _get_chunk_left
    raise IncompleteRead(b'')
http.client.IncompleteRead: IncompleteRead(0 bytes read)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/progai/.venv/lib/python3.11/site-packages/urllib3/response.py", line 444, in _error_catcher
    yield
  File "/progai/.venv/lib/python3.11/site-packages/urllib3/response.py", line 567, in read
    data = self._fp_read(amt) if not fp_closed else b""
           ^^^^^^^^^^^^^^^^^^
  File "/progai/.venv/lib/python3.11/site-packages/urllib3/response.py", line 533, in _fp_read
    return self._fp.read(amt) if amt is not None else self._fp.read()
           ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/http/client.py", line 467, in read
    return self._read_chunked(amt)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/http/client.py", line 605, in _read_chunked
    raise IncompleteRead(b''.join(value)) from exc
http.client.IncompleteRead: IncompleteRead(134 bytes read)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/progai/pipeline/airflow/dags/shared/female_names_processing.py", line 76, in execute
    for i, block in enumerate(stream):
  File "/progai/.venv/lib/python3.11/site-packages/clickhouse_connect/driver/common.py", line 201, in __next__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/progai/.venv/lib/python3.11/site-packages/clickhouse_connect/driver/query.py", line 296, in _row_block_stream
    for block in self._column_block_stream():
  File "/progai/.venv/lib/python3.11/site-packages/clickhouse_connect/driver/transform.py", line 75, in gen
    next_block = get_block()
                 ^^^^^^^^^^^
  File "/progai/.venv/lib/python3.11/site-packages/clickhouse_connect/driver/transform.py", line 50, in get_block
    column = col_type.read_column(source, num_rows, context)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/progai/.venv/lib/python3.11/site-packages/clickhouse_connect/datatypes/base.py", line 143, in read_column
    return self.read_column_data(source, num_rows, ctx)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/progai/.venv/lib/python3.11/site-packages/clickhouse_connect/datatypes/base.py", line 158, in read_column_data
    column = self._read_column_binary(source, num_rows, ctx)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/progai/.venv/lib/python3.11/site-packages/clickhouse_connect/datatypes/string.py", line 34, in _read_column_binary
    return source.read_str_col(num_rows, self._active_encoding(ctx))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "clickhouse_connect/driverc/buffer.pyx", line 248, in clickhouse_connect.driverc.buffer.ResponseBuffer.read_str_col
  File "clickhouse_connect/driverc/buffer.pyx", line 134, in clickhouse_connect.driverc.buffer.ResponseBuffer._read_str_col
  File "clickhouse_connect/driverc/buffer.pyx", line 74, in clickhouse_connect.driverc.buffer.ResponseBuffer.read_bytes_c
  File "/progai/.venv/lib/python3.11/site-packages/clickhouse_connect/driver/httputil.py", line 200, in decompress
    chunk = response.read(chunk_size, decode_content=False)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/progai/.venv/lib/python3.11/site-packages/urllib3/response.py", line 566, in read
    with self._error_catcher():
  File "/usr/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/progai/.venv/lib/python3.11/site-packages/urllib3/response.py", line 461, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(134 bytes read)', IncompleteRead(134 bytes read))

Steps to reproduce

  1. start stream read
  2. consume iterator with some work/sleep
  3. soon it will crash with the error above

Expected behaviour

I expect it to read the whole dataset. If I disable processing, the dataset is read fine (40 mln records). This leads me to think that it is not related to actual data and response but something inside the implementation.

Code example

import clickhouse_connect
read_client = clickhouse_connect.get_client(
            host=host, username=login, password=password, port=port,
            connect_timeout=60 * 30, send_receive_timeout=60 * 30, client_name="airflow-read",
            settings={
                "session_timeout": 60 * 20,
            }
        )
qry = 'SELECT id, name FROM dev.tbl WHERE NOT empty(name)'
with read_client.query_row_block_stream(qry) as stream:
    for block in stream:
        rows = list(process(block, dataset))

clickhouse-connect and/or ClickHouse server logs

Configuration

Environment

  • clickhouse-connect version: 0.7.19
  • Python version: 3.11.10
  • Operating system: Ubuntu 20.4

ClickHouse server

  • ClickHouse Server version: 24.7.2 revision 54468
  • ClickHouse Server non-default settings, if any:
  • CREATE TABLE statements for tables involved:
  • Sample data for these tables, use clickhouse-obfuscator if necessary

keu avatar Sep 24 '24 00:09 keu