StreamFailureError when querying `JSON` containing `Tuple(Int8, String)`

Open martijnthe opened this issue 10 months ago • 3 comments

Describe the bug

A StreamFailureError is raised when executing a query that returns a column of the JSON data type whose value contains a Tuple(Int8, String).

Steps to reproduce

Example query:

SELECT '{"k": [123, "xyz"]}'::JSON SETTINGS input_format_json_read_numbers_as_strings = 0;
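
For reproducing outside the test suite, a minimal standalone sketch follows. It assumes a ClickHouse server reachable on localhost with default credentials; the `run_repro` helper name is hypothetical and not part of clickhouse-connect. Depending on the server version, the JSON type may need to be enabled through a setting first.

```python
# Standalone reproduction sketch. Assumes a ClickHouse server reachable on
# localhost with default credentials; adjust get_client() arguments as needed.

QUERY = (
    "SELECT '{\"k\": [123, \"xyz\"]}'::JSON "
    "SETTINGS input_format_json_read_numbers_as_strings = 0"
)


def run_repro(host: str = "localhost"):
    """Run the example query and return the single JSON value (hypothetical helper)."""
    # Imported lazily so this module loads even without the driver installed.
    import clickhouse_connect
    from clickhouse_connect.driver.exceptions import StreamFailureError

    client = clickhouse_connect.get_client(host=host)
    try:
        return client.query(QUERY).result_set[0][0]
    except StreamFailureError as ex:
        # This is the failure described in this issue.
        print(f"Reproduced: {ex}")
        raise
    finally:
        client.close()


if __name__ == "__main__":
    print(run_repro())
```

If the bug is fixed, the script should print the parsed value instead of raising.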

Apply this patch with unit test case and run the test:

diff --git a/tests/integration_tests/test_dynamic.py b/tests/integration_tests/test_dynamic.py
index bb8df62..f2a1821 100644
--- a/tests/integration_tests/test_dynamic.py
+++ b/tests/integration_tests/test_dynamic.py
@@ -162,3 +162,14 @@ def test_json_str_time(test_client: Client):
         pytest.skip('JSON string/numbers bug before 25.1, skipping')
     result = test_client.query("SELECT '{\"timerange\": \"2025-01-01T00:00:00+0000\"}'::JSON").result_set
     assert result[0][0]['timerange'] == datetime.datetime(2025, 1, 1)
+
+def test_json_mixed_array(test_client: Client):
+    type_available(test_client, 'json')
+    if not test_client.min_version('24.10'):
+        pytest.skip('Complex JSON broken before 24.10')
+
+    # Raises:
+    # clickhouse_connect.driver.exceptions.StreamFailureError: unrecognized data found in stream: `000000000000000101135475706c6528496e74382c20537472696e67290000000000000000017b0378797a0000000000000000`
+    result = test_client.query('SELECT \'{"k": [123, "xyz"]}\'::JSON SETTINGS input_format_json_read_numbers_as_strings = 0')
+    json1 = result.result_set[0][0]
+    assert json1 == {'k': [123, 'xyz']}

Running the test fails with:

                   raise StreamFailureError(extract_error_message(source.last_message)) from None
E                   clickhouse_connect.driver.exceptions.StreamFailureError: unrecognized data found in stream: `000000000000000101135475706c6528496e74382c20537472696e67290000000000000000017b0378797a0000000000000000`
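
The hex payload in the error message can be inspected to see what the client could not parse. The sketch below uses only the Python standard library and pulls out runs of printable ASCII. The `printable_runs` helper is hypothetical, and the code does not attempt to interpret the Native block framing around those runs.

```python
import re

# Hex payload copied from the StreamFailureError message, split for readability.
PAYLOAD_HEX = (
    "00000000000000010113"
    "5475706c6528496e74382c20537472696e6729"
    "0000000000000000017b0378797a0000000000000000"
)


def printable_runs(data: bytes, min_len: int = 3) -> list[str]:
    """Return runs of at least min_len printable ASCII bytes (hypothetical helper)."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


payload = bytes.fromhex(PAYLOAD_HEX)
runs = printable_runs(payload)
print(runs)

# The integer from the query (123 == 0x7b) also appears as a single byte,
# directly before the length-prefixed string "xyz".
print(bytes([123, 3]) + b"xyz" in payload)
```

The runs should include the type name `Tuple(Int8, String)` and the string `xyz`, so the unparsed data appears to carry the tuple's type and both values from the query. That is an observation about the payload only, not a diagnosis of which side mishandles it.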

Expected behaviour

The above test should pass and no StreamFailureError should be raised.

Code example

See patch above.

clickhouse-connect and/or ClickHouse server logs

Configuration

Environment

  • clickhouse-connect version: master (8d88b1c67daa571e6be59a7951ad5256526121bb)
  • Python version: 3.10
  • Operating system: macOS

ClickHouse server

  • ClickHouse Server version: 24.12

martijnthe avatar Feb 05 '25 17:02 martijnthe

Same problem on CH 25.1. And it's flaky: it fails with a StreamError in approximately 10% of our test runs.

pkit avatar Mar 11 '25 21:03 pkit

@pkit Is it the exact same problem, only occurring with Tuple(Int8, String) types with the same data, or does the data differ and some data blocks work and some do not?

I haven't had the chance to dig into this but I suspect a problem in the ClickHouse server with encoding column types in the Native block.

genzgd avatar Mar 11 '25 21:03 genzgd

@genzgd It is a tuple, but not (Int8, String). In our case the schema is:

key String,
value String,
tags Array(String),
tuple Tuple(String, Array(Float64)),
raw_data JSON,
other Nullable(String)

Yeah, it kinda looks like either a CH server bug or some change in the Native protocol.

pkit avatar Mar 11 '25 21:03 pkit