StreamFailureError when querying `JSON` containing `Tuple(Int8, String)`
Describe the bug
A StreamFailureError is raised when executing a query that returns a column of the JSON data type whose value contains a Tuple(Int8, String).
Steps to reproduce
Example query:
SELECT '{"k": [123, "xyz"]}'::JSON SETTINGS input_format_json_read_numbers_as_strings = 0;
Apply this patch, which adds a unit test case, and run the test:
diff --git a/tests/integration_tests/test_dynamic.py b/tests/integration_tests/test_dynamic.py
index bb8df62..f2a1821 100644
--- a/tests/integration_tests/test_dynamic.py
+++ b/tests/integration_tests/test_dynamic.py
@@ -162,3 +162,14 @@ def test_json_str_time(test_client: Client):
         pytest.skip('JSON string/numbers bug before 25.1, skipping')
     result = test_client.query("SELECT '{\"timerange\": \"2025-01-01T00:00:00+0000\"}'::JSON").result_set
     assert result[0][0]['timerange'] == datetime.datetime(2025, 1, 1)
+
+def test_json_mixed_array(test_client: Client):
+    type_available(test_client, 'json')
+    if not test_client.min_version('24.10'):
+        pytest.skip('Complex JSON broken before 24.10')
+
+    # Raises:
+    # clickhouse_connect.driver.exceptions.StreamFailureError: unrecognized data found in stream: `000000000000000101135475706c6528496e74382c20537472696e67290000000000000000017b0378797a0000000000000000`
+    result = test_client.query('SELECT \'{"k": [123, "xyz"]}\'::JSON SETTINGS input_format_json_read_numbers_as_strings = 0')
+    json1 = result.result_set[0][0]
+    assert json1 == {'k': [123, 'xyz']}
raise StreamFailureError(extract_error_message(source.last_message)) from None
E clickhouse_connect.driver.exceptions.StreamFailureError: unrecognized data found in stream: `000000000000000101135475706c6528496e74382c20537472696e67290000000000000000017b0378797a0000000000000000`
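For anyone triaging, the hex payload in the error message can be decoded by hand. The sketch below only inspects the bytes quoted above. It assumes that strings in the stream carry a one-byte length prefix (true for ClickHouse varint lengths under 128), and the helper name `find_length_prefixed` is made up for illustration. It does not explain why the driver failed to consume these bytes.

```python
# Hex payload copied verbatim from the StreamFailureError message above.
payload_hex = "000000000000000101135475706c6528496e74382c20537472696e67290000000000000000017b0378797a0000000000000000"
payload = bytes.fromhex(payload_hex)


def find_length_prefixed(data: bytes, text: bytes) -> int:
    """Return the offset of `text` if the byte before it equals len(text), else -1.

    Assumption: a one-byte length prefix, which only holds for strings
    shorter than 128 bytes.
    """
    pos = data.find(text)
    if pos > 0 and data[pos - 1] == len(text):
        return pos
    return -1


# The leftover bytes contain the type name as a length-prefixed string...
type_pos = find_length_prefixed(payload, b"Tuple(Int8, String)")
# ...and, later on, the string element of the tuple.
str_pos = find_length_prefixed(payload, b"xyz")
# The byte just before the string's length prefix is the other tuple element.
int_value = payload[str_pos - 2]

print("type name found at offset:", type_pos)
print("string 'xyz' found at offset:", str_pos)
print("byte preceding the string:", int_value)
```

Read this way, the unrecognized bytes appear to hold the type name `Tuple(Int8, String)` followed by the values 123 and "xyz" from the query, with runs of zero bytes in between whose meaning is not established here.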
Expected behaviour
The above test should pass and no StreamFailureError should be raised.
Code example
See patch above.
clickhouse-connect and/or ClickHouse server logs
Configuration
Environment
- clickhouse-connect version: master (8d88b1c67daa571e6be59a7951ad5256526121bb)
- Python version: 3.10
- Operating system: macOS
ClickHouse server
- ClickHouse Server version: 24.12
Same problem on CH 25.1. And it's flaky: it fails with StreamError in approximately 10% of our test runs.
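To pin down which runs fail, a small loop that repeats the call and records each failure may help. This is only a sketch: `collect_failures` and `fake_query` are hypothetical names, and `fake_query` is a deterministic stand-in that fails on every tenth call rather than a real ClickHouse query.

```python
from typing import Callable, List, Tuple


def collect_failures(run_once: Callable[[], object], attempts: int) -> List[Tuple[int, str]]:
    """Call run_once() `attempts` times; return (attempt index, error text) per failure."""
    failures = []
    for i in range(attempts):
        try:
            run_once()
        except Exception as exc:  # broad on purpose: we only record what failed
            failures.append((i, f"{type(exc).__name__}: {exc}"))
    return failures


# Hypothetical stand-in for the real query; fails on every 10th call.
calls = {"n": 0}


def fake_query() -> None:
    calls["n"] += 1
    if calls["n"] % 10 == 0:
        raise RuntimeError("unrecognized data found in stream")


failures = collect_failures(fake_query, 100)
print(f"{len(failures)} of 100 attempts failed")
for index, message in failures[:3]:
    print(index, message)
```

Against a real server, `fake_query` would be replaced by a callable that runs the failing query through the client, so the recorded indexes and messages can be compared across runs.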
@pkit Is it the exact same problem, only occurring with Tuple(Int8, String) types with the same data, or does the data differ and some data blocks work and some do not?
I haven't had the chance to dig into this but I suspect a problem in the ClickHouse server with encoding column types in the Native block.
@genzgd it is a tuple, but not (Int8, String). In our case the schema is:
key String,
value String,
tags Array(String),
tuple Tuple(String, Array(Float64)),
raw_data JSON,
other Nullable(String)
Yeah, it kinda looks like either a CH server bug or some change in the Native protocol.