totals don't work with new Clickhouse server versions
select sum(number), greater
from (select number, number > 5 as greater from numbers(10))
group by greater with totals
ClickHouse 22.8.19.10 / ClickHouse JDBC Driver 0.4.6 (revision: dd91e17) -- totals OK ClickHouse 23.3.5.9 / ClickHouse JDBC Driver 0.4.6 (revision: dd91e17) -- totals NOT OK
Thank you for reporting the issue. It appears that the problem is not limited to gRPC in the new version but also includes RowBinary or the method of generating the totals output.
22.8.15.23
00000000: 0201 7301 6706 5549 6e74 3634 0555 496e ..s.g.UInt64.UIn
00000010: 7438 0f00 0000 0000 0000 001e 0000 0000 t8..............
00000020: 0000 0001 2d00 0000 0000 0000 00 ....-........
23.3.6.7
00000000: 0201 7301 6706 5549 6e74 3634 0555 496e ..s.g.UInt64.UIn
00000010: 7438 0f00 0000 0000 0000 001e 0000 0000 t8..............
00000020: 0000 0001
@zhicwu Did you report this to Clickhouse?
I would throw an error, that totals are not supported.
Did you report this to Clickhouse?
No I didn't, as typically there is no response from anyone.
I would throw an error, that totals are not supported.
It's a bit difficult because 1) it requires SQL parsing to understand if the result should have totals or not; and 2) it's hard to tell if the server missed totals or not in RowBinary format(because it does not have a separator like the blank line added into TSV format).
I think it's good idea to be more defensive and add more validations on client side in case the server missed anything or protocol changed. Perhaps it would be beneficial to transition to the native data format, considering it as a first-class citizen.
It seems like totals is already detected https://github.com/search?q=repo%3AClickHouse%2Fclickhouse-java%20totals&type=code
I would throw an error always, no matter format.
to the native data format, considering it as a first-class citizen.
I don't like native, it requires much more compute power from the client. Some users will complain that JDBC driver needs 8GB RAM to process select results (to pivot very long native block). Go-driver has this issue, they just limited the max size of the native block in a result and reject to execute some queries at all.
It seems like totals is already detected
Yes, almost forgot that :p However, determining whether the last few rows in the RowBinary format represent totals or not can be challenging.
I don't like native, it requires much more compute power from the client.
Offloading serialization and deserialization to clients can be beneficial in certain scenarios, particularly for improving server scalability and handling higher request loads, particularly during ingestion processes. #1357 does not benefit RowBinary much but I believe it helps for Native.
@den-crane So looks like the problem solved on ClickHouse server side can we close it?
not solved yet, just like the title said, "not work with new Clickhouse server versions"
21.8
cl <<< 'select sum(1) with totals format RowBinary'|od
0000000 000001 000000 000000 000000 000001 000000 000000 000000
0000020
23.3.8
cl <<< 'select sum(1) with totals format RowBinary'|od
0000000 000001 000000 000000 000000
0000010
Old Clickhouse yielded totals in RowBinary.
At some point this bug was accidentally fixed
(AlexeyM told me that RowBinary should never had totals and never will be)
We should simply throw an error in JDBC, if someone executes a query with totals, for simplicity with any format: TSV, RowBinary, ....
They can use grouping sets nowadays.
According to the description of the document of ClickHouse, the above behavior change should be intentional.
https://clickhouse.com/docs/en/sql-reference/statements/select/group-by#with-totals-modifier
This extra row is only produced in
JSON*,TabSeparated*, andPretty*formats, separately from the other rows:
- In
XMLandJSON*formats, this row is output as a separate ‘totals’ field.- In
TabSeparated*,CSV*andVerticalformats, the row comes after the main result, preceded by an empty row (after the other data).- In
Pretty*formats, the row is output as a separate table after the main result.- In
Templateformat, the row is output according to specified template.- In the other formats it is not available.
But the reason for doing this is not stated in the document.
However, if this is reasonable, then the design of the JDBC Driver should also follow the above logic - that is, only parse the TOTALS data row from the SELECT return result with the specified FORMAT value.
So,
-
According to the design of
ClickHouse/clickhouse-java, its data format supportsTabSeparated, which meets the above requirements forFORMATvalue. https://github.com/ClickHouse/clickhouse-java#features -
According to the general experience of using the JDBC Driver, it is common for SQL execution results to contain multiple record sets - think about stored procedures and the
getMoreResultsmethod. -
Since
TabSeparateddoes not support as many data types asRowBinary, perhaps it would be better to import support for more complex formats such asTabSeparatedRawWithNamesAndTypesfor this scenario.
@zhicwu
yes, it was intentional change
- https://github.com/ClickHouse/ClickHouse/pull/41236
select sum(number), greater from (select number, number > 5 as greater from numbers(10)) group by greater with totalsClickHouse 22.8.19.10 / ClickHouse JDBC Driver 0.4.6 (revision: dd91e17) -- totals OK ClickHouse 23.3.5.9 / ClickHouse JDBC Driver 0.4.6 (revision: dd91e17) -- totals NOT OK
You need change your Driver to this one -------cc.blynk.clickhouse.ClickHouseDriver
ClickHouse 22.8.19.10 / ClickHouse JDBC Driver 0.4.6 (revision: dd91e17) -- totals OK ClickHouse 23.3.5.9 / ClickHouse JDBC
Thank you, I found another solution. I decided to retire and start growing cabbage.
This issue has been automatically marked as stale because it has not had activity in the last year. It will be closed in 30 days if no further activity occurs. Please feel free to leave a comment if you believe the issue is still relevant. Thank you for your contributions!