[jdbc-v2] Impossible to SELECT duplicated data in 0.8.4
Describe the bug
After updating jdbc driver there is an error while trying to SELECT duplicated data.
Steps to reproduce
- "SELECT 'value1', 'value1'"
- Observe error
Expected behaviour
Correctly return duplicated data
Code example
Statement stmt = conn.createStatement();
stmt.executeQuery("""
SELECT 'value1', 'value1'
""");
ResultSet rs = stmt.getResultSet();
Error log
java.sql.SQLException: Multiple entries with same key: 'value1'=1 and 'value1'=0
at com.clickhouse.jdbc.internal.ExceptionUtils.toSqlState(ExceptionUtils.java:71)
at com.clickhouse.jdbc.internal.ExceptionUtils.toSqlState(ExceptionUtils.java:39)
at com.clickhouse.jdbc.StatementImpl.executeQuery(StatementImpl.java:207)
at com.clickhouse.jdbc.StatementImpl.executeQuery(StatementImpl.java:151)
at com.clickhouse.jdbc.PreparedStatementImpl.executeQuery(PreparedStatementImpl.java:110)
at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52)
at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java)
at doobie.free.KleisliInterpreter$PreparedStatementInterpreter.$anonfun$executeQuery$2(kleisliinterpreter.scala:931)
at doobie.free.KleisliInterpreter.$anonfun$primitive$2(kleisliinterpreter.scala:104)
at flatMap @ doobie.WeakAsync$$anon$1.flatMap(WeakAsync.scala:30)
at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:169)
at flatMap @ doobie.WeakAsync$$anon$1.flatMap(WeakAsync.scala:30)
at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:169)
at uncancelable @ doobie.WeakAsync$$anon$1.uncancelable(WeakAsync.scala:38)
at flatMap @ doobie.WeakAsync$$anon$1.flatMap(WeakAsync.scala:30)
at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:169)
at flatMap @ doobie.WeakAsync$$anon$1.flatMap(WeakAsync.scala:30)
at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:169)
at suspend @ doobie.WeakAsync$$anon$1.suspend(WeakAsync.scala:36)
at flatMap @ doobie.WeakAsync$$anon$1.flatMap(WeakAsync.scala:30)
at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:169)
at flatMap @ doobie.WeakAsync$$anon$1.flatMap(WeakAsync.scala:30)
at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:169)
Caused by: java.lang.IllegalArgumentException: Multiple entries with same key: 'value1'=1 and 'value1'=0
at com.google.common.collect.ImmutableMap.conflictException(ImmutableMap.java:382)
at com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:376)
at com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:246)
at com.google.common.collect.RegularImmutableMap.fromEntryArrayCheckingBucketOverflow(RegularImmutableMap.java:134)
at com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:96)
at com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:579)
at com.google.common.collect.ImmutableMap$Builder.buildOrThrow(ImmutableMap.java:607)
at com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:594)
at com.clickhouse.client.api.metadata.TableSchema.<init>(TableSchema.java:41)
at com.clickhouse.client.api.metadata.TableSchema.<init>(TableSchema.java:26)
at com.clickhouse.client.api.data_formats.RowBinaryWithNamesAndTypesFormatReader.readSchema(RowBinaryWithNamesAndTypesFormatReader.java:45)
at com.clickhouse.client.api.data_formats.RowBinaryWithNamesAndTypesFormatReader.<init>(RowBinaryWithNamesAndTypesFormatReader.java:23)
at com.clickhouse.client.api.Client.newBinaryFormatReader(Client.java:2086)
at com.clickhouse.client.api.Client.newBinaryFormatReader(Client.java:2104)
at com.clickhouse.jdbc.StatementImpl.executeQuery(StatementImpl.java:202)
at com.clickhouse.jdbc.StatementImpl.executeQuery(StatementImpl.java:151)
at com.clickhouse.jdbc.PreparedStatementImpl.executeQuery(PreparedStatementImpl.java:110)
at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52)
at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java)
at doobie.free.KleisliInterpreter$PreparedStatementInterpreter.$anonfun$executeQuery$2(kleisliinterpreter.scala:931)
at doobie.free.KleisliInterpreter.$anonfun$primitive$2(kleisliinterpreter.scala:104)
at cats.effect.IOFiber.runLoop(IOFiber.scala:1004)
at cats.effect.IOFiber.execR(IOFiber.scala:1362)
at cats.effect.IOFiber.run(IOFiber.scala:112)
at cats.effect.unsafe.WorkerThread.run(WorkerThread.scala:702)
Configuration
"com.clickhouse" % "clickhouse-jdbc" % "0.8.4"
ClickHouse server
ClickHouse version I use is: ver. 24.3.12.76
Correctly return duplicated data
What is the expected behavior for the driver when calling ResultSet.getObject("value1") in this case?
We have several options, as it seems:
- Override and return the last value for duplicated keys
- Return all values as an array of the original type; e.g.,
["value1", "value1"] - Leave it as is if this is not a common scenario, and it can be addressed on the application side by adding a troubleshooting documentation entry.
I'm not sure what is the best way to handle that but I have a case in code when for simplicity I'm reading the same table column twice and it triggers this error. It would be nice if you could make it possible in newest driver as it was in older one. If not then let me know because obviously I can handle that manually on application side.
@chernser @Paultagoras @mzitnik WDYT?
Hi, could someone please comment on that?
@Onajk thank you for reporting it! Would it be a workaround for you to assign an alias to duplicate columns?
This should be easy to fix because we use this map to find column by name and can remember only first occurrence of it.
This should be a behavior for the example:
- getInt("value1") - returns always first column (if there are multiple columns it will return most left column of duplicates)
- getInt(1) - will return the first column
- getInt(2) - will return the second column
Would it work for you, @Onajk?
Thanks!
I'm not sure how that will behave in Doobie that I use in Scala code but let's try
Using alias fixes the problem and I guess we can live with it
@Onajk thank you for updating! I will keep the issue open so we fix it anyway.
While the example above is a good MRE, let me add a more realistic use-case (using the JDBC driver in Metabase) where this regression has actually impeded our work:
In order to do quick exploration of data and iteratively build queries, our users sometimes run queries like: select col1, * from table. Then they check the remaining columns and add another one they're interested in, e.g. select col1, col2, * from table. When they found all the columns they're interested in, they eventually remove the * to remove the unnecessary toil. While these queries work perfectly fine in plain SQL, the columns duplicated by the * trigger the above error in the JDBC driver.
This change introduced a regression in jOOQ's code generator after upgrading:
- https://github.com/jOOQ/jOOQ/issues/19026
While it is easy to avoid the issue by aliasing individual columns, it's still important to note:
- This is a regression. Queries that used to work no longer do.
- This is not at all a problem on the server side: https://play.clickhouse.com/play?user=play#c2VsZWN0IDEgYSwgMSBh. It's always risky for JDBC drivers to parse SQL, but if they do, they should absolutely behave just like the server
- I'm not aware of many RDBMS that don't allow duplicate column names in top-level projections. They don't (usually) allow this in nested selects, such as derived tables, but there's no reason not to allow it in top-level SELECTs. If there's an ambiguity issue, a synthetic (internal) column name can be assigned, such as
___col1,___col2, you get the idea. There are other ways of name mangling, but it should always be possible. - The duality of name based and position based column references is natural to SQL. The reason why two identical columns were projected in jOOQ is simply because that particular query had to conform to a positional based tuple type, where column names are irrelevant.
For the record, the server accepts:
select 1 a, 1 a
But rejects:
select 1 a, 2 a
With:
Code: 179. DB::Exception: Multiple expressions 2 AS a and 1 AS a for alias a. In scope SELECT 1 AS a, 2 AS a. (MULTIPLE_EXPRESSIONS_FOR_ALIAS) (version 25.8.1.4953 (official build))
I think this is another, related problem. I don't see why this is forbidden.
@lukaseder we are working to solve the issue.