clickhouse-java icon indicating copy to clipboard operation
clickhouse-java copied to clipboard

[jdbc-v2] Impossible to SELECT duplicated data in 0.8.4

Open Onajk opened this issue 8 months ago • 9 comments

Describe the bug

After updating jdbc driver there is an error while trying to SELECT duplicated data.

Steps to reproduce

  1. "SELECT 'value1', 'value1'"
  2. Observe error

Expected behaviour

Correctly return duplicated data

Code example

Statement stmt = conn.createStatement();
stmt.executeQuery("""
        SELECT 'value1', 'value1'
        """);
ResultSet rs = stmt.getResultSet();

Error log

java.sql.SQLException: Multiple entries with same key: 'value1'=1 and 'value1'=0
	at com.clickhouse.jdbc.internal.ExceptionUtils.toSqlState(ExceptionUtils.java:71)
	at com.clickhouse.jdbc.internal.ExceptionUtils.toSqlState(ExceptionUtils.java:39)
	at com.clickhouse.jdbc.StatementImpl.executeQuery(StatementImpl.java:207)
	at com.clickhouse.jdbc.StatementImpl.executeQuery(StatementImpl.java:151)
	at com.clickhouse.jdbc.PreparedStatementImpl.executeQuery(PreparedStatementImpl.java:110)
	at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52)
	at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java)
	at doobie.free.KleisliInterpreter$PreparedStatementInterpreter.$anonfun$executeQuery$2(kleisliinterpreter.scala:931)
	at doobie.free.KleisliInterpreter.$anonfun$primitive$2(kleisliinterpreter.scala:104)
	at flatMap @ doobie.WeakAsync$$anon$1.flatMap(WeakAsync.scala:30)
	at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:169)
	at flatMap @ doobie.WeakAsync$$anon$1.flatMap(WeakAsync.scala:30)
	at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:169)
	at uncancelable @ doobie.WeakAsync$$anon$1.uncancelable(WeakAsync.scala:38)
	at flatMap @ doobie.WeakAsync$$anon$1.flatMap(WeakAsync.scala:30)
	at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:169)
	at flatMap @ doobie.WeakAsync$$anon$1.flatMap(WeakAsync.scala:30)
	at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:169)
	at suspend @ doobie.WeakAsync$$anon$1.suspend(WeakAsync.scala:36)
	at flatMap @ doobie.WeakAsync$$anon$1.flatMap(WeakAsync.scala:30)
	at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:169)
	at flatMap @ doobie.WeakAsync$$anon$1.flatMap(WeakAsync.scala:30)
	at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:169)
Caused by: java.lang.IllegalArgumentException: Multiple entries with same key: 'value1'=1 and 'value1'=0
	at com.google.common.collect.ImmutableMap.conflictException(ImmutableMap.java:382)
	at com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:376)
	at com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:246)
	at com.google.common.collect.RegularImmutableMap.fromEntryArrayCheckingBucketOverflow(RegularImmutableMap.java:134)
	at com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:96)
	at com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:579)
	at com.google.common.collect.ImmutableMap$Builder.buildOrThrow(ImmutableMap.java:607)
	at com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:594)
	at com.clickhouse.client.api.metadata.TableSchema.<init>(TableSchema.java:41)
	at com.clickhouse.client.api.metadata.TableSchema.<init>(TableSchema.java:26)
	at com.clickhouse.client.api.data_formats.RowBinaryWithNamesAndTypesFormatReader.readSchema(RowBinaryWithNamesAndTypesFormatReader.java:45)
	at com.clickhouse.client.api.data_formats.RowBinaryWithNamesAndTypesFormatReader.<init>(RowBinaryWithNamesAndTypesFormatReader.java:23)
	at com.clickhouse.client.api.Client.newBinaryFormatReader(Client.java:2086)
	at com.clickhouse.client.api.Client.newBinaryFormatReader(Client.java:2104)
	at com.clickhouse.jdbc.StatementImpl.executeQuery(StatementImpl.java:202)
	at com.clickhouse.jdbc.StatementImpl.executeQuery(StatementImpl.java:151)
	at com.clickhouse.jdbc.PreparedStatementImpl.executeQuery(PreparedStatementImpl.java:110)
	at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52)
	at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java)
	at doobie.free.KleisliInterpreter$PreparedStatementInterpreter.$anonfun$executeQuery$2(kleisliinterpreter.scala:931)
	at doobie.free.KleisliInterpreter.$anonfun$primitive$2(kleisliinterpreter.scala:104)
	at cats.effect.IOFiber.runLoop(IOFiber.scala:1004)
	at cats.effect.IOFiber.execR(IOFiber.scala:1362)
	at cats.effect.IOFiber.run(IOFiber.scala:112)
	at cats.effect.unsafe.WorkerThread.run(WorkerThread.scala:702)

Configuration

"com.clickhouse" % "clickhouse-jdbc" % "0.8.4"

ClickHouse server

ClickHouse version I use is: ver. 24.3.12.76

Onajk avatar Apr 24 '25 10:04 Onajk

Correctly return duplicated data

What is the expected behavior for the driver when calling ResultSet.getObject("value1") in this case?

We have several options, as it seems:

  • Override and return the last value for duplicated keys
  • Return all values as an array of the original type; e.g., ["value1", "value1"]
  • Leave it as is if this is not a common scenario, and it can be addressed on the application side by adding a troubleshooting documentation entry.

slvrtrn avatar Apr 30 '25 10:04 slvrtrn

I'm not sure what is the best way to handle that but I have a case in code when for simplicity I'm reading the same table column twice and it triggers this error. It would be nice if you could make it possible in newest driver as it was in older one. If not then let me know because obviously I can handle that manually on application side.

Onajk avatar Apr 30 '25 10:04 Onajk

@chernser @Paultagoras @mzitnik WDYT?

slvrtrn avatar Apr 30 '25 12:04 slvrtrn

Hi, could someone please comment on that?

Onajk avatar May 12 '25 13:05 Onajk

@Onajk thank you for reporting it! Would it be a workaround for you to assign an alias to duplicate columns?

This should be easy to fix because we use this map to find column by name and can remember only first occurrence of it.

This should be a behavior for the example:

  1. getInt("value1") - returns always first column (if there are multiple columns it will return most left column of duplicates)
  2. getInt(1) - will return the first column
  3. getInt(2) - will return the second column

Would it work for you, @Onajk?

Thanks!

chernser avatar May 12 '25 15:05 chernser

I'm not sure how that will behave in Doobie that I use in Scala code but let's try

Onajk avatar May 13 '25 11:05 Onajk

Using alias fixes the problem and I guess we can live with it

Onajk avatar May 19 '25 07:05 Onajk

@Onajk thank you for updating! I will keep the issue open so we fix it anyway.

chernser avatar May 19 '25 19:05 chernser

While the example above is a good MRE, let me add a more realistic use-case (using the JDBC driver in Metabase) where this regression has actually impeded our work: In order to do quick exploration of data and iteratively build queries, our users sometimes run queries like: select col1, * from table. Then they check the remaining columns and add another one they're interested in, e.g. select col1, col2, * from table. When they found all the columns they're interested in, they eventually remove the * to remove the unnecessary toil. While these queries work perfectly fine in plain SQL, the columns duplicated by the * trigger the above error in the JDBC driver.

Gerrit-K avatar Jun 04 '25 12:06 Gerrit-K

This change introduced a regression in jOOQ's code generator after upgrading:

  • https://github.com/jOOQ/jOOQ/issues/19026

While it is easy to avoid the issue by aliasing individual columns, it's still important to note:

  • This is a regression. Queries that used to work no longer do.
  • This is not at all a problem on the server side: https://play.clickhouse.com/play?user=play#c2VsZWN0IDEgYSwgMSBh. It's always risky for JDBC drivers to parse SQL, but if they do, they should absolutely behave just like the server
  • I'm not aware of many RDBMS that don't allow duplicate column names in top-level projections. They don't (usually) allow this in nested selects, such as derived tables, but there's no reason not to allow it in top-level SELECTs. If there's an ambiguity issue, a synthetic (internal) column name can be assigned, such as ___col1, ___col2, you get the idea. There are other ways of name mangling, but it should always be possible.
  • The duality of name based and position based column references is natural to SQL. The reason why two identical columns were projected in jOOQ is simply because that particular query had to conform to a positional based tuple type, where column names are irrelevant.

lukaseder avatar Sep 09 '25 09:09 lukaseder

For the record, the server accepts:

select 1 a, 1 a

But rejects:

select 1 a, 2 a

With:

Code: 179. DB::Exception: Multiple expressions 2 AS a and 1 AS a for alias a. In scope SELECT 1 AS a, 2 AS a. (MULTIPLE_EXPRESSIONS_FOR_ALIAS) (version 25.8.1.4953 (official build))

I think this is another, related problem. I don't see why this is forbidden.

lukaseder avatar Sep 09 '25 09:09 lukaseder

@lukaseder we are working to solve the issue.

chernser avatar Sep 09 '25 15:09 chernser