clickhouse-java

Potential performance issue with SQL parsing in jdbc-v2

Open jploski opened this issue 3 months ago • 1 comment

Reporting against v0.9.1. We are using ClickHouse JDBC via Hibernate, and our code calls createNativeQuery a lot, which means creating (and throwing away) a great number of PreparedStatements. As a result, under production load we are seeing close to 6% of CPU time spent in the driver's statement parsing (as measured with asprof), more specifically in the method org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState.

While I'm aware that this is partly a problem of our usage pattern (PreparedStatements should be cached, which is what we're going to address next at our end), we did not have this issue with the old version (0.2.6) of the driver. In short, the SQL parsing has become slow(er).
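
For illustration, the caching idea mentioned above could look roughly like the sketch below: a small LRU cache keyed by the SQL text, so that per-query work (a parse result, or a PreparedStatement per connection) is done once per distinct query string. `SqlKeyedCache` and its `loader` parameter are hypothetical names invented for this sketch; they are not part of clickhouse-java or Hibernate, and a real PreparedStatement cache would also need to close evicted statements and be scoped to one connection.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper (not a driver or Hibernate API): a small LRU cache keyed by SQL text. */
final class SqlKeyedCache<V> {
    private final Map<String, V> entries;
    private final Function<String, V> loader;

    SqlKeyedCache(int maxEntries, Function<String, V> loader) {
        this.loader = loader;
        // accessOrder=true keeps the least recently used entry first,
        // so removeEldestEntry evicts it once the cache is over capacity.
        this.entries = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value for this SQL text, calling the loader only on a miss. */
    synchronized V get(String sql) {
        V value = entries.get(sql);
        if (value == null) {
            value = loader.apply(sql);
            entries.put(sql, value);
        }
        return value;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

With a cache like this, repeated calls with the same query string pay the expensive step only once; it only helps if the SQL text is identical between calls, i.e. values are bound as parameters rather than concatenated into the string.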

I'm writing this to alert you in case no performance benchmarks were performed for that area of code.

I'm also wondering why ANTLR is at all necessary in the driver - I'd assume the actual 'heavyweight' parsing to detect syntax errors and the like is done on the server anyhow, so why have such overheads in the client? But this is likely a deeper design issue not fixable by profiling...

jploski, Sep 12 '25 16:09

Good day, @jploski

Thank you for your feedback! We will look into the issue. There are cases where we have to parse SQL the way the server would. This follows from how JDBC is designed.

  • We need to know that a parameter stands in place of a value and not somewhere else in the SQL
  • We need to know whether a parameter is within function arguments, in which case we have to use its string representation instead of binary
  • We need to know whether a result set is expected from the query (it is not as simple as checking whether the query has SELECT)
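
To illustrate the first point, here is a minimal sketch of the kind of hand-written scanning that is needed just to count `?` placeholders. It is not the driver's implementation (the driver uses an ANTLR grammar); `PlaceholderScan` and `countPlaceholders` are hypothetical names. It assumes string literals are single-quoted with backslash or doubled-quote escapes and only handles `--` line comments.

```java
/** Hypothetical illustration, not driver code: a naive '?' placeholder counter. */
final class PlaceholderScan {

    /** Counts '?' characters outside single-quoted literals and "--" line comments. */
    static int countPlaceholders(String sql) {
        int count = 0;
        int i = 0;
        int n = sql.length();
        while (i < n) {
            char c = sql.charAt(i);
            if (c == '\'') {
                i++; // step past the opening quote
                while (i < n) {
                    char d = sql.charAt(i);
                    if (d == '\\') { i += 2; continue; }          // backslash escape
                    if (d == '\'') {
                        if (i + 1 < n && sql.charAt(i + 1) == '\'') {
                            i += 2;                                // doubled quote inside the literal
                            continue;
                        }
                        break;                                     // closing quote
                    }
                    i++;
                }
                i++; // step past the closing quote
            } else if (c == '-' && i + 1 < n && sql.charAt(i + 1) == '-') {
                while (i < n && sql.charAt(i) != '\n') i++;        // skip the line comment
            } else {
                if (c == '?') count++;
                i++;
            }
        }
        return count;
    }
}
```

Even this version is incomplete: it ignores block comments and quoted identifiers, and it would count the `?` of a ternary expression such as `x > 0 ? 1 : 2` as a parameter. It also cannot tell whether a parameter sits inside function arguments or whether the statement returns a result set, and each extra rule added by hand makes the scanner harder to maintain.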

In other words, we have to parse SQL to get some information. Doing it with custom code quickly became unmaintainable and buggy. A parser generator like JavaCC or ANTLR4 helps keep the parsing logic maintainable and understandable by the community, and it doesn't fall apart once a new condition has to be added. Fixing bugs in a parser is much easier than finding bugs in tangled, hairy code.

As an alternative, we may implement a lightweight SQL parser in the future.

chernser, Sep 16 '25 16:09