metabase-clickhouse-driver icon indicating copy to clipboard operation
metabase-clickhouse-driver copied to clipboard

Query with same id is already running

Open mittalsuraj18 opened this issue 1 year ago • 8 comments

Describe the bug

We are sometimes seeing this error when loading charts on metabase via clickhouse. The error message is this

{"database_id":6,"started_at":"2023-10-04T14:28:57.262258Z","via":[{"status":"failed","class":"class clojure.lang.ExceptionInfo","error":"Error executing query: Code: 216. DB::Exception: Query with id = d0afc45c-1970-488b-b935-a2fb48e271d4 is already running. (QUERY_WITH_SAME_ID_IS_ALREADY_RUNNING) (version 22.8.13.20 (official build))\n, server ClickHouseNode [uri=https://<MASKED_HOST>:8443/default, options={connect_timeout=900000,socket_timeout=900000,dataTransferTimeout=900000,use_server_time_zone_for_dates=true,use_no_proxy=false,product_name=metabase/1.1.2}]@1823720274","stacktrace":["--> driver.sql_jdbc.execute$execute_reducible_query$fn__71852.invoke(execute.clj:505)","driver.sql_jdbc.execute$execute_reducible_query.invokeStatic(execute.clj:502)","driver.sql_jdbc.execute$execute_reducible_query.invoke(execute.clj:489)","driver.sql_jdbc.execute$execute_reducible_query.invokeStatic(execute.clj:497)","driver.sql_jdbc.execute$execute_reducible_query.invoke(execute.clj:489)","driver.sql_jdbc$fn__105186.invokeStatic(sql_jdbc.clj:63)","driver.sql_jdbc$fn__105186.invoke(sql_jdbc.clj:61)","query_processor.context$executef.invokeStatic(context.clj:60)","query_processor.context$executef.invoke(context.clj:49)"

Steps to reproduce

  1. Have a metabase dashboard with a lot of charts connected to clickhouse.
  2. Query these metabase charts via API / via UI. Sometimes in succession.

Expected behaviour

Query should succeed.

Environment

  • metabase-clickhouse-driver version:
  • metabase-clickhouse-driver configuration:
  • Metabase version:
  • OS:

ClickHouse server

  • ClickHouse Server version: 22.8.13.20
  • ClickHouse Server non-default settings, if any: None
  • CREATE TABLE statements for tables involved:
  • Sample data for all these tables, use clickhouse-obfuscator if necessary

mittalsuraj18 avatar Nov 30 '23 08:11 mittalsuraj18

Have a metabase dashboard with a lot of charts connected to clickhouse. Query these metabase charts via API / via UI. Sometimes in succession.

Can you please provide a more concrete scenario here?

slvrtrn avatar Dec 01 '23 13:12 slvrtrn

@slvrtrn So, the queries are long running queries. Some of them take more than 60 seconds to run. Basically have a metabase dashboard with long running queries. And open those dashboard in quick succession before the first query finishes. Or, query the metabase questions API for the same dashboard multiple times at once.

mittalsuraj18 avatar Dec 01 '23 14:12 mittalsuraj18

@slvrtrn Hello! I encountered the same problem while executing query: OPTIMIZE TABLE X PARTITION P FINAL DEDUPLICATE on the last 3 partitions. The number of records in them varies from 7 to 9 million. On average, one partition weighs 700 MB. Do u have some ideas, what could be causing this error? The workload for Clickhouse seems manageable. Table engine: ReplicatedReplasingMergeTree

Stasik371 avatar Jan 18 '24 16:01 Stasik371

@mzitnik, tagging you cause I think we need you here as you have more knowledge about the JDBC driver internals.

slvrtrn avatar Jan 18 '24 16:01 slvrtrn

Do we get a timeout from the driver? Is the query on the server finished? (We can see it in query_log)

mzitnik avatar Jan 30 '24 09:01 mzitnik

hi!I think this might answer your questions. Green rectangles are successful requests, red rectangles are the opposite. image

Stasik371 avatar Jan 30 '24 11:01 Stasik371

i also have this problem. when i running a long time sql question,it's come out error every time at 10min. where can i change my config?

jack-reso avatar Apr 11 '24 01:04 jack-reso

Good day, there is a similar issue for java client https://github.com/ClickHouse/clickhouse-java/issues/1529 I believe root cause is in multithreaded access. I have a tests that reproduces the problem. However fix would require some refactoring that will take time.

chernser avatar May 21 '24 21:05 chernser