dlt icon indicating copy to clipboard operation
dlt copied to clipboard

Cluster support for clickhouse

Open pheepa opened this issue 11 months ago • 3 comments

Feature description

ALTER, CREATE, DROP queries in multiple nodes Clickhouse setup should include "on cluster <cluster_name>" clause to run query on all nodes in cluster. Now it seems that it has not been implemented.

Are you a dlt user?

I'd consider using dlt, but it's lacking a feature I need.

Use case

Correct work with multiple nodes setup

Proposed solution

on_cluster_mode and cluster_name added to clickhouse.credentials all ddl parameterised with "on cluster"

Related issues

https://github.com/dlt-hub/dlt/issues/1183

pheepa avatar Jan 09 '25 09:01 pheepa

same issue here.

OOub avatar Apr 09 '25 09:04 OOub

Hi, We have the same issue, we have sharded setup and additionaly to ON CLUSTER we also need to create a base and a distributed table pair to have the full data set available on all replicas. I have modified the execute_query function in the ClickHouse sql_client to modify queries generated by dlt to work with the distributed ClickHouse setup. What we do: CREATE TABLE:

  • add ON CLUSTER
  • add additional create statement so that we create the pair base table and distributed table DROP TABLE:
  • add ON CLUSTER
  • add additional statement to drop both table ALTER TABLE
  • add ON CLUSTER
  • add additional statement to alter both table DELETE FROM
  • adjust to delete from the base table

The base table would use ReplicatedMergeTree engine and the distributed table would use theDistributed engine and reference the base table

I also added additional configuration parameters distributed_tables = true cluster = "cluster_name" base_table_name_postfix = "base" base_table_database_prefix = ""

Only if distributed_tables is set to true and cluster is set, will the queries get adjusted. I'll create an PR soon. Main issue is, that the code cannot be tested without having Clickhouse setup with replicas and distributed tables. I can also copy the code here if anyone want's to try it out.

zstipanicev avatar Apr 29 '25 07:04 zstipanicev

Here is the PR if you wanna see the code changes

zstipanicev avatar Apr 29 '25 16:04 zstipanicev

Do you think we could add tests in the same way they are made in DBT-ClickHouse? For example, see https://github.com/clickhouse/dbt-clickhouse/tree/main/tests/integration.

pheepa avatar Apr 30 '25 10:04 pheepa

Do you think we could add tests in the same way they are made in DBT-ClickHouse? For example, see https://github.com/clickhouse/dbt-clickhouse/tree/main/tests/integration.

It looks like it should be possible, I can get a better answer next week from teammates. If I get any useful info I'll add it here

zstipanicev avatar Apr 30 '25 15:04 zstipanicev

Any idea when this will be included in a release?

OOub avatar Jun 18 '25 10:06 OOub

Any idea when this will be included in a release?

probably next one

adrianbr avatar Jun 19 '25 06:06 adrianbr

Any update of this patch ?

LPauzies avatar Nov 18 '25 09:11 LPauzies

implementation note: based on work done in two PRs linked to this issue we should consider this a separate destination. query rewrite looks massive and reuse for MERGE jobs may not be possible.

rudolfix avatar Nov 26 '25 09:11 rudolfix