clickhouse-operator

[RFC] Deployment Scheme: One Keeper for Each ClickHouse Shard

Open ardenwick opened this issue 9 months ago • 3 comments

In typical ClickHouse + Keeper setups, there is one 'default' Keeper cluster of 3 or 5 Keeper servers, which serves the ClickHouse DDL task queue.

Replicated MergeTree table engines store metadata in Keeper for data replication, typically a large amount of it. For simplicity, this load can be put on the 'default' Keeper cluster. As more replicated tables are added and the ClickHouse cluster grows, auxiliary Keeper clusters should be added to absorb the extra load, leaving the 'default' Keeper cluster to handle the DDL queue only.

The key observations here are:

  • One replicated table does not care about the metadata of the other tables: it neither reads nor writes it.
  • One shard of a replicated table does not care about the metadata of the other shards: it neither reads nor writes it.

When all of this metadata shares one Keeper cluster, these translate into bottlenecks:

  • Keeper requests from one replicated table have to wait for in-flight requests[^1] from the other tables to finish.
  • Keeper requests from one shard of a replicated table have to wait for requests from the other shards to finish.

So we can deploy a single standalone Keeper server for each ClickHouse shard, in addition to the 'default' Keeper cluster.

For example, a ClickHouse cluster consisting of 5 shards needs 5 shard-Keeper servers plus 3 DDL-Keeper servers, 8 Keeper servers in total.
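The server count above can be sketched as a small calculation. This is only an illustration: the function names, the `shard-keeper-` prefix (taken from the naming used in this proposal), and the assumption that shards are numbered from 1 are hypothetical.

```python
def keeper_servers_needed(shards: int, ddl_ensemble_size: int = 3) -> int:
    """Total Keeper servers: one standalone shard-Keeper per shard
    plus the 'default' Keeper cluster that serves the DDL queue."""
    if shards < 1 or ddl_ensemble_size < 1:
        raise ValueError("shards and ddl_ensemble_size must be positive")
    return shards + ddl_ensemble_size


def shard_keeper_names(shards: int, prefix: str = "shard-keeper-") -> list[str]:
    """Hypothetical names of the shard-Keepers, assuming shards are numbered 1..N."""
    return [f"{prefix}{n}" for n in range(1, shards + 1)]


if __name__ == "__main__":
    # 5 shards + a 3-node DDL Keeper cluster
    print(keeper_servers_needed(5))
    print(shard_keeper_names(5))
```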

This comes with some benefits:

  • Shard-Keepers are standalone servers, which saves resources compared to Keeper clusters.
  • Shard-Keepers do not need a quorum to work. The example ClickHouse cluster is still partially writable even if 4 out of 5 shard-Keepers are down, whereas a 5-node Keeper cluster, which needs a 3-node quorum, cannot survive losing that many nodes.
  • Potentially better Keeper performance, since tables and shards no longer queue behind each other's requests.
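The availability comparison in the second point can be checked with a rough sketch. It assumes a majority quorum for a Keeper cluster, and it assumes that a shard whose standalone shard-Keeper is down simply counts as not writable for replicated tables; the function names are hypothetical and the 'default' DDL Keeper cluster is left out of the model.

```python
def quorum_size(nodes: int) -> int:
    """Majority quorum of a Keeper cluster with the given number of nodes."""
    return nodes // 2 + 1


def cluster_is_writable(nodes: int, nodes_down: int) -> bool:
    """A quorum-based Keeper cluster accepts writes only while a majority is up."""
    return nodes - nodes_down >= quorum_size(nodes)


def writable_shards(shards: int, shard_keepers_down: int) -> int:
    """With one standalone shard-Keeper per shard, each failure affects one shard only."""
    if not 0 <= shard_keepers_down <= shards:
        raise ValueError("shard_keepers_down must be between 0 and shards")
    return shards - shard_keepers_down


if __name__ == "__main__":
    # Shared 5-node Keeper cluster with 4 nodes down: no quorum, no writes.
    print(cluster_is_writable(5, 4))
    # Five standalone shard-Keepers with 4 down: one shard is still writable.
    print(writable_shards(5, 4))
```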

[^1]: ClickHouse metric ZooKeeperRequest

ardenwick Jun 10 '25 07:06

Example CREATE TABLE query

CREATE TABLE a
(
    `i` UInt64
)
ENGINE = ReplicatedMergeTree('shard-keeper-{shard}:/clickhouse/tables/{database}/{table}', '{replica}')
ORDER BY i

ardenwick Jun 10 '25 08:06

Downside: Not applicable to SharedMergeTree.

ardenwick Jun 10 '25 08:06

Good idea, thanks!

alex-zaitsev Jun 12 '25 12:06