
Deleting a CHI resource may leave debris from replicated tables in [Zoo]Keeper that requires later cleanup

Open hodgesrm opened this issue 1 year ago • 4 comments

If you delete a CHI cluster that has replicated tables, [Zoo]Keeper metadata is not cleaned up when the replica(s) are deleted. This leads to errors like the following if you then re-create the CHI resource and try to add tables again:

Received exception from server (version 23.7.4):
Code: 253. DB::Exception: Received from localhost:9000. DB::Exception: There was an error on [chi-demo2-s3-0-0:9000]: Code: 253. DB::Exception: Replica /clickhouse/s3/tables/0/default/test_local/replicas/chi-demo2-s3-0-0 already exists. (REPLICA_ALREADY_EXISTS) (version 23.7.4.5 (official build)). (REPLICA_ALREADY_EXISTS)
(query: CREATE TABLE IF NOT EXISTS test_local ON CLUSTER `{cluster}`

To reproduce this problem, follow the steps below.

  1. Create ClickHouse CHI using kubectl apply -f with proper connection to Keeper.
  2. Create at least one replica table. (See the example DDL below.)
  3. Delete the CHI using kubectl delete chi/<name>.

Now repeat steps 1 and 2. Step 2 will fail because of the existing ZooKeeper metadata.
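The reproduction cycle above can be sketched as a small shell function. This is only an illustration: the function name `repro_cycle`, the manifest file name, and the CHI name `demo2` (inferred from the host name in the error message) are assumptions, and the table-creation step is left as a comment because it depends on how you reach clickhouse-client in your cluster.

```shell
# Sketch of the reproduction cycle. repro_cycle is a hypothetical helper.
# Usage: repro_cycle MANIFEST CHI_NAME
repro_cycle() {
  manifest="$1"   # path to the CHI manifest (assumed, e.g. demo2.yaml)
  name="$2"       # CHI resource name (assumed, e.g. demo2)

  # Step 1: create the CHI.
  kubectl apply -f "$manifest" || return 1
  # Step 2: create a replicated table here, using the DDL from this report.
  # Step 3: delete the CHI.
  kubectl delete "chi/$name" || return 1
  # Repeat step 1. Repeating step 2 afterwards hits REPLICA_ALREADY_EXISTS.
  kubectl apply -f "$manifest" || return 1
}
```

Call it as `repro_cycle demo2.yaml demo2`, then run the CREATE TABLE statement again to see the error.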

The workaround is to remove the replica paths using SYSTEM DROP REPLICA, which is painful if there are many tables. Example:

SYSTEM DROP REPLICA 'chi-demo2-s3-0-0' FROM ZKPATH '/clickhouse/s3/tables/0/default/test_local'
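When many tables are affected, the statements can be generated rather than typed by hand. The following is a minimal sketch, assuming every table lives under the same Keeper base path. The helper name `gen_drop_replica` and the second table name `other_local` are made up for illustration.

```shell
# Print one SYSTEM DROP REPLICA statement per table name.
# gen_drop_replica is a hypothetical helper, not part of any tool.
# Usage: gen_drop_replica REPLICA ZK_BASE_PATH TABLE...
gen_drop_replica() {
  replica="$1"
  zk_base="$2"
  shift 2
  for table in "$@"; do
    printf "SYSTEM DROP REPLICA '%s' FROM ZKPATH '%s/%s';\n" \
      "$replica" "$zk_base" "$table"
  done
}

# Example: test_local comes from this report; other_local is invented.
gen_drop_replica 'chi-demo2-s3-0-0' '/clickhouse/s3/tables/0/default' \
  test_local other_local
```

Review the generated statements before running them through clickhouse-client, since each one removes replica metadata from Keeper.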

Use the following DDL for step 2 above.

CREATE TABLE IF NOT EXISTS test_local ON CLUSTER `{cluster}`
(
    `A` Int64,
    `S` String,
    `D` Date
)
ENGINE = ReplicatedMergeTree('/clickhouse/{cluster}/tables/{shard}/{database}/test_local', '{replica}')
PARTITION BY D ORDER BY A;

hodgesrm, Apr 07 '24 20:04

Note: this problem does not arise if you scale replicas down. In that case, the operator properly deletes the replica tables, which ensures ZooKeeper cleanup.

hodgesrm, Apr 07 '24 20:04

This is weird: the operator deletes all replicated tables when deleting a CHI. There's a regression test for that.

alex-zaitsev, Apr 24 '24 14:04

It could be related to https://github.com/Altinity/clickhouse-operator/issues/1388 -- DROP TABLE may return fast, but ClickHouse may keep deleting data in the background. Adding SYNC would probably help here.
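For anyone cleaning up by hand before deleting a CHI, the SYNC idea might look like the sketch below. It only prints the statements. The helper name `gen_drop_sync` is hypothetical, and the thread does not say whether the operator fix takes this approach.

```shell
# Print a DROP TABLE ... SYNC statement for each table name given.
# gen_drop_sync is a hypothetical helper for illustration only.
gen_drop_sync() {
  for table in "$@"; do
    # SYNC asks ClickHouse to finish the drop before returning.
    printf 'DROP TABLE IF EXISTS %s ON CLUSTER `{cluster}` SYNC;\n' "$table"
  done
}

# Example with the table from this report.
gen_drop_sync test_local
```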

alex-zaitsev, Apr 24 '24 14:04

Fixed in 0.24.0

alex-zaitsev, Jun 13 '24 08:06

Released in https://github.com/Altinity/clickhouse-operator/releases/tag/release-0.23.7

alex-zaitsev, Aug 12 '24 19:08