clickhouse-operator
Automatic recovery after complete data loss
The algorithm for manual recovery has already been described here https://kb.altinity.com/altinity-kb-setup-and-maintenance/recovery-after-complete-data-loss/
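In short, that procedure drops the lost replica's stale metadata from (Zoo)Keeper and then recreates the table so it re-registers and refetches its parts. A minimal sketch, assuming a hypothetical table testdb.testtable and the replica name pattern that appears in the error below:

```sql
-- On a surviving replica: remove the wiped replica's stale metadata
-- from (Zoo)Keeper (table and replica names are illustrative).
SYSTEM DROP REPLICA 'chi-XXXX-s0r0' FROM TABLE testdb.testtable;

-- On the wiped replica: recreate the table with the original schema.
-- With the stale znode gone, this registers a fresh replica that
-- fetches all parts from the healthy replicas.
CREATE TABLE IF NOT EXISTS testdb.testtable
(
    id UUID,
    timestamp DateTime
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/testtable', '{replica}')
PARTITION BY toYYYYMM(timestamp)
ORDER BY (id, timestamp);
```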
Expected behavior
- delete the PVC and the pod/StatefulSet (or remove all data from the volume in any other way)
- after the new pod starts, the recovery process begins immediately
At the moment I see that the operator recreates databases (including ones with the Atomic engine) and even Distributed tables, but not ReplicatedMergeTree tables:
```
<Error> executeQuery: Code: 253. DB::Exception: Replica /clickhouse/tables/8b70dd70-8114-4237-8bdf-120fceb06ed0/shard0/replicas/chi-XXXX-s0r0 already exists. (REPLICA_IS_ALREADY_EXIST) (version 22.3.6.5 (official build))
```
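The failure happens because the replica's znode outlives the deleted volume: the fresh pod tries to register under a path that still exists in (Zoo)Keeper. The leftover entry can be confirmed with a query against the system.zookeeper table, using the path from the error:

```sql
-- Lists replicas still registered for this table in (Zoo)Keeper;
-- the wiped replica (chi-XXXX-s0r0) will still be present even
-- though its local data is gone.
SELECT name, ctime
FROM system.zookeeper
WHERE path = '/clickhouse/tables/8b70dd70-8114-4237-8bdf-120fceb06ed0/shard0/replicas';
```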
The operator also does not restore the StatefulSet: https://github.com/Altinity/clickhouse-operator/issues/970
Related: https://github.com/Altinity/clickhouse-operator/issues/857
Seeing the same issue on my end. The database is created on the new instance but the table (ReplicatedMergeTree) is not, and the logs contain the same DB::Exception: REPLICA_IS_ALREADY_EXIST.
- I created a ClickHouseInstallation with 1 shard and 3 replicas, pointing to 3 ClickHouse Keeper nodes
- I created the database and table as follows:
```sql
CREATE DATABASE testdb ON CLUSTER '{cluster}';

CREATE TABLE IF NOT EXISTS testdb.testtable ON CLUSTER '{cluster}'
(
    id UUID,
    timestamp DateTime
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/testtable', '{replica}')
PARTITION BY toYYYYMM(timestamp)
ORDER BY (id, timestamp);
```
- I reduced the replicas on the ClickHouseInstallation to 2
- After the third replica was removed, I increased the replica count back to 3
I expected the new third replica to create the database and table successfully; however, only the database was created. The table failed to create with the REPLICA_IS_ALREADY_EXIST error.
Here is the complete log from the new third replica: chi-di-clickhouse-installation-replicated-0-2-0.log
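One way to confirm which replicas actually created the table is to query system.tables across the whole cluster; a sketch assuming the CHI cluster is named replicated, as the pod name suggests:

```sql
-- Counts the table on every replica; a replica that failed to
-- create it will be missing from the result.
SELECT hostName() AS replica, count() AS found
FROM clusterAllReplicas('replicated', system.tables)
WHERE database = 'testdb' AND name = 'testtable'
GROUP BY replica;
```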
@mlucic which clickhouse-operator version do you use?
> @mlucic which clickhouse-operator version do you use?
0.18.5
REPLICA_IS_ALREADY_EXIST
operator version 0.19.0
> REPLICA_IS_ALREADY_EXIST
> operator version 0.19.0
I just upgraded to 0.19.0 and can confirm that this issue is still present.
@R-omk, the restore process after losing a PV or PVC is fully implemented in 0.23.x
> @R-omk, the restore process after losing a PV or PVC is fully implemented in 0.23.x
With operator 0.23.3:
- it cannot detect that the StatefulSet was removed, and recreate it
- it cannot detect a wrong StatefulSet scale (zero) and restore the scale
The conditions under which replica data is restored are completely opaque. I believe the operator should check whether recovery is needed every time a pod belonging to the cluster is created.
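Such a check could be as simple as diffing the new pod's local schema against a healthy replica of the same shard; a hypothetical sketch (the hostname is illustrative):

```sql
-- Replicated tables that exist on a healthy replica but not on this
-- freshly created pod, i.e. tables that need to be recreated here.
SELECT name
FROM remote('chi-di-clickhouse-installation-replicated-0-1', system.tables)
WHERE database = 'testdb' AND engine LIKE 'Replicated%'
EXCEPT
SELECT name
FROM system.tables
WHERE database = 'testdb' AND engine LIKE 'Replicated%';
```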