clickhouse-operator

Operator does not recreate Databases and Tables on replicas if not scaled to 0 before deleting the StatefulSet

Open kaitimmer opened this issue 6 months ago • 0 comments

We use release-0.23.7 to manage our ClickHouse instances.

We are currently migrating all ClickHouse servers to a different storage class. This is what we do, one replica at a time:

  1. Change the storageClass and PVC size (making sure it is large enough for the data in the replica) in the ClickHouseInstallation
  2. Delete the replica's StatefulSet, PVC, and PV
  3. Have the Operator recreate them
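As a rough illustration of step 2, the deletions for one replica could be scripted as below. This is only a sketch: the namespace, StatefulSet, PVC, and PV names are hypothetical placeholders, not the names from our cluster, and the script only builds and prints the `kubectl` commands instead of running them.

```python
from typing import List, Optional


def replica_delete_commands(
    namespace: str,
    statefulset: str,
    pvc: str,
    pv: Optional[str] = None,
) -> List[List[str]]:
    """Build the kubectl commands that remove one replica's StatefulSet and storage.

    All names are supplied by the caller; nothing is looked up in the cluster.
    """
    commands = [
        ["kubectl", "-n", namespace, "delete", "statefulset", statefulset],
        ["kubectl", "-n", namespace, "delete", "pvc", pvc],
    ]
    if pv is not None:
        # PersistentVolumes are cluster-scoped, so no namespace flag here.
        commands.append(["kubectl", "delete", "pv", pv])
    return commands


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in replica_delete_commands(
        namespace="clickhouse",
        statefulset="chi-example-cluster-0-1",
        pvc="data-chi-example-cluster-0-1-0",
        pv="pvc-00000000-0000-0000-0000-000000000000",
    ):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to check that the right replica is targeted before anything is deleted.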

Usually, this works fine. The replica is created on a new PVC with the new settings, and the data is synced back from the remaining replicas. We do this one after another until we are entirely running on the new storage.

However, this never works for replica 0 (I've also seen it for replicas >0, but only sometimes). The new StatefulSet is created with a new PVC, but the database and tables are not created and, therefore, are not synced.

In these cases, the Operator somehow ends up here: https://github.com/Altinity/clickhouse-operator/blob/d5f265fb4773ec4622b2af13fa52858c9f1e8c15/pkg/controller/chi/worker.go#L909 but we do not understand how this happens.
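To confirm that a recreated replica is really missing its schema, the table lists of a healthy replica and the rebuilt one can be compared. The sketch below assumes `clickhouse-client` is available inside the pod and can connect with default credentials; the pod and namespace names are hypothetical. It only builds the `kubectl exec` command and diffs output that you pass in as text.

```python
from typing import List, Set, Tuple

# Lists user tables only; system and information_schema databases are skipped.
TABLES_QUERY = (
    "SELECT database, name FROM system.tables "
    "WHERE database NOT IN ('system', 'INFORMATION_SCHEMA', 'information_schema') "
    "FORMAT TSV"
)


def list_tables_command(namespace: str, pod: str) -> List[str]:
    """Build a kubectl exec command that lists user tables on one replica pod."""
    return [
        "kubectl", "-n", namespace, "exec", pod, "--",
        "clickhouse-client", "--query", TABLES_QUERY,
    ]


def parse_tables(tsv_output: str) -> Set[Tuple[str, str]]:
    """Parse 'database<TAB>table' lines into a set of (database, table) pairs."""
    tables = set()
    for line in tsv_output.splitlines():
        if not line.strip():
            continue
        database, name = line.split("\t", 1)
        tables.add((database, name))
    return tables


def missing_tables(healthy_output: str, rebuilt_output: str) -> List[Tuple[str, str]]:
    """Return tables present on the healthy replica but absent on the rebuilt one."""
    return sorted(parse_tables(healthy_output) - parse_tables(rebuilt_output))


if __name__ == "__main__":
    # Hypothetical pod name, for illustration only.
    print(" ".join(list_tables_command("clickhouse", "chi-example-cluster-0-0-0")))
```

An empty result from `missing_tables` means the rebuilt replica has every table the healthy one has; it says nothing about whether the data itself has finished syncing.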

What works in these scenarios is this:

  1. Scale the Operator to 0
  2. Delete the StatefulSet and PVC/PV
  3. Restart the Operator

In this case, the operator reconciles the installation fine and creates the database and tables.
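The workaround can be sketched as an ordered list of `kubectl` commands. Several things here are assumptions rather than facts from our setup: the operator is assumed to run as a Deployment named `clickhouse-operator` in `kube-system`, "restart" is read as scaling it back to one replica, and all resource names are hypothetical. The script prints the commands and does not execute them.

```python
from typing import List


def workaround_commands(
    operator_namespace: str,
    operator_deployment: str,
    namespace: str,
    statefulset: str,
    pvc: str,
) -> List[List[str]]:
    """Build the commands for: stop the operator, delete the replica, start the operator."""

    def scale(replicas: int) -> List[str]:
        return [
            "kubectl", "-n", operator_namespace, "scale", "deployment",
            operator_deployment, f"--replicas={replicas}",
        ]

    return [
        scale(0),  # 1. Scale the Operator to 0
        ["kubectl", "-n", namespace, "delete", "statefulset", statefulset],  # 2. Delete
        ["kubectl", "-n", namespace, "delete", "pvc", pvc],
        scale(1),  # 3. Bring the Operator back (assumed to mean one replica)
    ]


if __name__ == "__main__":
    # Assumed operator location and hypothetical resource names.
    for cmd in workaround_commands(
        operator_namespace="kube-system",
        operator_deployment="clickhouse-operator",
        namespace="clickhouse",
        statefulset="chi-example-cluster-0-0",
        pvc="data-chi-example-cluster-0-0-0",
    ):
        print(" ".join(cmd))
```

The only difference from the failing procedure is that the operator is not running while the StatefulSet and PVC are deleted.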

kaitimmer, Sep 02 '24 09:09