kubeblocks [BUG] Cluster creation & OpsRequest Reconfiguring races when PVC provisioning delays first Pod start (MySQL)

Describe the bug
Applying a MySQL Cluster and an OpsRequest (type: Reconfiguring, with at least one restart-required parameter) in a single apply for a new cluster leaves the cluster broken and crash-looping when PVC provisioning delays the first Pod start. The operator queues and processes the OpsRequest before the MySQL cluster has completed its first boot. When the volume is finally provisioned and the Pod starts, the already-processed OpsRequest immediately triggers the restart-required reconfigure (e.g., innodb_buffer_pool_instances), and the component cannot reliably complete its initial bootstrap.

To Reproduce

Apply the following at once (single kubectl apply -f), using a storage class that takes a few seconds to provision a PVC:

---
kind: Namespace
apiVersion: v1
metadata:
  name: kubeblocks-test
---
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  name: cluster1
  namespace: kubeblocks-test
spec:
  clusterDef: mysql
  topology: semisync
  terminationPolicy: Delete
  componentSpecs:
    - name: mysql
      componentDef: "mysql-8.0"
      serviceVersion: 8.0.33
      replicas: 1
      volumeClaimTemplates:
        - name: data
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 10Gi
---
apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: mysql-reconfiguring
  namespace: kubeblocks-test
spec:
  clusterName: cluster1
  force: false
  reconfigures:
    - componentName: mysql
      parameters:
        - key: innodb_buffer_pool_instances
          value: "5"
  preConditionDeadlineSeconds: 60
  type: Reconfiguring

Observe: PVC provisioning keeps the Pod at Pending; the OpsRequest is processed and ready to execute before the Pod exists.
When the Pod finally starts, the restart-required reconfigure is executed immediately (before first-boot completes), and the component fails to finish initialization / enters restart loops.

Expected behavior
The OpsRequest should not be processed until the MySQL Pod is running and all init containers have completed; applying Cluster + OpsRequest together for new clusters should be safe for GitOps workflows even when PVC provisioning is slow.

Additional context

Kubernetes: 1.33.5+k3s1
KubeBlocks: v1.0.1
MySQL add-on: 1.0.3
Storage class / CSI: hetzner-csi

Does not happen if

The OpsRequest is applied after the Cluster successfully bootstraps (all init containers successfully exit).
The Cluster has no volumeClaimTemplates (Pod starts quickly).

Oct 08 '25 19:10 elderapo

Hi @elderapo

Reconfiguration is a special ops that can be executed when the cluster is running/updating/abnormal/failed. OpsRequests are designed as one-time actions.

Oct 09 '25 03:10 shanshanying

Hi @elderapo, you can also set preConditionDeadlineSeconds to delay the execution of the operation until the cluster is running.

Oct 09 '25 06:10 wangyelei

Hi @shanshanying, I understand, but an OpsRequest with type: Reconfiguring should wait until the cluster is in a state that can accept the reconfiguration. If it is applied while the cluster is still being created, it breaks the cluster (I believe because it interrupts the initial init-container setup, and that process never recovers); the cluster ends up unusable, in an infinite crash loop.

Hi @wangyelei, in the above example I've already used preConditionDeadlineSeconds. Without it, the ops fails right away after being applied. Setting it to 60 makes the ops wait for the mysql Pod to become running, but it then interrupts the Pod's init containers (which do some bootstrapping work, I believe). The result is a crashed cluster that, after the restart caused by the OpsRequest, ends up in a restart loop.

Oct 09 '25 08:10 elderapo

These are the four cluster phases in which a Reconfiguration ops can be applied: running/updating/abnormal/failed. (Did the cluster status go from creating to updating? Otherwise the ops will still wait in Pending.) It is recommended that the application layer control when to apply the reconfiguration.
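
One way the application layer could gate the apply is sketched below. It is a minimal illustration, not KubeBlocks code: it assumes the Cluster object reports its phase at status.phase, and the names cluster_is_running and wait_for, as well as the fetch callback (for example a wrapper around kubectl get or a Kubernetes client call), are hypothetical. It also waits for Running only, which is stricter than the four phases listed above.

```python
import time
from typing import Callable


def cluster_is_running(cluster: dict) -> bool:
    """Return True when the Cluster object reports phase "Running".

    Assumption: the phase is exposed at status.phase.
    """
    status = cluster.get("status") or {}
    return status.get("phase") == "Running"


def wait_for(
    fetch: Callable[[], dict],
    ready: Callable[[dict], bool],
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll fetch() until ready(obj) is True or timeout_s elapses.

    fetch is a hypothetical callback that returns the current object
    as a dict. Returns True if the object became ready, else False.
    """
    deadline = clock() + timeout_s
    while True:
        if ready(fetch()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A GitOps pipeline could call wait_for(fetch_cluster, cluster_is_running) and apply the OpsRequest only when it returns True. As the later comments in this thread suggest, the Running phase alone may not be a sufficient gate.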

Oct 09 '25 10:10 shanshanying

It seems that the cluster goes from Creating => Running right after the containers in the Pod start, but before the init containers finish. Because of that, the OpsRequest restarts the Pod while the init containers are still running, interrupting their jobs. I think it would be fixed if the Cluster transitioned from Creating => Running only when:

The pod has started
Init containers in the pod have finished their work
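
The two conditions above can be expressed as a check over the standard Kubernetes Pod status fields (status.phase, status.initContainerStatuses, status.containerStatuses). This is only a sketch of the proposed condition, not how KubeBlocks computes the cluster phase. The function names are hypothetical, and it assumes every init container is a run-to-completion container (sidecar-style init containers with restartPolicy: Always never terminate and would need separate handling).

```python
def init_containers_done(pod: dict) -> bool:
    """True when every init container has terminated with exit code 0.

    Assumption: no sidecar-style init containers (restartPolicy: Always),
    which keep running and never reach the terminated state.
    """
    status = pod.get("status") or {}
    for init_status in status.get("initContainerStatuses") or []:
        terminated = (init_status.get("state") or {}).get("terminated")
        if terminated is None or terminated.get("exitCode") != 0:
            return False
    return True


def pod_bootstrapped(pod: dict) -> bool:
    """True when the Pod is Running, init containers finished, and
    every regular container reports ready."""
    status = pod.get("status") or {}
    containers = status.get("containerStatuses") or []
    return (
        status.get("phase") == "Running"
        and init_containers_done(pod)
        and bool(containers)
        and all(c.get("ready") for c in containers)
    )
```

The input is the Pod object as a dict, for example the parsed output of kubectl get pod -o json.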

Oct 09 '25 10:10 elderapo

I failed to reproduce the case. In KB, a cluster is running only when all pods are running (and, for MySQL clusters, the pods must also have their roles). It would be helpful if you could describe in detail how to reproduce the case where the pods are not yet initialized but the cluster is running.

Oct 09 '25 11:10 shanshanying

What CSI did you use to provision the PVC? In my case, it's Hetzner CSI, which takes like 5-15 seconds to provision and bind the volume.

Oct 09 '25 12:10 elderapo

I used ebd, and the binding mode is WaitForFirstConsumer.

Oct 17 '25 04:10 shanshanying

This issue has been marked as stale because it has been open for 30 days with no activity

Nov 17 '25 00:11 github-actions[bot]
