Endless loop of reconciliation
Describe the bug
After getting a cluster running at long last (OpenShift woes), the controller seems to be stuck in some kind of endless reconciliation loop:
{
"level": "info",
"ts": "2025-02-20T19:41:42Z",
"logger": "cluster-manager.myapp/myapp",
"msg": "start operation",
"operationId": "op-wbdjt",
"origin": "81293e20-0258-4ddf-83e0-d81e2c3dd4ed"
}
{
"level": "info",
"ts": "2025-02-20T19:41:42Z",
"msg": "Updated partition",
"controller": "statefulset",
"controllerGroup": "apps",
"controllerKind": "StatefulSet",
"StatefulSet": {
"name": "moco-myapp",
"namespace": "myapp"
},
"namespace": "myapp",
"name": "moco-myapp",
"reconcileID": "32721860-8378-4a82-a05d-24e19fe78c18",
"newPartition": 2,
"oldPartition": 3
}
{
"level": "info",
"ts": "2025-02-20T19:41:42Z",
"msg": "partition is updated",
"controller": "statefulset",
"controllerGroup": "apps",
"controllerKind": "StatefulSet",
"StatefulSet": {
"name": "moco-myapp",
"namespace": "myapp"
},
"namespace": "myapp",
"name": "moco-myapp",
"reconcileID": "32721860-8378-4a82-a05d-24e19fe78c18"
}
{
"level": "info",
"ts": "2025-02-20T19:41:42Z",
"msg": "Updated partition",
"controller": "statefulset",
"controllerGroup": "apps",
"controllerKind": "StatefulSet",
"StatefulSet": {
"name": "moco-myapp",
"namespace": "myapp"
},
"namespace": "myapp",
"name": "moco-myapp",
"reconcileID": "6b9c0377-f70e-456d-82f4-22e7e414de73",
"newPartition": 1,
"oldPartition": 2
}
{
"level": "info",
"ts": "2025-02-20T19:41:42Z",
"msg": "partition is updated",
"controller": "statefulset",
"controllerGroup": "apps",
"controllerKind": "StatefulSet",
"StatefulSet": {
"name": "moco-myapp",
"namespace": "myapp"
},
"namespace": "myapp",
"name": "moco-myapp",
"reconcileID": "6b9c0377-f70e-456d-82f4-22e7e414de73"
}
{
"level": "info",
"ts": "2025-02-20T19:41:42Z",
"msg": "Updated partition",
"controller": "statefulset",
"controllerGroup": "apps",
"controllerKind": "StatefulSet",
"StatefulSet": {
"name": "moco-myapp",
"namespace": "myapp"
},
"namespace": "myapp",
"name": "moco-myapp",
"reconcileID": "a41c7399-b9f8-4a53-9e62-e030e9d1ea4e",
"newPartition": 0,
"oldPartition": 1
}
{
"level": "info",
"ts": "2025-02-20T19:41:42Z",
"msg": "partition is updated",
"controller": "statefulset",
"controllerGroup": "apps",
"controllerKind": "StatefulSet",
"StatefulSet": {
"name": "moco-myapp",
"namespace": "myapp"
},
"namespace": "myapp",
"name": "moco-myapp",
"reconcileID": "a41c7399-b9f8-4a53-9e62-e030e9d1ea4e"
}
{
"level": "info",
"ts": "2025-02-20T19:41:42Z",
"logger": "cluster-manager.myapp/myapp",
"msg": "cluster state is Healthy",
"operationId": "op-wbdjt"
}
{
"level": "info",
"ts": "2025-02-20T19:41:42Z",
"logger": "cluster-manager.myapp/myapp",
"msg": "finish",
"operationId": "op-wbdjt",
"duration": 0.114975796
}
{
"level": "info",
"ts": "2025-02-20T19:41:42Z",
"msg": "reconciled StatefulSet",
"controller": "mysqlcluster",
"controllerGroup": "moco.cybozu.com",
"controllerKind": "MySQLCluster",
"MySQLCluster": {
"name": "myapp",
"namespace": "myapp"
},
"namespace": "myapp",
"name": "myapp",
"reconcileID": "d3640c7a-d4bf-4408-9a8c-e0aad62e7758",
"statefulSetName": "moco-myapp"
}
{
"level": "info",
"ts": "2025-02-20T19:41:42Z",
"msg": "reconciled CronJob for backup",
"controller": "mysqlcluster",
"controllerGroup": "moco.cybozu.com",
"controllerKind": "MySQLCluster",
"MySQLCluster": {
"name": "myapp",
"namespace": "myapp"
},
"namespace": "myapp",
"name": "myapp",
"reconcileID": "d3640c7a-d4bf-4408-9a8c-e0aad62e7758",
"cronJobName": "moco-backup-myapp"
}
{
"level": "info",
"ts": "2025-02-20T19:41:42Z",
"logger": "cluster-manager.myapp/myapp",
"msg": "start operation",
"operationId": "op-fn89n",
"origin": "d3640c7a-d4bf-4408-9a8c-e0aad62e7758"
}
{
"level": "info",
"ts": "2025-02-20T19:41:42Z",
"msg": "update status successfully",
"controller": "mysqlcluster",
"controllerGroup": "moco.cybozu.com",
"controllerKind": "MySQLCluster",
"MySQLCluster": {
"name": "myapp",
"namespace": "myapp"
},
"namespace": "myapp",
"name": "myapp",
"reconcileID": "d3640c7a-d4bf-4408-9a8c-e0aad62e7758"
}
{
"level": "info",
"ts": "2025-02-20T19:41:42Z",
"msg": "Updated partition",
"controller": "statefulset",
"controllerGroup": "apps",
"controllerKind": "StatefulSet",
"StatefulSet": {
"name": "moco-myapp",
"namespace": "myapp"
},
"namespace": "myapp",
"name": "moco-myapp",
"reconcileID": "bbab67e9-2011-4481-bd7c-ecc0a433d2e8",
"newPartition": 2,
"oldPartition": 3
}
{
"level": "info",
"ts": "2025-02-20T19:41:42Z",
"msg": "partition is updated",
"controller": "statefulset",
"controllerGroup": "apps",
"controllerKind": "StatefulSet",
"StatefulSet": {
"name": "moco-myapp",
"namespace": "myapp"
},
"namespace": "myapp",
"name": "moco-myapp",
"reconcileID": "bbab67e9-2011-4481-bd7c-ecc0a433d2e8"
}
{
"level": "info",
"ts": "2025-02-20T19:41:42Z",
"msg": "Updated partition",
"controller": "statefulset",
"controllerGroup": "apps",
"controllerKind": "StatefulSet",
"StatefulSet": {
"name": "moco-myapp",
"namespace": "myapp"
},
"namespace": "myapp",
"name": "moco-myapp",
"reconcileID": "ff21a7d5-0c90-4ddf-b98e-0b3996c3c5a2",
"newPartition": 1,
"oldPartition": 2
}
{
"level": "info",
"ts": "2025-02-20T19:41:42Z",
"msg": "partition is updated",
"controller": "statefulset",
"controllerGroup": "apps",
"controllerKind": "StatefulSet",
"StatefulSet": {
"name": "moco-myapp",
"namespace": "myapp"
},
"namespace": "myapp",
"name": "moco-myapp",
"reconcileID": "ff21a7d5-0c90-4ddf-b98e-0b3996c3c5a2"
}
{
"level": "info",
"ts": "2025-02-20T19:41:43Z",
"msg": "Updated partition",
"controller": "statefulset",
"controllerGroup": "apps",
"controllerKind": "StatefulSet",
"StatefulSet": {
"name": "moco-myapp",
"namespace": "myapp"
},
"namespace": "myapp",
"name": "moco-myapp",
"reconcileID": "487bdf78-2a05-489c-9473-578ea26ee907",
"newPartition": 0,
"oldPartition": 1
}
{
"level": "info",
"ts": "2025-02-20T19:41:43Z",
"msg": "partition is updated",
"controller": "statefulset",
"controllerGroup": "apps",
"controllerKind": "StatefulSet",
"StatefulSet": {
"name": "moco-myapp",
"namespace": "myapp"
},
"namespace": "myapp",
"name": "moco-myapp",
"reconcileID": "487bdf78-2a05-489c-9473-578ea26ee907"
}
{
"level": "info",
"ts": "2025-02-20T19:41:43Z",
"logger": "cluster-manager.myapp/myapp",
"msg": "cluster state is Healthy",
"operationId": "op-fn89n"
}
{
"level": "info",
"ts": "2025-02-20T19:41:43Z",
"logger": "cluster-manager.myapp/myapp",
"msg": "finish",
"operationId": "op-fn89n",
"duration": 0.113314412
}
I have no idea (yet) what could be causing this. An extremely wild hunch is that maybe https://github.com/cybozu-go/moco/pull/781#issuecomment-2667916155 is a fix for this, but I currently have no way to check that.
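For context on the partition messages: the newPartition/oldPartition pairs above show the controller walking the StatefulSet's spec.updateStrategy.rollingUpdate.partition down from 3 to 0, the standard mechanism for rolling pods one ordinal at a time; the odd part is that the walk immediately starts over. A minimal sketch of a single partition step (assumed shape, not MOCO's actual code; the helper name is hypothetical):

// Minimal sketch, not MOCO's actual implementation: a partition-based
// rolling update lowers spec.updateStrategy.rollingUpdate.partition one
// step per reconcile, so pods are replaced one ordinal at a time.
package partitionsketch

import (
	"context"
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// stepDownPartition lowers the partition by one; pods with ordinal >=
// partition run the new revision, so each decrement updates one more pod.
func stepDownPartition(ctx context.Context, c client.Client, sts *appsv1.StatefulSet) error {
	ru := sts.Spec.UpdateStrategy.RollingUpdate
	if ru == nil || ru.Partition == nil || *ru.Partition == 0 {
		return nil // rollout finished: all pods are on the new revision
	}
	oldPartition := *ru.Partition
	newPartition := oldPartition - 1

	patch := client.MergeFrom(sts.DeepCopy())
	ru.Partition = &newPartition
	if err := c.Patch(ctx, sts, patch); err != nil {
		return err
	}
	fmt.Printf("Updated partition: %d -> %d\n", oldPartition, newPartition)
	return nil
}

A healthy rollout performs this walk once; repeating 3 -> 2 -> 1 -> 0 on every reconcile suggests something keeps resetting the partition (or the pod-template diff that drives it) back to the start.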
Every now and then the logs also contain this:
{
"level": "error",
"ts": "2025-02-20T19:41:43Z",
"msg": "failed to update status",
"controller": "mysqlcluster",
"controllerGroup": "moco.cybozu.com",
"controllerKind": "MySQLCluster",
"MySQLCluster": {
"name": "myapp",
"namespace": "myapp"
},
"namespace": "myapp",
"name": "myapp",
"reconcileID": "e6f09dd0-b294-4a46-b12a-81879497244a",
"error": "Operation cannot be fulfilled on mysqlclusters.moco.cybozu.com \"myapp\": the object has been modified; please apply your changes to the latest version and try again",
"stacktrace": "github.com/cybozu-go/moco/controllers.(*MySQLClusterReconciler).reconcileV1.func1\n\t/work/controllers/mysqlcluster_controller.go:260\ngithub.com/cybozu-go/moco/controllers.(*MySQLClusterReconciler).reconcileV1\n\t/work/controllers/mysqlcluster_controller.go:328\ngithub.com/cybozu-go/moco/controllers.(*MySQLClusterReconciler).Reconcile\n\t/work/controllers/mysqlcluster_controller.go:208\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224"
}
{
"level": "error",
"ts": "2025-02-20T19:41:43Z",
"msg": "Reconciler error",
"controller": "mysqlcluster",
"controllerGroup": "moco.cybozu.com",
"controllerKind": "MySQLCluster",
"MySQLCluster": {
"name": "myapp",
"namespace": "myapp"
},
"namespace": "myapp",
"name": "myapp",
"reconcileID": "e6f09dd0-b294-4a46-b12a-81879497244a",
"error": "Operation cannot be fulfilled on mysqlclusters.moco.cybozu.com \"myapp\": the object has been modified; please apply your changes to the latest version and try again",
"stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224"
}
but then again, I'm pretty sure I didn't change the object (fully synced with ArgoCD).
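As far as I can tell, this error is just Kubernetes' optimistic concurrency control: a status update was submitted with a stale resourceVersion, which can happen when the controller's own writes race its informer cache, so it does not imply that anyone edited the object. Controllers normally just re-read and retry; a minimal sketch of that pattern using client-go's retry helper (the MySQLCluster import path and the updateStatusWithRetry helper are assumptions, not MOCO's actual code):

// Minimal sketch of the usual answer to a 409 Conflict: re-read the
// object to refresh resourceVersion, then retry the status update.
package conflictsketch

import (
	"context"

	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/util/retry"
	"sigs.k8s.io/controller-runtime/pkg/client"

	// Assumed import path for the MySQLCluster type.
	mocov1beta2 "github.com/cybozu-go/moco/api/v1beta2"
)

func updateStatusWithRetry(ctx context.Context, c client.Client, key types.NamespacedName) error {
	// RetryOnConflict re-runs the closure whenever the API server returns
	// 409 Conflict -- exactly the "object has been modified" error above.
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		cluster := &mocov1beta2.MySQLCluster{}
		if err := c.Get(ctx, key, cluster); err != nil {
			return err
		}
		// ...recompute cluster.Status from current observations here...
		return c.Status().Update(ctx, cluster)
	})
}

So an occasional conflict followed by a successful retry is expected noise; it only matters if the error repeats without ever succeeding.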
Environments
- Version: 0.25.1
To Reproduce
Steps to reproduce the behavior:
- Create a new cluster
- Wait for the cluster to finish setting up
- Look into moco-controller's logs and see the above logs repeating
Expected behavior
After the initial reconciliation, reconciliation should happen again only when something has changed.
@rassie We've released moco v0.26.0 including #781, so please try again with this version.
Unfortunately, this still happens with 0.26.0.
@rassie
These logs seem to occur during the construction of MySQLCluster.
Do these logs stop appearing once the construction of MySQLCluster myapp is completed?
Or does the construction fail to complete?
I'm also having the same problem: a lot of log output in the controller container, and PVCs are not getting expanded (but the StatefulSet is recreated correctly).
I'm continuously getting this in the controller logs:
{
"level": "error",
"ts": "2025-09-14T12:16:24Z",
"msg": "Reconciler error",
"controller": "mysqlcluster",
"controllerGroup": "moco.cybozu.com",
"controllerKind": "MySQLCluster",
"MySQLCluster": {
"name": "moco-test",
"namespace": "default"
},
"namespace": "default",
"name": "moco-test",
"reconcileID": "1826bd2a-ba6f-4b06-875b-727f7d0d1c20",
"error": "Operation cannot be fulfilled on mysqlclusters.moco.cybozu.com \"moco-test\": the object has been modified; please apply your changes to the latest version and try again",
"stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:332\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:279\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:240"
}
I think this problem is why my volume resizes are not working...
@mhkarimi1383 I tried to reproduce this locally. I do see the same log:
{
"level": "error",
"ts": "2025-09-16T05:48:33.979Z",
"msg": "failed to update status",
"controller": "mysqlcluster",
"controllerGroup": "moco.cybozu.com",
"controllerKind": "MySQLCluster",
"MySQLCluster": {
"name": "test",
"namespace": "test"
},
"namespace": "test",
"name": "test",
"reconcileID": "69d10043-b841-4eea-8c9d-7de4b44f7675",
"error": "Operation cannot be fulfilled on mysqlclusters.moco.cybozu.com \"test\
the object has been modified; please apply your changes to the latest version and try again",
"stacktrace": "github.com/cybozu-go/moco/controllers.(*MySQLClusterReconciler).reconcileV1.func1\n\t/work/controllers/mysqlcluster_controller.go:262\ngithub.com/cybozu-go/moco/controllers.(*MySQLClusterReconciler).reconcileV1\n\t/work
/controllers/mysqlcluster_controller.go:330\ngithub.com/cybozu-go/moco/controllers.(*MySQLClusterReconciler).Reconcile\n\t/work/controllers/mysqlcluster_controller.go:210\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Control
ler[...]).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/c
[email protected]/pkg/internal/controller/controller.go:319\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/contro
ller/controller.go:279\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:240"
}
But it only appears a few times and then stops; the PVC expansion finishes successfully.
I changed the storage size from 100Gi to 101Gi:
$ kubectl get MySQLCluster -n test -ojson test | jq .spec.volumeClaimTemplates
[
{
"metadata": {
"name": "mysql-data"
},
"spec": {
"accessModes": [
"ReadWriteOnce"
],
"resources": {
"requests": {
"storage": "101Gi"
}
},
"storageClassName": "topolvm-flash1-xfs"
}
}
]
$ kubectl get pvc -n test
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
mysql-data-moco-test-0 Bound pvc-e3e90df9-4f5a-4a9e-8ae6-1a872f9347e0 101Gi RWO topolvm-flash1-xfs <unset> 103d
mysql-data-moco-test-1 Bound pvc-9df5a5ea-5b9c-4f85-bac6-4297b8117ed7 101Gi RWO topolvm-flash1-xfs <unset> 103d
mysql-data-moco-test-2 Bound pvc-fb57262e-2ac0-4d81-8777-2ebc5987cd84 101Gi RWO topolvm-flash1-xfs <unset> 103d
To dig in further, could you share a bit more about your environment and exact steps?
@shunki-fujita I have a two-node minikube setup (with the KVM2 driver) and OpenEBS Rawfile-localpv as the storage engine.
I installed moco using the manifests.
I will share my exact commands right now.
@shunki-fujita
#!/usr/bin/env bash
set -exu -o pipefail

# Start a two-node minikube cluster unless one is already running
if ! minikube status -o json | jq -e 'length > 0 and all(.[]; .Host == "Running" and .Kubelet == "Running")'; then
minikube start -d kvm2 -n 2 --cpus='4' --memory='4GiB' --disk-size='50000mb'
fi
if [[ "$(kubectl config current-context)" != "minikube" ]]; then
kubectl config use-context minikube
fi

# Install the rawfile-localpv storage provisioner
helm repo add rawfile-localpv https://openebs.github.io/rawfile-localpv || true # Add the repo if not already added
helm repo update rawfile-localpv
helm upgrade \
--install \
rawfile-localpv \
rawfile-localpv/rawfile-localpv \
--namespace kube-system \
--create-namespace \
--set node.dataDirPath=/mnt/vda1/rawfile

# Install cert-manager (required by MOCO) and MOCO itself
kubectl apply --server-side -f \
https://github.com/jetstack/cert-manager/releases/latest/download/cert-manager.yaml
kubectl apply --server-side -f \
https://github.com/cybozu-go/moco/releases/latest/download/moco.yaml
I ran this script, then applied my manifest file:
apiVersion: moco.cybozu.com/v1beta2
kind: MySQLCluster
metadata:
namespace: default
name: moco-test
spec:
serverIDBase: 831841464
replicas: 3
podTemplate:
spec:
containers:
- name: mysqld
image: ghcr.io/cybozu-go/moco/mysql:8.4.6
resources: {}
volumeClaimTemplates:
- metadata:
name: mysql-data
spec:
storageClassName: rawfile-localpv
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 5Gi
Also, I had to manually set serverIDBase when performing kubectl replace -f ...
It would be great if the randomly generated serverIDBase were stored in the k8s resource.
Something to note: at first the storage was 1Gi, then I filled it up (100% usage), and after that I tried to change it to 5Gi.
@shunki-fujita Am I doing something wrong?
@mhkarimi1383 I was debugging why increasing MySQLCluster.spec.volumeClaimTemplates[].resources.requests.storage (e.g., 5Gi → 10Gi) didn’t raise the PVC spec.resources.requests.storage. It turned out to be caused by putting an empty resources map on the mysqld container.
Because nil vs {} are treated as different, the controller keeps seeing a diff in the Pod template and reconciles the StatefulSet repeatedly, never reaching the PVC resize logic.
Please try removing resources: {} and the PVC resize should trigger.
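The Go behavior behind this can be shown in isolation; a minimal, self-contained demonstration using plain reflect.DeepEqual (MOCO's actual comparison code may differ):

// Demonstrates the nil-vs-{} mismatch described above: a nil
// ResourceList and an empty-but-present one are not DeepEqual, so a
// naive template comparison never converges.
package main

import (
	"fmt"
	"reflect"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// What the API server may hand back: the resources field absent (nil maps).
	observed := corev1.ResourceRequirements{}
	// What an explicit `resources: {requests: {}}` can decode to: present but empty.
	desired := corev1.ResourceRequirements{Requests: corev1.ResourceList{}}

	fmt.Println(reflect.DeepEqual(observed, desired)) // false: a permanent diff
}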
Thanks, I will test it today.
@shunki-fujita
Storage resize worked after removing the resources key.
But I have also found that when I fill my storage to 100% usage, the storage extension does not work, and the primary also changes somehow.
Volumes are not extended while my nodes are unhealthy, so when this happens I have to extend them manually and replace the cluster manifest once the cluster becomes healthy again.
I will ignore this if that's expected behavior.
@mhkarimi1383 This is expected behavior. MOCO prioritizes StatefulSet reconciliation (resolving Pod template diffs / rolling updates). As a result, PVC reconciliation only runs once all Pods are Healthy/Ready. If nodes/Pods aren’t healthy (e.g., disk full), the controller won’t proceed with the PVC step.
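In sketch form, the ordering described here looks roughly like this (assumed shape, not MOCO's actual code):

// Sketch of the reconcile ordering above (assumed shape, not MOCO's
// actual code): PVC expansion is gated on every pod being Ready, so a
// disk-full, unhealthy cluster never reaches the resize step.
package ordersketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
)

func podReady(pod corev1.Pod) bool {
	for _, cond := range pod.Status.Conditions {
		if cond.Type == corev1.PodReady && cond.Status == corev1.ConditionTrue {
			return true
		}
	}
	return false
}

func reconcileStorage(ctx context.Context, pods []corev1.Pod) error {
	// 1. StatefulSet diffs / rolling updates are resolved first (omitted).
	// 2. PVC resize runs only once the whole cluster is healthy.
	for _, pod := range pods {
		if !podReady(pod) {
			return nil // skip the resize; requeue until pods recover
		}
	}
	// ...raise spec.resources.requests.storage on each PVC here...
	return nil
}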
Nice, I will continue testing the operator :) Thanks!