kubeblocks
[BUG] new configuration ops failed
Describe the bug
- Test with the main branch and install the CRDs
- Install clickhouse and clickhouse-cluster using the new configuration
- See the error below
{"level":"info","ts":"2025-03-04T11:53:50+08:00","logger":"ComponentParameterReconciler","msg":"failed to run configuration reconcile task.","controller":"componentparameter","controllerGroup":"parameters.kubeblocks.io","controllerKind":"ComponentParameter","ComponentParameter":{"name":"ch-cluster-ch-keeper","namespace":"default"},"namespace":"default","name":"ch-cluster-ch-keeper","reconcileID":"f8c05ff1-896a-4fc2-b3a5-1ae091c766b5","Namespace":"default","ComponentParameter":"ch-cluster-ch-keeper"}
{"level":"error","ts":"2025-03-04T11:53:50+08:00","msg":"Reconciler error","controller":"componentparameter","controllerGroup":"parameters.kubeblocks.io","controllerKind":"ComponentParameter","ComponentParameter":{"name":"ch-cluster-ch-keeper","namespace":"default"},"namespace":"default","name":"ch-cluster-ch-keeper","reconcileID":"f8c05ff1-896a-4fc2-b3a5-1ae091c766b5","error":"Object default/ch-cluster-ch-keeper-clickhouse-keeper-tpl is already owned by another Configuration controller ch-cluster-ch-keeper","errorCauses":[{"error":"Object default/ch-cluster-ch-keeper-clickhouse-keeper-tpl is already owned by another Configuration controller ch-cluster-ch-keeper"}],"stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/Users/loomt/gopath/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/Users/loomt/gopath/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/Users/loomt/gopath/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"}
{"level":"info","ts":"2025-03-04T11:53:54+08:00","logger":"ComponentParameterReconciler","msg":"failed to run configuration reconcile task.","controller":"componentparameter","controllerGroup":"parameters.kubeblocks.io","controllerKind":"ComponentParameter","ComponentParameter":{"name":"ch-cluster-clickhouse","namespace":"default"},"namespace":"default","name":"ch-cluster-clickhouse","reconcileID":"b23185fe-c018-450b-b689-93a2b86e771d","Namespace":"default","ComponentParameter":"ch-cluster-clickhouse"}
{"level":"error","ts":"2025-03-04T11:53:54+08:00","msg":"Reconciler error","controller":"componentparameter","controllerGroup":"parameters.kubeblocks.io","controllerKind":"ComponentParameter","ComponentParameter":{"name":"ch-cluster-clickhouse","namespace":"default"},"namespace":"default","name":"ch-cluster-clickhouse","reconcileID":"b23185fe-c018-450b-b689-93a2b86e771d","error":"Object default/ch-cluster-clickhouse-clickhouse-tpl is already owned by another Configuration controller ch-cluster-clickhouse","errorCauses":[{"error":"Object default/ch-cluster-clickhouse-clickhouse-tpl is already owned by another Configuration controller ch-cluster-clickhouse"}],"stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/Users/loomt/gopath/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/Users/loomt/gopath/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/Users/loomt/gopath/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"}
In this situation, reconfiguration ops also get stuck.
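To make the conflict in the logs above easier to read, here is a minimal sketch that pulls the ownership details out of the JSON log lines. The function name `find_owner_conflicts` and the output field names are illustrative, not part of KubeBlocks; the regular expression only assumes the message format exactly as it appears in the logs above. Note that in both errors the existing owner has kind `Configuration`, while the object being reconciled is a `ComponentParameter` with the same name.

```python
import json
import re

# Message format exactly as it appears in the controller logs above.
OWNED_RE = re.compile(
    r"Object (?P<namespace>[^/\s]+)/(?P<object>\S+) is already owned by "
    r"another (?P<owner_kind>\S+) controller (?P<owner_name>\S+)"
)


def find_owner_conflicts(log_lines):
    """Return one dict per error-level log line that reports an ownership conflict."""
    conflicts = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict) or entry.get("level") != "error":
            continue
        match = OWNED_RE.search(str(entry.get("error", "")))
        if match is None:
            continue
        conflict = match.groupdict()
        # Record which reconciler hit the conflict, for comparison with the owner.
        conflict["reconciler_kind"] = entry.get("controllerKind")
        conflict["reconciler_name"] = entry.get("name")
        conflicts.append(conflict)
    return conflicts
```

Feeding it the saved controller log (one JSON object per line) gives, for each conflict, the object name, the kind and name of the current owner, and the kind and name of the reconciler that failed to take ownership.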
I tried to reproduce the bug, but could not. The steps were as follows:
step1: create ch cluster
helm upgrade --install ch2 addons-cluster/clickhouse -n test
step2: prepare ops cr
$ cat chops.yaml
apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: ch-reconfiguring
  namespace: test
spec:
  # Specifies the name of the Cluster resource that this operation is targeting.
  clusterName: ch2
  # Instructs the system to bypass pre-checks (including cluster state checks and customized pre-conditions hooks) and immediately execute the opsRequest, except for the opsRequest of 'Start' type, which will still undergo pre-checks even if `force` is true. Note: Once set, the `force` field is immutable and cannot be updated.
  force: false
  # Specifies a component and its configuration updates. This field is deprecated and replaced by `reconfigures`.
  reconfigures:
    # Specifies the name of the Component.
    - componentName: clickhouse
      # Contains a list of ConfigurationItem objects, specifying the Component's configuration template name, upgrade policy, and parameter key-value pairs to be updated.
      parameters:
        # Represents the name of the parameter that is to be updated.
        - key: clickhouse.profiles.web.max_partition_size_to_drop
          # Represents the parameter values that are to be updated.
          # If set to nil, the parameter defined by the Key field will be removed from the configuration file.
          value: '0'
      # Specifies the name of the configuration template.
  # Specifies the maximum number of seconds the OpsRequest will wait for its start conditions to be met before aborting. If set to 0 (default), the start conditions must be met immediately for the OpsRequest to proceed.
  preConditionDeadlineSeconds: 0
  type: Reconfiguring
step3: check
$ k get ops -n test |grep ch2
ch-reconfiguring Reconfiguring ch2 Succeed -/- 22m
# zhangtao @ 192 in ~ [11:15:54]
$ k get parameters -n test|grep ch2
ch-reconfiguring ch2 Finished 22m
# zhangtao @ 192 in ~ [11:16:03]
$ k get componentparameters -n test|grep ch2
ch2-ch-keeper ch2 ch-keeper Finished 24m
ch2-clickhouse ch2 clickhouse Finished 24m
$ k get ops -n test ch-reconfiguring -o jsonpath='{.status}' |python3 -m json.tool
{
    "clusterGeneration": 2,
    "completionTimestamp": "2025-03-05T02:53:54Z",
    "conditions": [
        {
            "lastTransitionTime": "2025-03-05T02:53:51Z",
            "message": "wait for the controller to process the OpsRequest: ch-reconfiguring in Cluster: ch2",
            "reason": "WaitForProgressing",
            "status": "True",
            "type": "WaitForProgressing"
        },
        {
            "lastTransitionTime": "2025-03-05T02:53:52Z",
            "message": "OpsRequest: ch-reconfiguring is validated",
            "reason": "ValidateOpsRequestPassed",
            "status": "True",
            "type": "Validated"
        },
        {
            "lastTransitionTime": "2025-03-05T02:53:52Z",
            "message": "Start to reconfigure in Cluster: ch2, Component: clickhouse",
            "reason": "ReconfigureStarted",
            "status": "True",
            "type": "Reconfigure"
        },
        {
            "lastTransitionTime": "2025-03-05T02:53:54Z",
            "message": "Successfully processed the OpsRequest: ch-reconfiguring in Cluster: ch2",
            "reason": "OpsRequestProcessedSuccessfully",
            "status": "True",
            "type": "Succeed"
        }
    ],
    "phase": "Succeed",
    "progress": "-/-",
    "startTimestamp": "2025-03-05T02:53:52Z"
}
$ k get parameters -n test ch-reconfiguring -o jsonpath='{.status}' |python3 -m json.tool
{
    "componentReconfiguringStatus": [
        {
            "componentName": "clickhouse",
            "parameterStatus": [
                {
                    "lastDoneRevision": "2",
                    "name": "clickhouse-user-tpl",
                    "phase": "Finished",
                    "reconcileDetail": {
                        "currentRevision": "3",
                        "execResult": "None",
                        "expectedCount": 2,
                        "policy": "restart",
                        "succeedCount": 2
                    },
                    "updateRevision": "2",
                    "updatedParameters": {
                        "user.xml": {
                            "parameters": {
                                "clickhouse.profiles.web.max_partition_size_to_drop": "0"
                            }
                        }
                    }
                }
            ],
            "phase": "Finished"
        }
    ],
    "observedGeneration": 1,
    "phase": "Finished"
}
The test found two problems:
- The error log of the controller was not printed, so the cause of this error is unknown.
- The status of ops/parameters is occasionally inconsistent with the status of componentparameters.
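The second problem could be checked mechanically with something like the sketch below. It assumes that `Succeed` (ops) and `Finished` (parameters, componentparameters) are the terminal phases, as in the output above; the function name `inconsistent_components` and any other phase value passed to it are hypothetical. The phases are taken as plain strings, so no assumption is made about where they sit in each resource's JSON.

```python
def inconsistent_components(ops_phase, parameter_phase, component_phases):
    """Return the ComponentParameters that are not Finished even though
    both the OpsRequest and the Parameter report completion.

    component_phases maps a ComponentParameter name to its phase string.
    """
    # Terminal phase names are taken from the kubectl output in this issue.
    if ops_phase != "Succeed" or parameter_phase != "Finished":
        return []  # the operation is not reported as done, so there is nothing to compare
    return sorted(
        name for name, phase in component_phases.items() if phase != "Finished"
    )
```

With the values from the run above (`Succeed`, `Finished`, and both `ch2-ch-keeper` and `ch2-clickhouse` at `Finished`), the function returns an empty list, which matches the consistent state that was observed in this attempt.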
This issue has been marked as stale because it has been open for 30 days with no activity
It looks like this works well with the latest kb and ck, so I will close this issue.