[BUG] new configuration ops failed

Open loomts opened this issue 9 months ago • 3 comments

Describe the bug

  1. Use the main branch for testing and install the CRDs.
  2. Install clickhouse and clickhouse-cluster using the new configuration.

See error

{"level":"info","ts":"2025-03-04T11:53:50+08:00","logger":"ComponentParameterReconciler","msg":"failed to run configuration reconcile task.","controller":"componentparameter","controllerGroup":"parameters.kubeblocks.io","controllerKind":"ComponentParameter","ComponentParameter":{"name":"ch-cluster-ch-keeper","namespace":"default"},"namespace":"default","name":"ch-cluster-ch-keeper","reconcileID":"f8c05ff1-896a-4fc2-b3a5-1ae091c766b5","Namespace":"default","ComponentParameter":"ch-cluster-ch-keeper"}
{"level":"error","ts":"2025-03-04T11:53:50+08:00","msg":"Reconciler error","controller":"componentparameter","controllerGroup":"parameters.kubeblocks.io","controllerKind":"ComponentParameter","ComponentParameter":{"name":"ch-cluster-ch-keeper","namespace":"default"},"namespace":"default","name":"ch-cluster-ch-keeper","reconcileID":"f8c05ff1-896a-4fc2-b3a5-1ae091c766b5","error":"Object default/ch-cluster-ch-keeper-clickhouse-keeper-tpl is already owned by another Configuration controller ch-cluster-ch-keeper","errorCauses":[{"error":"Object default/ch-cluster-ch-keeper-clickhouse-keeper-tpl is already owned by another Configuration controller ch-cluster-ch-keeper"}],"stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/Users/loomt/gopath/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/Users/loomt/gopath/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/Users/loomt/gopath/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"}
{"level":"info","ts":"2025-03-04T11:53:54+08:00","logger":"ComponentParameterReconciler","msg":"failed to run configuration reconcile task.","controller":"componentparameter","controllerGroup":"parameters.kubeblocks.io","controllerKind":"ComponentParameter","ComponentParameter":{"name":"ch-cluster-clickhouse","namespace":"default"},"namespace":"default","name":"ch-cluster-clickhouse","reconcileID":"b23185fe-c018-450b-b689-93a2b86e771d","Namespace":"default","ComponentParameter":"ch-cluster-clickhouse"}
{"level":"error","ts":"2025-03-04T11:53:54+08:00","msg":"Reconciler error","controller":"componentparameter","controllerGroup":"parameters.kubeblocks.io","controllerKind":"ComponentParameter","ComponentParameter":{"name":"ch-cluster-clickhouse","namespace":"default"},"namespace":"default","name":"ch-cluster-clickhouse","reconcileID":"b23185fe-c018-450b-b689-93a2b86e771d","error":"Object default/ch-cluster-clickhouse-clickhouse-tpl is already owned by another Configuration controller ch-cluster-clickhouse","errorCauses":[{"error":"Object default/ch-cluster-clickhouse-clickhouse-tpl is already owned by another Configuration controller ch-cluster-clickhouse"}],"stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/Users/loomt/gopath/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/Users/loomt/gopath/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/Users/loomt/gopath/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"}
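
Both "Reconciler error" entries above carry the same message shape. The following is a minimal sketch, not part of KubeBlocks, for pulling the object, owner kind, and owner name out of such a log line. The function name `parse_owner_conflict` and the regular expression are assumptions derived only from the two messages shown in this issue.

```python
import json
import re

# Pattern derived from the two error messages in the log above; it is an
# assumption that other ownership conflicts use the same wording.
OWNER_CONFLICT = re.compile(
    r"Object (?P<namespace>[^/\s]+)/(?P<name>\S+) is already owned by "
    r"another (?P<owner_kind>\S+) controller (?P<owner_name>\S+)"
)


def parse_owner_conflict(log_line):
    """Return the conflict details from one JSON log line, or None."""
    try:
        entry = json.loads(log_line)
    except json.JSONDecodeError:
        return None
    if not isinstance(entry, dict):
        return None
    match = OWNER_CONFLICT.search(str(entry.get("error", "")))
    if match is None:
        return None
    details = match.groupdict()
    # Kind of the controller that hit the conflict while reconciling.
    details["reconciler_kind"] = entry.get("controllerKind")
    return details


# Shortened copy of the first error entry above (only the fields used here).
sample = json.dumps({
    "level": "error",
    "controllerKind": "ComponentParameter",
    "error": "Object default/ch-cluster-ch-keeper-clickhouse-keeper-tpl is "
             "already owned by another Configuration controller "
             "ch-cluster-ch-keeper",
})
print(parse_owner_conflict(sample))
```

For the entries above, the owner kind is `Configuration` while the reconciling kind is `ComponentParameter`, which may indicate that the ConfigMap still carries a controller reference from an older Configuration object.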

In this situation, reconfiguration ops also get stuck.

loomts avatar Mar 04 '25 04:03 loomts

I tried to reproduce the bug, but could not. The steps are as follows:

Step 1: create the ch cluster

helm upgrade --install ch2 addons-cluster/clickhouse -n test

Step 2: prepare the ops CR

$ cat chops.yaml 
apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: ch-reconfiguring
  namespace: test
spec:
  # Specifies the name of the Cluster resource that this operation is targeting.
  clusterName: ch2
  # Instructs the system to bypass pre-checks (including cluster state checks and customized pre-conditions hooks) and immediately execute the opsRequest, except for the opsRequest of 'Start' type, which will still undergo pre-checks even if `force` is true.  Note: Once set, the `force` field is immutable and cannot be updated.
  force: false
  # Lists Reconfiguring objects, each specifying a Component and its configuration updates.
  reconfigures:
    # Specifies the name of the Component.
  - componentName: clickhouse
    # Contains a list of ConfigurationItem objects, specifying the Component's configuration template name, upgrade policy, and parameter key-value pairs to be updated.
    parameters:
      # Represents the name of the parameter that is to be updated.
    - key: clickhouse.profiles.web.max_partition_size_to_drop
      # Represents the parameter values that are to be updated.
      # If set to nil, the parameter defined by the Key field will be removed from the configuration file.
      value: '0'
  # Specifies the name of the configuration template.
  # Specifies the maximum number of seconds the OpsRequest will wait for its start conditions to be met before aborting. If set to 0 (default), the start conditions must be met immediately for the OpsRequest to proceed.
  preConditionDeadlineSeconds: 0
  type: Reconfiguring

Step 3: check the results

$ k get ops -n test |grep ch2
ch-reconfiguring          Reconfiguring   ch2       Succeed   -/-        22m

# zhangtao @ 192 in ~ [11:15:54] 
$ k get parameters -n test|grep ch2
ch-reconfiguring               ch2       Finished      22m

# zhangtao @ 192 in ~ [11:16:03] 
$ k get componentparameters -n test|grep ch2 
ch2-ch-keeper            ch2         ch-keeper          Finished   24m
ch2-clickhouse           ch2         clickhouse         Finished   24m

$ k get ops -n test ch-reconfiguring -o jsonpath='{.status}' |python3 -m json.tool 
{
    "clusterGeneration": 2,
    "completionTimestamp": "2025-03-05T02:53:54Z",
    "conditions": [
        {
            "lastTransitionTime": "2025-03-05T02:53:51Z",
            "message": "wait for the controller to process the OpsRequest: ch-reconfiguring in Cluster: ch2",
            "reason": "WaitForProgressing",
            "status": "True",
            "type": "WaitForProgressing"
        },
        {
            "lastTransitionTime": "2025-03-05T02:53:52Z",
            "message": "OpsRequest: ch-reconfiguring is validated",
            "reason": "ValidateOpsRequestPassed",
            "status": "True",
            "type": "Validated"
        },
        {
            "lastTransitionTime": "2025-03-05T02:53:52Z",
            "message": "Start to reconfigure in Cluster: ch2, Component: clickhouse",
            "reason": "ReconfigureStarted",
            "status": "True",
            "type": "Reconfigure"
        },
        {
            "lastTransitionTime": "2025-03-05T02:53:54Z",
            "message": "Successfully processed the OpsRequest: ch-reconfiguring in Cluster: ch2",
            "reason": "OpsRequestProcessedSuccessfully",
            "status": "True",
            "type": "Succeed"
        }
    ],
    "phase": "Succeed",
    "progress": "-/-",
    "startTimestamp": "2025-03-05T02:53:52Z"
}


$ k get parameters -n test ch-reconfiguring -o jsonpath='{.status}' |python3 -m json.tool
{
    "componentReconfiguringStatus": [
        {
            "componentName": "clickhouse",
            "parameterStatus": [
                {
                    "lastDoneRevision": "2",
                    "name": "clickhouse-user-tpl",
                    "phase": "Finished",
                    "reconcileDetail": {
                        "currentRevision": "3",
                        "execResult": "None",
                        "expectedCount": 2,
                        "policy": "restart",
                        "succeedCount": 2
                    },
                    "updateRevision": "2",
                    "updatedParameters": {
                        "user.xml": {
                            "parameters": {
                                "clickhouse.profiles.web.max_partition_size_to_drop": "0"
                            }
                        }
                    }
                }
            ],
            "phase": "Finished"
        }
    ],
    "observedGeneration": 1,
    "phase": "Finished"
}

sophon-zt avatar Mar 05 '25 03:03 sophon-zt

The test found two problems:

  1. The controller's error log was not printed, so the cause of this error is unknown.
  2. The status of ops/parameters is occasionally inconsistent with the status of componentparameters.
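
The second problem could be checked mechanically. The sketch below is only an illustration: the helper `find_inconsistencies` is hypothetical, and treating `Succeed` (OpsRequest) and `Finished` (Parameter, ComponentParameter) as equivalent "done" phases is an assumption based on the outputs shown earlier in this thread. The phases would be collected by hand or from `kubectl get ... -o json`.

```python
# Phase that each kind reports when it is done, taken from the outputs above.
# Treating these phases as equivalent is an assumption of this sketch.
DONE_PHASE = {
    "OpsRequest": "Succeed",
    "Parameter": "Finished",
    "ComponentParameter": "Finished",
}


def find_inconsistencies(observed):
    """Return the (kind, name, phase) entries that lag behind the others.

    observed is a list of (kind, name, phase) tuples. The result is empty
    when every resource is done or when none of them is.
    """
    pending = [
        (kind, name, phase)
        for kind, name, phase in observed
        if phase != DONE_PHASE.get(kind)
    ]
    if len(pending) == len(observed):
        return []
    return pending


# Phases copied from the reproduction attempt above.
consistent = [
    ("OpsRequest", "ch-reconfiguring", "Succeed"),
    ("Parameter", "ch-reconfiguring", "Finished"),
    ("ComponentParameter", "ch2-ch-keeper", "Finished"),
    ("ComponentParameter", "ch2-clickhouse", "Finished"),
]
# Hypothetical mismatch: "Running" is a placeholder phase, not taken from the issue.
mismatch = consistent[:3] + [("ComponentParameter", "ch2-clickhouse", "Running")]

print(find_inconsistencies(consistent))
print(find_inconsistencies(mismatch))
```

A non-empty result lists the resources whose phase disagrees with the ones that already report completion.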

sophon-zt avatar Mar 05 '25 03:03 sophon-zt

This issue has been marked as stale because it has been open for 30 days with no activity

github-actions[bot] avatar Apr 07 '25 00:04 github-actions[bot]

It looks like it works well with the latest kb and ck, so I will close this issue.

loomts avatar Apr 30 '25 01:04 loomts