[BUG] OpsDefinition with invalid JSON Schema shows Available status and blocks reconciliation

Open iziang opened this issue 1 month ago • 0 comments

Describe the bug

When an OpsDefinition mistakenly puts the YAML block scalar marker | after a property name under parametersSchema.openAPIV3Schema.properties, the property's value is parsed as a string instead of a schema object. This causes three issues:

  1. JSON Schema parsing error but status shows as normal: KubeBlocks operator logs show the error json: cannot unmarshal string into Go struct field JSONSchemaProps.items.spec.parametersSchema.openAPIV3Schema.properties of type v1.JSONSchemaProps, but the OpsDefinition's status incorrectly shows as Available instead of reflecting the actual error state.

  2. Unnecessary Pod dependency check: Even when the OpsDefinition doesn't declare podInfoExtractors and doesn't use exec type actions, creating an OpsRequest still fails with the error can not find any pod which matches the podSelector for the component fe, preventing operations from being executed when the cluster is stopped.

  3. Subsequent OpsDefinition reconciliation blocked: All OpsDefinitions applied after the problematic one stop reconciling, with their status remaining empty, affecting the entire cluster's operational capabilities.
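For illustration, the unmarshal failure from issue 1 can be reproduced outside KubeBlocks. The Go sketch below is not the operator's code: schemaProps is a simplified stand-in for the JSONSchemaProps type named in the log, and the two constants are hand-written JSON equivalents of a property written with | and without it. It only shows that a string value cannot be decoded into a struct-typed map entry.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// schemaProps is a simplified stand-in (an assumption for this sketch) for the
// JSONSchemaProps type named in the operator log; the real type has more fields.
type schemaProps struct {
	Description string                 `json:"description,omitempty"`
	Type        string                 `json:"type,omitempty"`
	Properties  map[string]schemaProps `json:"properties,omitempty"`
}

// decodeSchema unmarshals a JSON-encoded schema into the typed struct.
func decodeSchema(raw string) (schemaProps, error) {
	var s schemaProps
	err := json.Unmarshal([]byte(raw), &s)
	return s, err
}

// JSON equivalent of a property written with "|": the whole value is one string.
const blockScalar = `{"properties":{"CLUSTER_NAME":"description: the cluster name\ntype: string\n"}}`

// JSON equivalent of the same property written as a nested mapping.
const nestedMapping = `{"properties":{"CLUSTER_NAME":{"description":"the cluster name","type":"string"}}}`

func main() {
	// The string value cannot be decoded into schemaProps, so an error is returned.
	if _, err := decodeSchema(blockScalar); err != nil {
		fmt.Println("block scalar:", err)
	}
	// The nested mapping decodes into a typed property.
	s, err := decodeSchema(nestedMapping)
	fmt.Println("nested mapping:", s.Properties["CLUSTER_NAME"].Type, err)
}
```

The exact error text depends on the Go version and the real type, so it will not match the operator log word for word.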

To Reproduce

Steps to reproduce the behavior:

  1. Create an OpsDefinition that has the | marker after property names in parametersSchema.openAPIV3Schema.properties:
apiVersion: apps.kubeblocks.io/v1alpha1
kind: OpsDefinition
metadata:
  name: starrocks-snapshot-restore
spec:
  parametersSchema:
    openAPIV3Schema:
      properties:
        CLUSTER_SNAPSHOT_PATH: |
          description: CLUSTER_SNAPSHOT_PATH is the path of the snapshot to restore.
          type: string
        CLUSTER_NAME: |
          description: CLUSTER_NAME is the name of the cluster to restore.
          type: string
        CLUSTER_NAMESPACE:
          description: CLUSTER_NAMESPACE is the namespace of the cluster to restore.
          type: string
  actions:
    - name: snapshot-restore
      failurePolicy: Fail
      workload:
        type: Job
        backoffLimit: 3
        podSpec:
          restartPolicy: Never
          containers:
          - name: restore
            image: docker.io/apecloud/kubectl:1.29
            command:
            - bash
            - -c
            - |
              echo "restore script"
  2. Apply this OpsDefinition:
kubectl apply -f opsdefinition.yaml
  3. Check the KubeBlocks operator logs; you will see JSON unmarshal errors:
W1112 03:04:29.920519       1 reflector.go:535] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: failed to list *v1alpha1.OpsDefinition: json: cannot unmarshal string into Go struct field JSONSchemaProps.items.spec.parametersSchema.openAPIV3Schema.properties of type v1.JSONSchemaProps
  4. Check the OpsDefinition status; it shows Available, which is incorrect because it should reflect the parsing failure:
kubectl get opsdefinition starrocks-snapshot-restore
  5. After stopping the cluster, try to create an OpsRequest using this OpsDefinition:
apiVersion: apps.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: sr1
spec:
  type: Custom
  clusterName: rice7-6bcff57968
  custom:
    components:
      - componentName: fe
        parameters: 
          - name: CLUSTER_SNAPSHOT_PATH
            value: "s3://starrocks-snapshot/starrocks-snapshot.tar.gz"
          - name: CLUSTER_NAME
            value: "rice7-6bcff57968"
          - name: CLUSTER_NAMESPACE
            value: "kubeblocks-cloud-ns"
    opsDefinitionName: starrocks-snapshot-restore
  6. The OpsRequest fails immediately with the error: can not find any pod which matches the podSelector for the component fe

  7. Apply other, valid OpsDefinitions; they are no longer reconciled and their status stays empty.

Expected behavior

  1. Status should reflect errors: When an OpsDefinition's JSON Schema parsing fails, the status should show an error state (such as Failed or Invalid), not Available.

  2. Should not check for Pods: When an OpsDefinition doesn't declare podInfoExtractors and the action type is Job (which doesn't depend on Pods), creating an OpsRequest should not require running Pods in the cluster.

  3. Should not block other resources: A problematic OpsDefinition should not affect the normal reconciliation of other OpsDefinitions.
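One way to picture expected behavior 3 is to decode list items one at a time, so that a malformed item is reported on its own instead of failing the whole list. The Go sketch below is only an illustration under stated assumptions, not how KubeBlocks or client-go lists resources: opsDef, schema, decodeEach, and sampleList are hypothetical names, and the input is a hand-written JSON array.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// schema and opsDef are hypothetical, heavily trimmed stand-ins for the
// real schema and OpsDefinition types.
type schema struct {
	Type string `json:"type"`
}

type opsDef struct {
	Name       string            `json:"name"`
	Properties map[string]schema `json:"properties"`
}

// decodeEach decodes every list item separately. Items that fail to decode
// are recorded by index; the remaining items are still returned.
func decodeEach(list []byte) ([]opsDef, map[int]error, error) {
	var items []json.RawMessage
	if err := json.Unmarshal(list, &items); err != nil {
		return nil, nil, err // the list itself is not a JSON array
	}
	good := make([]opsDef, 0, len(items))
	bad := make(map[int]error)
	for i, raw := range items {
		var d opsDef
		if err := json.Unmarshal(raw, &d); err != nil {
			bad[i] = err
			continue
		}
		good = append(good, d)
	}
	return good, bad, nil
}

// sampleList has one malformed item (index 1) between two valid ones.
const sampleList = `[
  {"name":"a","properties":{"X":{"type":"string"}}},
  {"name":"broken","properties":{"X":"type: string\n"}},
  {"name":"c","properties":{}}
]`

func main() {
	good, bad, err := decodeEach([]byte(sampleList))
	if err != nil {
		fmt.Println("list error:", err)
		return
	}
	fmt.Println("decoded items:", len(good))
	for i, e := range bad {
		fmt.Printf("item %d rejected: %v\n", i, e)
	}
}
```

With this shape, the rejected item could be surfaced as a failed status on that one resource while the others continue to reconcile.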

Screenshots

N/A

Desktop (please complete the following information):

  • OS: macOS 24.2.0
  • Kubernetes Version: v1.28.3-aliyun.1
  • KubeBlocks Version: 0.9.6-beta.1
  • kbcli Version: 1.0.0

Note: kbcli (1.0.0) and KubeBlocks (0.9.6-beta.1) are on different versions.

Additional context

  • The correct OpsDefinition format omits the | marker, so each property is a nested mapping:
parametersSchema:
  openAPIV3Schema:
    properties:
      CLUSTER_SNAPSHOT_PATH:
        description: CLUSTER_SNAPSHOT_PATH is the path of the snapshot to restore.
        type: string
      CLUSTER_NAME:
        description: CLUSTER_NAME is the name of the cluster to restore.
        type: string
  • This issue affects operations that need to be executed without depending on running Pods (such as snapshot restore, data migration, etc.), which typically need to be executed when the cluster is stopped.
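Whether an operation depends on running Pods could be decided from the OpsDefinition itself, following the rule stated in this report: no podInfoExtractors and no exec actions means no Pod is needed. The Go sketch below is a hypothetical illustration of that rule only; opsDefinition, action, the kind constants, and needsRunningPods are made-up names and do not reflect the actual KubeBlocks types or controller logic.

```go
package main

import "fmt"

// Hypothetical action kinds for this sketch.
const (
	kindExec     = "exec"     // runs a command inside an existing Pod
	kindWorkload = "workload" // creates its own Job or Pod
)

// action and opsDefinition are trimmed, made-up stand-ins for the real types.
type action struct {
	Name string
	Kind string
}

type opsDefinition struct {
	PodInfoExtractors []string
	Actions           []action
}

// needsRunningPods reports whether the definition relies on Pods of the
// target component: either it extracts Pod info or it has an exec action.
func needsRunningPods(def opsDefinition) bool {
	if len(def.PodInfoExtractors) > 0 {
		return true
	}
	for _, a := range def.Actions {
		if a.Kind == kindExec {
			return true
		}
	}
	return false
}

func main() {
	restore := opsDefinition{
		Actions: []action{{Name: "snapshot-restore", Kind: kindWorkload}},
	}
	fmt.Println("needs running Pods:", needsRunningPods(restore))
}
```

Under this rule, the starrocks-snapshot-restore definition from the reproduction steps would not require a running Pod, so the OpsRequest could proceed on a stopped cluster.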

iziang, Nov 12 '25 03:11