[BUG] OpsDefinition with invalid JSON Schema shows Available status and blocks reconciliation
Describe the bug
When an OpsDefinition has an incorrect YAML multiline string marker | after a property in parametersSchema.openAPIV3Schema.properties, it causes three issues:
- JSON Schema parsing error but status shows as normal: KubeBlocks operator logs show the error "json: cannot unmarshal string into Go struct field JSONSchemaProps.items.spec.parametersSchema.openAPIV3Schema.properties of type v1.JSONSchemaProps", but the OpsDefinition's status incorrectly shows as Available instead of reflecting the actual error state.
- Unnecessary Pod dependency check: Even when the OpsDefinition doesn't declare podInfoExtractors and doesn't use exec type actions, creating an OpsRequest still fails with the error "can not find any pod which matches the podSelector for the component fe", preventing operations from being executed when the cluster is stopped.
- Subsequent OpsDefinition reconciliation blocked: All OpsDefinitions applied after the problematic one stop reconciling, with their status remaining empty, affecting the entire cluster's operational capabilities.
To Reproduce
Steps to reproduce the behavior:
- Create a problematic OpsDefinition with a | marker after a property in parametersSchema.openAPIV3Schema.properties:
apiVersion: apps.kubeblocks.io/v1alpha1
kind: OpsDefinition
metadata:
  name: starrocks-snapshot-restore
spec:
  parametersSchema:
    openAPIV3Schema:
      properties:
        CLUSTER_SNAPSHOT_PATH: |
          description: CLUSTER_SNAPSHOT_PATH is the path of the snapshot to restore.
          type: string
        CLUSTER_NAME: |
          description: CLUSTER_NAME is the name of the cluster to restore.
          type: string
        CLUSTER_NAMESPACE:
          description: CLUSTER_NAMESPACE is the namespace of the cluster to restore.
          type: string
  actions:
    - name: snapshot-restore
      failurePolicy: Fail
      workload:
        type: Job
        backoffLimit: 3
        podSpec:
          restartPolicy: Never
          containers:
            - name: restore
              image: docker.io/apecloud/kubectl:1.29
              command:
                - bash
                - -c
                - |
                  echo "restore script"
- Apply this OpsDefinition:
kubectl apply -f opsdefinition.yaml
- Check the KubeBlocks operator logs; you will see JSON unmarshal errors:
W1112 03:04:29.920519 1 reflector.go:535] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: failed to list *v1alpha1.OpsDefinition: json: cannot unmarshal string into Go struct field JSONSchemaProps.items.spec.parametersSchema.openAPIV3Schema.properties of type v1.JSONSchemaProps
- Check the OpsDefinition status; it shows as Available (which is incorrect: it should reflect the parsing failure):
kubectl get opsdefinition starrocks-snapshot-restore
- After stopping the cluster, try to create an OpsRequest using this OpsDefinition:
apiVersion: apps.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: sr1
spec:
  type: Custom
  clusterName: rice7-6bcff57968
  custom:
    components:
      - componentName: fe
        parameters:
          - name: CLUSTER_SNAPSHOT_PATH
            value: "s3://starrocks-snapshot/starrocks-snapshot.tar.gz"
          - name: CLUSTER_NAME
            value: "rice7-6bcff57968"
          - name: CLUSTER_NAMESPACE
            value: "kubeblocks-cloud-ns"
    opsDefinitionName: starrocks-snapshot-restore
- The OpsRequest will fail immediately with the error: can not find any pod which matches the podSelector for the component fe
- Try to apply other normal OpsDefinitions; they stop reconciling and their status stays empty.
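The reflector warning in the steps above says the list call itself failed, which would explain why unrelated OpsDefinitions stop reconciling: a list response is decoded as a whole, so one item that cannot be decoded fails the entire list. Below is a minimal Go sketch of that effect using only encoding/json. The JSONSchemaProps struct, the opsDefinition types, and the JSON payloads are simplified, hypothetical stand-ins for illustration, not the real Kubernetes or KubeBlocks types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// JSONSchemaProps is a reduced, hypothetical stand-in for the Kubernetes
// apiextensions type; only the fields needed for this illustration are kept.
type JSONSchemaProps struct {
	Description string                     `json:"description,omitempty"`
	Type        string                     `json:"type,omitempty"`
	Properties  map[string]JSONSchemaProps `json:"properties,omitempty"`
}

// opsDefinition and opsDefinitionList are illustrative types, not the KubeBlocks API.
type opsDefinition struct {
	Name   string          `json:"name"`
	Schema JSONSchemaProps `json:"schema"`
}

type opsDefinitionList struct {
	Items []opsDefinition `json:"items"`
}

// decodeList decodes the whole list in one call and returns the item count.
// Any item that fails to decode makes the call return an error.
func decodeList(data []byte) (int, error) {
	var list opsDefinitionList
	if err := json.Unmarshal(data, &list); err != nil {
		return 0, err
	}
	return len(list.Items), nil
}

// goodOnly contains a single well-formed item.
const goodOnly = `{"items":[
  {"name":"good","schema":{"properties":{"CLUSTER_NAMESPACE":{"type":"string"}}}}
]}`

// withBad adds an item whose property value is a string, which is what a YAML
// "|" block scalar produces, instead of a schema object.
const withBad = `{"items":[
  {"name":"good","schema":{"properties":{"CLUSTER_NAMESPACE":{"type":"string"}}}},
  {"name":"bad","schema":{"properties":{"CLUSTER_NAME":"description: x\ntype: string\n"}}}
]}`

func main() {
	n, err := decodeList([]byte(goodOnly))
	fmt.Println("good only:", n, err)

	_, err = decodeList([]byte(withBad))
	fmt.Println("with bad item, decode failed:", err != nil)
	fmt.Println("error:", err)
}
```

In this sketch the well-formed item gets no chance to be processed once the malformed one is in the same list, which matches the symptom of other OpsDefinitions keeping an empty status.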
Expected behavior
- Status should reflect errors: When an OpsDefinition's JSON Schema parsing fails, the status should show an error state (such as Failed or Invalid), not Available.
- Should not check for Pods: When an OpsDefinition doesn't declare podInfoExtractors and the action type is Job (which doesn't depend on Pods), creating an OpsRequest should not require running Pods in the cluster.
- Should not block other resources: A problematic OpsDefinition should not affect the normal reconciliation of other OpsDefinitions.
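The second expectation above could be expressed as a guard that runs before any Pod lookup. The following Go sketch is only an illustration of that rule: the types, field names, and the requiresRunningPods helper are hypothetical and do not mirror the actual KubeBlocks API or controller code.

```go
package main

import "fmt"

// action is a hypothetical, simplified view of an OpsDefinition action.
// Kind is "exec" or "workload" in this sketch.
type action struct {
	Name string
	Kind string
}

// opsDefinitionSpec is a hypothetical, reduced spec holding only what the guard needs.
type opsDefinitionSpec struct {
	PodInfoExtractors []string
	Actions           []action
}

// requiresRunningPods reports whether an OpsRequest built from this spec
// needs a running Pod: either a pod info extractor is declared, or at
// least one action executes inside an existing Pod.
func requiresRunningPods(spec opsDefinitionSpec) bool {
	if len(spec.PodInfoExtractors) > 0 {
		return true
	}
	for _, a := range spec.Actions {
		if a.Kind == "exec" {
			return true
		}
	}
	return false
}

func main() {
	// Shape of the OpsDefinition in this report: one Job workload action,
	// no pod info extractors.
	restore := opsDefinitionSpec{
		Actions: []action{{Name: "snapshot-restore", Kind: "workload"}},
	}
	fmt.Println("needs running pods:", requiresRunningPods(restore))
}
```

With a guard like this, the Pod selector check would be skipped for the snapshot-restore definition, so the OpsRequest could proceed while the cluster is stopped.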
Screenshots
N/A
Desktop (please complete the following information):
- OS: macOS 24.2.0
- Kubernetes Version: v1.28.3-aliyun.1
- KubeBlocks Version: 0.9.6-beta.1
- kbcli Version: 1.0.0
Note: There is a version difference between kbcli (1.0.0) and kubeblocks (0.9.6-beta.1)
Additional context
- The correct OpsDefinition format should be:
parametersSchema:
  openAPIV3Schema:
    properties:
      CLUSTER_SNAPSHOT_PATH:
        description: CLUSTER_SNAPSHOT_PATH is the path of the snapshot to restore.
        type: string
      CLUSTER_NAME:
        description: CLUSTER_NAME is the name of the cluster to restore.
        type: string
- This issue affects operations that need to be executed without depending on running Pods (such as snapshot restore, data migration, etc.), which typically need to be executed when the cluster is stopped.