kubeblocks
[BUG] kafka/starrocks restart ops failed
➜ ~ kbcli version
Kubernetes: v1.29.4-gke.1043002
KubeBlocks: 0.9.0-beta.34
kbcli: 0.9.0-beta.27
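While a Kafka cluster in combined mode restarts, a pod crashes for a few seconds before it reaches Running: in the output below, kafka-lydwaa-metrics-exp-0 goes into CrashLoopBackOff because the broker is not yet accepting connections on port 9092. The restart OpsRequest is marked Failed before the cluster and its pods return to Running. Perhaps the ops could wait longer before reporting failure.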
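Until the controller tolerates this startup window, a script that drives the restart can wait for the pods itself instead of relying on the first OpsRequest result. A minimal sketch, assuming the pods carry the same `app.kubernetes.io/instance` label that the OpsRequest shows; `wait_for_cluster_pods` and its default timeout are hypothetical and not part of kbcli or KubeBlocks:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of kbcli or KubeBlocks.
# Blocks until every pod labelled with the cluster name is Ready,
# or fails once the timeout expires.
# Assumption: pods carry the label app.kubernetes.io/instance=<cluster>.
wait_for_cluster_pods() {
  local cluster="$1"
  local namespace="${2:-default}"
  local timeout="${3:-180s}"   # illustrative default, not a KubeBlocks setting

  kubectl wait pod \
    --namespace "$namespace" \
    --selector "app.kubernetes.io/instance=${cluster}" \
    --for=condition=Ready \
    --timeout="$timeout"
}

# Usage: wait_for_cluster_pods kafka-lydwaa default 180s
```

This only observes pod readiness from the client side; it does not change how the OpsRequest itself is evaluated.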
➜ ~ kbcli cluster create kafka kafka-lydwaa --mode='combined' --cpu=0.5 --memory=0.5 --storage=1 --availability-policy=none --termination-policy=Delete --version=kafka-3.3.2 --storage-enable=true --meta-storage=1 --replicas=1
Cluster kafka-lydwaa created
➜ ~ kbcli cluster describe kafka-lydwaa
Name: kafka-lydwaa Created Time: Jun 20,2024 15:53 UTC+0800
NAMESPACE   CLUSTER-DEFINITION   VERSION       STATUS    TERMINATION-POLICY
default     kafka                kafka-3.3.2   Running   Delete

Endpoints:
COMPONENT   MODE        INTERNAL                                                    EXTERNAL
broker      ReadWrite   kafka-lydwaa-broker-broker.default.svc.cluster.local:9092   <none>

Topology:
COMPONENT     INSTANCE                     ROLE     STATUS    AZ              NODE                                                  CREATED-TIME
broker        kafka-lydwaa-broker-0        <none>   Running   us-central1-c   gke-yjtest-default-pool-36251504-xw1f/10.128.15.202   Jun 20,2024 15:53 UTC+0800
metrics-exp   kafka-lydwaa-metrics-exp-0   <none>   Running   us-central1-c   gke-yjtest-default-pool-36251504-z5l7/10.128.15.197   Jun 20,2024 15:53 UTC+0800

Resources Allocation:
COMPONENT     DEDICATED   CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE-SIZE   STORAGE-CLASS
broker        false       500m / 500m          512Mi / 512Mi           data:1Gi       kb-default-sc
                                                                       metadata:1Gi   kb-default-sc
metrics-exp   false       500m / 500m          512Mi / 512Mi           <none>         <none>

Images:
COMPONENT     TYPE             IMAGE
broker        kafka-server     docker.io/bitnami/kafka:3.3.2-debian-11-r54
metrics-exp   kafka-exporter   docker.io/bitnami/kafka-exporter:1.6.0-debian-11-r67

Show cluster events: kbcli cluster list-events -n default kafka-lydwaa
➜ ~ kbcli cluster restart kafka-lydwaa
Please type the name again(separate with white space when more than one): kafka-lydwaa
OpsRequest kafka-lydwaa-restart-dpmmd created successfully, you can view the progress:
kbcli cluster describe-ops kafka-lydwaa-restart-dpmmd -n default
➜ ~ k get pod
NAME                         READY   STATUS             RESTARTS      AGE
kafka-lydwaa-broker-0        1/2     Running            0             16s
kafka-lydwaa-metrics-exp-0   0/1     CrashLoopBackOff   1 (12s ago)   18s
➜ ~ k logs kafka-lydwaa-metrics-exp-0 --previous
I0620 07:57:07.230134 1 kafka_exporter.go:792] Starting kafka_exporter (version=1.6.0, branch=non-git, revision=non-git)
F0620 07:57:07.991717 1 kafka_exporter.go:893] Error Init Kafka Client: kafka: client has run out of available brokers to talk to: dial tcp 10.124.1.31:9092: connect: connection refused
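The exporter log shows a fatal exit on the first refused connection, which would explain the crash loop while the broker is still starting. One way to avoid it is to gate the exporter's start on the broker port being reachable. A rough sketch of such a gate; `wait_for_tcp` is a hypothetical helper that relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command, and nothing here is taken from the KubeBlocks Kafka addon:

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a TCP connect until it succeeds.
# Usage: wait_for_tcp HOST PORT [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 on the first successful connect, 1 if every attempt fails.
wait_for_tcp() {
  local host="$1" port="$2"
  local attempts="${3:-30}" delay="${4:-2}"
  local i=0

  while [ "$i" -lt "$attempts" ]; do
    # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT;
    # timeout(1) bounds each attempt to 3 seconds.
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example with the service address from the cluster description:
# wait_for_tcp kafka-lydwaa-broker-broker.default.svc.cluster.local 9092
```

Whether the exporter image ships bash and `timeout` is an assumption that would need checking before using this as a container entrypoint wrapper.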
➜ ~ k describe ops kafka-lydwaa-restart-dpmmd
Name: kafka-lydwaa-restart-dpmmd
Namespace: default
Labels: app.kubernetes.io/instance=kafka-lydwaa
        app.kubernetes.io/managed-by=kubeblocks
        ops.kubeblocks.io/ops-type=Restart
Annotations: <none>
API Version: apps.kubeblocks.io/v1alpha1
Kind: OpsRequest
Metadata:
  Creation Timestamp: 2024-06-20T07:56:46Z
  Finalizers:
    opsrequest.kubeblocks.io/finalizer
  Generate Name: kafka-lydwaa-restart-
  Generation: 2
  Managed Fields:
    API Version: apps.kubeblocks.io/v1alpha1
    Fields Type: FieldsV1
    fieldsV1:
      f:metadata:
        f:generateName:
        f:labels:
          .:
          f:app.kubernetes.io/instance:
          f:app.kubernetes.io/managed-by:
      f:spec:
        .:
        f:clusterName:
        f:preConditionDeadlineSeconds:
        f:restart:
          .:
          k:{"componentName":"broker"}:
            .:
            f:componentName:
          k:{"componentName":"metrics-exp"}:
            .:
            f:componentName:
        f:type:
    Manager: kbcli
    Operation: Update
    Time: 2024-06-20T07:56:46Z
    API Version: apps.kubeblocks.io/v1alpha1
    Fields Type: FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"opsrequest.kubeblocks.io/finalizer":
        f:labels:
          f:ops.kubeblocks.io/ops-type:
        f:ownerReferences:
          .:
          k:{"uid":"9e232459-fe5e-42f1-b98a-8f7d93ca4692"}:
    Manager: manager
    Operation: Update
    Time: 2024-06-20T07:56:46Z
    API Version: apps.kubeblocks.io/v1alpha1
    Fields Type: FieldsV1
    fieldsV1:
      f:status:
        .:
        f:clusterGeneration:
        f:completionTimestamp:
        f:components:
          .:
          f:broker:
            .:
            f:phase:
            f:progressDetails:
          f:metrics-exp:
            .:
            f:phase:
            f:progressDetails:
        f:conditions:
          .:
          k:{"type":"Failed"}:
            .:
            f:lastTransitionTime:
            f:message:
            f:reason:
            f:status:
            f:type:
          k:{"type":"Restarting"}:
            .:
            f:lastTransitionTime:
            f:message:
            f:reason:
            f:status:
            f:type:
          k:{"type":"Validated"}:
            .:
            f:lastTransitionTime:
            f:message:
            f:reason:
            f:status:
            f:type:
          k:{"type":"WaitForProgressing"}:
            .:
            f:lastTransitionTime:
            f:message:
            f:reason:
            f:status:
            f:type:
        f:phase:
        f:progress:
        f:startTimestamp:
    Manager: manager
    Operation: Update
    Subresource: status
    Time: 2024-06-20T07:57:23Z
  Owner References:
    API Version: apps.kubeblocks.io/v1alpha1
    Kind: Cluster
    Name: kafka-lydwaa
    UID: 9e232459-fe5e-42f1-b98a-8f7d93ca4692
  Resource Version: 248776
  UID: 1a90835d-5585-42ca-bc12-85b7aabef315
Spec:
  Cluster Name: kafka-lydwaa
  Pre Condition Deadline Seconds: 0
  Restart:
    Component Name: broker
    Component Name: metrics-exp
  Type: Restart
Status:
  Cluster Generation: 3
  Completion Timestamp: 2024-06-20T07:57:23Z
  Components:
    Broker:
      Phase: Running
      Progress Details:
        End Time: 2024-06-20T07:57:22Z
        Message: Successfully restart: Pod/kafka-lydwaa-broker-0 in Component: broker
        Object Key: Pod/kafka-lydwaa-broker-0
        Start Time: 2024-06-20T07:56:46Z
        Status: Succeed
    Metrics - Exp:
      Phase: Failed
      Progress Details:
        End Time: 2024-06-20T07:56:51Z
        Message: Failed to restart: Pod/kafka-lydwaa-metrics-exp-0 in Component: metrics-exp, message:
        Object Key: Pod/kafka-lydwaa-metrics-exp-0
        Start Time: 2024-06-20T07:56:46Z
        Status: Failed
  Conditions:
    Last Transition Time: 2024-06-20T07:56:46Z
    Message: wait for the controller to process the OpsRequest: kafka-lydwaa-restart-dpmmd in Cluster: kafka-lydwaa
    Reason: WaitForProgressing
    Status: True
    Type: WaitForProgressing
    Last Transition Time: 2024-06-20T07:56:46Z
    Message: OpsRequest: kafka-lydwaa-restart-dpmmd is validated
    Reason: ValidateOpsRequestPassed
    Status: True
    Type: Validated
    Last Transition Time: 2024-06-20T07:56:46Z
    Message: Start to restart database in Cluster: kafka-lydwaa
    Reason: RestartStarted
    Status: True
    Type: Restarting
    Last Transition Time: 2024-06-20T07:57:23Z
    Message: Failed to process OpsRequest: kafka-lydwaa-restart-dpmmd in cluster: kafka-lydwaa, more detailed informations in status.components
    Reason: OpsRequestFailed
    Status: False
    Type: Failed
  Phase: Failed
  Progress: 2/2
  Start Timestamp: 2024-06-20T07:56:46Z
Events:
  Type     Reason                    Age                    From                    Message
  ----     ------                    ----                   ----                    -------
  Normal   WaitForProgressing        5m23s (x2 over 5m23s)  ops-request-controller  wait for the controller to process the OpsRequest: kafka-lydwaa-restart-dpmmd in Cluster: kafka-lydwaa
  Normal   ValidateOpsRequestPassed  5m23s                  ops-request-controller  OpsRequest: kafka-lydwaa-restart-dpmmd is validated
  Normal   RestartStarted            5m23s                  ops-request-controller  Start to restart database in Cluster: kafka-lydwaa
  Normal   Processing                5m23s                  ops-request-controller  Start to restart: Pod/kafka-lydwaa-broker-0 in Component: broker
  Normal   Succeed                   5m18s                  ops-request-controller  Successfully restart: Pod/kafka-lydwaa-metrics-exp-0 in Component: metrics-exp
  Normal   Processing                5m17s (x3 over 5m23s)  ops-request-controller  Start to restart: Pod/kafka-lydwaa-metrics-exp-0 in Component: metrics-exp
  Warning  Failed                    5m1s                   ops-request-controller  Failed to restart: Pod/kafka-lydwaa-metrics-exp-0 in Component: metrics-exp, message:
  Normal   Succeed                   4m47s                  ops-request-controller  Successfully restart: Pod/kafka-lydwaa-broker-0 in Component: broker
  Warning  OpsRequestFailed          4m46s (x2 over 4m46s)  ops-request-controller  Failed to process OpsRequest: kafka-lydwaa-restart-dpmmd in cluster: kafka-lydwaa, more detailed informations in status.components
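The timestamps in the OpsRequest status show how short the window was. A small calculation, assuming GNU `date` (the `-d` flag used here is not available in BSD/macOS `date`); `elapsed_seconds` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: seconds between two RFC 3339 timestamps (GNU date only).
elapsed_seconds() {
  local start end
  start="$(date -u -d "$1" +%s)" || return 1
  end="$(date -u -d "$2" +%s)" || return 1
  echo $((end - start))
}

ops_start="2024-06-20T07:56:46Z"
elapsed_seconds "$ops_start" "2024-06-20T07:56:51Z"   # metrics-exp marked Failed: 5
elapsed_seconds "$ops_start" "2024-06-20T07:57:22Z"   # broker restart succeeded: 36
elapsed_seconds "$ops_start" "2024-06-20T07:57:23Z"   # OpsRequest completed: 37
```

By these timestamps, metrics-exp was recorded as Failed 5 seconds into the restart, about 31 seconds before the broker it connects to finished restarting.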
The cluster eventually returns to Running:
➜ ~ kbcli cluster describe kafka-lydwaa
Name: kafka-lydwaa Created Time: Jun 20,2024 15:53 UTC+0800
NAMESPACE   CLUSTER-DEFINITION   VERSION       STATUS    TERMINATION-POLICY
default     kafka                kafka-3.3.2   Running   Delete

Endpoints:
COMPONENT   MODE        INTERNAL                                                    EXTERNAL
broker      ReadWrite   kafka-lydwaa-broker-broker.default.svc.cluster.local:9092   <none>

Topology:
COMPONENT     INSTANCE                     ROLE     STATUS    AZ              NODE                                                  CREATED-TIME
broker        kafka-lydwaa-broker-0        <none>   Running   us-central1-c   gke-yjtest-default-pool-36251504-xw1f/10.128.15.202   Jun 20,2024 15:56 UTC+0800
metrics-exp   kafka-lydwaa-metrics-exp-0   <none>   Running   us-central1-c   gke-yjtest-default-pool-36251504-z5l7/10.128.15.197   Jun 20,2024 15:56 UTC+0800

Resources Allocation:
COMPONENT     DEDICATED   CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE-SIZE   STORAGE-CLASS
broker        false       500m / 500m          512Mi / 512Mi           data:1Gi       kb-default-sc
                                                                       metadata:1Gi   kb-default-sc
metrics-exp   false       500m / 500m          512Mi / 512Mi           <none>         <none>

Images:
COMPONENT     TYPE             IMAGE
broker        kafka-server     docker.io/bitnami/kafka:3.3.2-debian-11-r54
metrics-exp   kafka-exporter   docker.io/bitnami/kafka-exporter:1.6.0-debian-11-r67

Show cluster events: kbcli cluster list-events -n default kafka-lydwaa