kubeblocks
[BUG] PG cluster cannot be connected to for about 10s after the secondary node is stopped
kbcli version
Kubernetes: v1.25.8-gke.1000
KubeBlocks: 0.6.0-alpha.23
kbcli: v0.6.0-alpha.23
- kbcli addon enable chaos-mesh
- create cluster
kbcli cluster create pgcluster --termination-policy=WipeOut --cluster-definition=postgresql --set cpu=1,memory=1Gi,storage=1Gi,replicas=2 --enable-all-logs=true
kbcli cluster describe pgcluster
Name: pgcluster Created Time: Jun 27,2023 10:05 UTC+0800
NAMESPACE CLUSTER-DEFINITION VERSION STATUS TERMINATION-POLICY
default postgresql postgresql-14.8.0 Running WipeOut
Endpoints:
COMPONENT MODE INTERNAL EXTERNAL
postgresql ReadWrite pgcluster-postgresql.default.svc.cluster.local:5432 <none>
pgcluster-postgresql.default.svc.cluster.local:6432
Topology:
COMPONENT INSTANCE ROLE STATUS AZ NODE CREATED-TIME
postgresql pgcluster-postgresql-0 primary Running us-central1-c gke-yjtest-default-pool-ce006ea7-chjh/10.128.15.206 Jun 27,2023 10:05 UTC+0800
postgresql pgcluster-postgresql-1 secondary Running us-central1-c gke-yjtest-default-pool-ce006ea7-7n4n/10.128.0.12 Jun 27,2023 10:17 UTC+0800
Resources Allocation:
COMPONENT DEDICATED CPU(REQUEST/LIMIT) MEMORY(REQUEST/LIMIT) STORAGE-SIZE STORAGE-CLASS
postgresql false 1 / 1 1Gi / 1Gi data:1Gi standard-rwo
Images:
COMPONENT TYPE IMAGE
postgresql postgresql registry.cn-hangzhou.aliyuncs.com/apecloud/spilo:14.8.0
Data Protection:
AUTO-BACKUP BACKUP-SCHEDULE TYPE BACKUP-TTL LAST-SCHEDULE RECOVERABLE-TIME
Disabled <none> <none> 7d <none> <none>
Show cluster events: kbcli cluster list-events -n default pgcluster
- Inject fault to secondary node
kbcli fault node stop gke-yjtest-default-pool-ce006ea7-7n4n -c=gcp --region=us-central1-c --project=apecloud-platform-engineering --duration=2m
Secret cloud-key-secret-gcp exists under default namespace.
GCPChaos node-chaos-fkbkx created
- The cluster becomes unavailable twice, for 10s and 9s, which is not expected
Connect cluster 2023-06-27 10:31:33
Fail to connect cluster 2023-06-27 10:32:07
runningToStopTime - 2023-06-27 10:32:07
85---85
Connect cluster 2023-06-27 10:32:17
runningToStopTime - 2023-06-27 10:32:07
stopToRunningTime - 2023-06-27 10:32:17
Time interval since MySQL started: 10 seconds
86---86
Fail to connect cluster 2023-06-27 10:32:56
runningToStopTime - 2023-06-27 10:32:56
86---86
Connect cluster 2023-06-27 10:33:05
runningToStopTime - 2023-06-27 10:32:56
stopToRunningTime - 2023-06-27 10:33:05
Time interval since MySQL started: 9 seconds
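To double-check the reported outage lengths, the `runningToStopTime` / `stopToRunningTime` pairs in the probe output can be paired up and subtracted. The following is only a minimal sketch: it assumes the log format shown in this issue, and the names `outage_durations` and `SAMPLE_LOG` are hypothetical helpers, not part of kbcli or of the probe script that produced the log.

```python
import re
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"
# Matches the two marker lines seen in the probe output of this issue.
MARKER = re.compile(
    r"(runningToStopTime|stopToRunningTime) - "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
)

# Excerpt copied from the probe output above (assumed format).
SAMPLE_LOG = """\
Fail to connect cluster 2023-06-27 10:32:07
runningToStopTime - 2023-06-27 10:32:07
Connect cluster 2023-06-27 10:32:17
runningToStopTime - 2023-06-27 10:32:07
stopToRunningTime - 2023-06-27 10:32:17
Fail to connect cluster 2023-06-27 10:32:56
runningToStopTime - 2023-06-27 10:32:56
Connect cluster 2023-06-27 10:33:05
runningToStopTime - 2023-06-27 10:32:56
stopToRunningTime - 2023-06-27 10:33:05
"""


def outage_durations(log_text):
    """Return a list of (stop, recover, seconds) tuples, one per outage."""
    outages = []
    stop_time = None
    for line in log_text.splitlines():
        match = MARKER.search(line)
        if match is None:
            continue  # connect/fail lines and counters are ignored
        kind, stamp = match.groups()
        when = datetime.strptime(stamp, TS_FORMAT)
        if kind == "runningToStopTime":
            # The stop time is printed more than once; keep the latest value.
            stop_time = when
        elif stop_time is not None:
            seconds = int((when - stop_time).total_seconds())
            outages.append((stop_time, when, seconds))
            stop_time = None  # wait for the next outage
    return outages


if __name__ == "__main__":
    for stop, recover, seconds in outage_durations(SAMPLE_LOG):
        print(f"unavailable from {stop} to {recover}: {seconds}s")
```

Run against the excerpt, this yields two outages of 10 and 9 seconds, matching the intervals reported in the log.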