kubeblocks [BUG]PG cluster cannot connect for 10s after secondary node stop

Open ahjing99 opened this issue 1 year ago • 0 comments

kbcli version
Kubernetes: v1.25.8-gke.1000
KubeBlocks: 0.6.0-alpha.23
kbcli: v0.6.0-alpha.23

kbcli addon enable chaos-mesh
Create cluster

kbcli cluster create pgcluster    --termination-policy=WipeOut    --cluster-definition=postgresql  --set cpu=1,memory=1Gi,storage=1Gi,replicas=2  --enable-all-logs=true

kbcli cluster describe pgcluster
Name: pgcluster	 Created Time: Jun 27,2023 10:05 UTC+0800
NAMESPACE   CLUSTER-DEFINITION   VERSION             STATUS    TERMINATION-POLICY
default     postgresql           postgresql-14.8.0   Running   WipeOut

Endpoints:
COMPONENT    MODE        INTERNAL                                              EXTERNAL
postgresql   ReadWrite   pgcluster-postgresql.default.svc.cluster.local:5432   <none>
                         pgcluster-postgresql.default.svc.cluster.local:6432

Topology:
COMPONENT    INSTANCE                 ROLE        STATUS    AZ              NODE                                                  CREATED-TIME
postgresql   pgcluster-postgresql-0   primary     Running   us-central1-c   gke-yjtest-default-pool-ce006ea7-chjh/10.128.15.206   Jun 27,2023 10:05 UTC+0800
postgresql   pgcluster-postgresql-1   secondary   Running   us-central1-c   gke-yjtest-default-pool-ce006ea7-7n4n/10.128.0.12     Jun 27,2023 10:17 UTC+0800

Resources Allocation:
COMPONENT    DEDICATED   CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE-SIZE   STORAGE-CLASS
postgresql   false       1 / 1                1Gi / 1Gi               data:1Gi       standard-rwo

Images:
COMPONENT    TYPE         IMAGE
postgresql   postgresql   registry.cn-hangzhou.aliyuncs.com/apecloud/spilo:14.8.0

Data Protection:
AUTO-BACKUP   BACKUP-SCHEDULE   TYPE     BACKUP-TTL   LAST-SCHEDULE   RECOVERABLE-TIME
Disabled      <none>            <none>   7d           <none>          <none>

Show cluster events: kbcli cluster list-events -n default pgcluster

Inject fault to secondary node

kbcli fault node stop gke-yjtest-default-pool-ce006ea7-7n4n -c=gcp --region=us-central1-c --project=apecloud-platform-engineering --duration=2m
Secret cloud-key-secret-gcp exists under default namespace.
GCPChaos node-chaos-fkbkx created

The cluster became unavailable twice, for 10s and 9s, which is not expected

Connect cluster 2023-06-27 10:31:33
Fail to connect cluster 2023-06-27 10:32:07
runningToStopTime - 2023-06-27 10:32:07
85---85
Connect cluster 2023-06-27 10:32:17
runningToStopTime - 2023-06-27 10:32:07
stopToRunningTime - 2023-06-27 10:32:17
Time interval since MySQL started: 10 seconds
86---86
Fail to connect cluster 2023-06-27 10:32:56
runningToStopTime - 2023-06-27 10:32:56
86---86
Connect cluster 2023-06-27 10:33:05
runningToStopTime - 2023-06-27 10:32:56
stopToRunningTime - 2023-06-27 10:33:05
Time interval since MySQL started: 9 seconds
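
For anyone who wants to double-check the two outage windows, here is a minimal sketch that recomputes them from the probe output above. It assumes the log format shown in this issue ("Fail to connect cluster <timestamp>" followed later by "Connect cluster <timestamp>"); the function name `outage_durations` and the `sample` list are hypothetical and not part of kbcli or of the original test script.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"
FAIL_PREFIX = "Fail to connect cluster "
OK_PREFIX = "Connect cluster "


def outage_durations(log_lines):
    """Return the length in seconds of each outage found in the probe log.

    An outage starts at the first "Fail to connect cluster" line and ends
    at the next "Connect cluster" line. Other lines are ignored.
    """
    durations = []
    down_since = None
    for raw in log_lines:
        # Drop stray NUL bytes (shown as ^@ in some terminals) and whitespace.
        line = raw.replace("\x00", "").replace("^@", "").strip()
        if line.startswith(FAIL_PREFIX):
            if down_since is None:
                down_since = datetime.strptime(line[len(FAIL_PREFIX):], TS_FORMAT)
        elif line.startswith(OK_PREFIX) and down_since is not None:
            up_at = datetime.strptime(line[len(OK_PREFIX):], TS_FORMAT)
            durations.append(int((up_at - down_since).total_seconds()))
            down_since = None
    return durations


# Lines copied from the probe output in this issue.
sample = [
    "Connect cluster 2023-06-27 10:31:33",
    "Fail to connect cluster 2023-06-27 10:32:07",
    "runningToStopTime - 2023-06-27 10:32:07",
    "Connect cluster 2023-06-27 10:32:17",
    "Fail to connect cluster 2023-06-27 10:32:56",
    "runningToStopTime - 2023-06-27 10:32:56",
    "Connect cluster 2023-06-27 10:33:05",
]

print(outage_durations(sample))  # [10, 9]
```

The result matches the 10-second and 9-second intervals reported by the probe.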

Jun 27 '23 02:06 ahjing99
