kubeblocks
[BUG] Data lost after the network of the PG cluster's primary pod is partitioned
➜ ~ kbcli version
Kubernetes: v1.25.8-gke.1000
KubeBlocks: 0.6.0-alpha.21
kbcli: 0.6.0-alpha.21
- Enable chaos-mesh and create a PG cluster
kbcli addon enable chaos-mesh
kbcli cluster create pgcluster --termination-policy=WipeOut --cluster-definition=postgresql --set cpu=1,memory=1Gi,storage=1Gi,replicas=2 --enable-all-logs=true
- Before injecting the network partition fault
➜ ~ kbcli cluster describe pgcluster
Name: pgcluster   Created Time: Jun 21,2023 13:35 UTC+0800
NAMESPACE   CLUSTER-DEFINITION   VERSION             STATUS    TERMINATION-POLICY
default     postgresql           postgresql-14.8.0   Running   WipeOut
Endpoints:
COMPONENT    MODE        INTERNAL                                              EXTERNAL
postgresql   ReadWrite   pgcluster-postgresql.default.svc.cluster.local:5432   <none>
                         pgcluster-postgresql.default.svc.cluster.local:6432
Topology:
COMPONENT    INSTANCE                 ROLE        STATUS    AZ              NODE                                                CREATED-TIME
postgresql   pgcluster-postgresql-0   secondary   Running   us-central1-c   gke-yjtest-default-pool-90acc18a-jsfb/10.128.0.58   Jun 21,2023 13:35 UTC+0800
postgresql   pgcluster-postgresql-1   primary     Running   us-central1-c   gke-yjtest-default-pool-90acc18a-pf8m/10.128.0.60   Jun 21,2023 13:35 UTC+0800
Resources Allocation:
COMPONENT    DEDICATED   CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE-SIZE   STORAGE-CLASS
postgresql   false       1 / 1                1Gi / 1Gi               data:1Gi       standard-rwo
Images:
COMPONENT    TYPE         IMAGE
postgresql   postgresql   registry.cn-hangzhou.aliyuncs.com/apecloud/spilo:14.8.0
Data Protection:
AUTO-BACKUP   BACKUP-SCHEDULE   TYPE     BACKUP-TTL   LAST-SCHEDULE   RECOVERABLE-TIME
Disabled      <none>            <none>   7d           <none>          <none>
Show cluster events: kbcli cluster list-events -n default pgcluster
- Keep updating the value every 5s and query it on the primary
➜ ~ kbcli cluster connect pgcluster
Connect to instance pgcluster-postgresql-1: out of pgcluster-postgresql-1, pgcluster-postgresql-0
psql (14.8 (Ubuntu 14.8-1.pgdg22.04+1))
Type "help" for help.
postgres=# select * from tmp_table;
id | value
----+-------
1 | 5
(1 row)
postgres=# select * from tmp_table;
id | value
----+-------
1 | 6
(1 row)
postgres=# select * from tmp_table;
id | value
----+-------
1 | 7
(1 row)
postgres=# select * from tmp_table;
id | value
----+-------
1 | 8
(1 row)
postgres=# select * from tmp_table;
id | value
----+-------
1 | 9
(1 row)
postgres=# select * from tmp_table;
id | value
----+-------
1 | 10
(1 row)
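The report does not show how the value was updated. A minimal sketch of such a writer loop, assuming psql can reach the ReadWrite endpoint through the usual PGHOST/PGUSER/PGPASSWORD environment variables and that tmp_table holds the single row id = 1; update_sql and run_writer are hypothetical helper names, not part of kbcli:

```shell
# Sketch only: connection settings and helper names are assumptions,
# not taken from the report.

# Build the UPDATE statement for a given value.
update_sql() {
  printf 'UPDATE tmp_table SET value = %d WHERE id = 1;' "$1"
}

# Write every value from $1 to $2, pausing $3 seconds (default 5) between writes.
run_writer() {
  local v=$1 last=$2 interval=${3:-5}
  while [ "$v" -le "$last" ]; do
    if psql -v ON_ERROR_STOP=1 -q -c "$(update_sql "$v")"; then
      echo "acked $v"        # psql exited with 0, so the UPDATE was committed
    else
      echo "failed $v" >&2   # for example, the primary is unreachable
    fi
    v=$((v + 1))
    sleep "$interval"
  done
}
```

For example, `run_writer 5 10 5` would write the values 5 through 10 at 5s intervals; the last `acked` line is the value to compare against after the failover.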
- Inject a network partition into the primary pod
➜ ~ kbcli fault network partition pgcluster-postgresql-1 --duration=3m
NetworkChaos network-chaos-46scj created
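To confirm what the fault actually applied, the generated Chaos Mesh object can be inspected with kubectl. A small sketch, assuming the NetworkChaos resource was created in the default namespace (the report does not say); show_chaos is a hypothetical helper:

```shell
# Sketch only: the namespace default is an assumption.
# Print the full spec and status of a NetworkChaos object.
show_chaos() {
  local name=$1 ns=${2:-default}
  kubectl get networkchaos "$name" -n "$ns" -o yaml
}
```

Calling `show_chaos network-chaos-46scj` should show the selected pod, the partition direction, and the 3m duration, provided the namespace assumption holds.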
- The last two updates (9 and 10) are lost: after failover, the new primary returns 8
➜ ~ kbcli cluster connect pgcluster
Connect to instance pgcluster-postgresql-0: out of pgcluster-postgresql-0, pgcluster-postgresql-1
psql (14.8 (Ubuntu 14.8-1.pgdg22.04+1))
Type "help" for help.
postgres=# select * from tmp_table;
id | value
----+-------
1 | 8
(1 row)
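The loss can be stated as simple arithmetic: the old primary returned 10 before the fault and the new primary returns 8, so the values 9 and 10 are missing. A sketch of that check, assuming the writer increments the value by 1 each time; lost_updates is a hypothetical helper:

```shell
# Sketch only: assumes values increase by exactly 1 per update.
# $1 = last value seen on the old primary, $2 = value read from the new primary.
# Prints each missing value on its own line; prints nothing if no update is lost.
lost_updates() {
  local acked=$1 observed=$2 v
  v=$((observed + 1))
  while [ "$v" -le "$acked" ]; do
    printf '%s\n' "$v"
    v=$((v + 1))
  done
}
```

With the numbers from this report, `lost_updates 10 8` lists 9 and 10.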
- After the fault, the roles are swapped: pgcluster-postgresql-0 is now the primary
➜ ~ kbcli cluster describe pgcluster
Name: pgcluster   Created Time: Jun 21,2023 13:35 UTC+0800
NAMESPACE   CLUSTER-DEFINITION   VERSION             STATUS    TERMINATION-POLICY
default     postgresql           postgresql-14.8.0   Running   WipeOut
Endpoints:
COMPONENT    MODE        INTERNAL                                              EXTERNAL
postgresql   ReadWrite   pgcluster-postgresql.default.svc.cluster.local:5432   <none>
                         pgcluster-postgresql.default.svc.cluster.local:6432
Topology:
COMPONENT    INSTANCE                 ROLE        STATUS    AZ              NODE                                                CREATED-TIME
postgresql   pgcluster-postgresql-0   primary     Running   us-central1-c   gke-yjtest-default-pool-90acc18a-jsfb/10.128.0.58   Jun 21,2023 13:35 UTC+0800
postgresql   pgcluster-postgresql-1   secondary   Running   us-central1-c   gke-yjtest-default-pool-90acc18a-pf8m/10.128.0.60   Jun 21,2023 13:35 UTC+0800
Resources Allocation:
COMPONENT    DEDICATED   CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE-SIZE   STORAGE-CLASS
postgresql   false       1 / 1                1Gi / 1Gi               data:1Gi       standard-rwo
Images:
COMPONENT    TYPE         IMAGE
postgresql   postgresql   registry.cn-hangzhou.aliyuncs.com/apecloud/spilo:14.8.0
Data Protection:
AUTO-BACKUP   BACKUP-SCHEDULE   TYPE     BACKUP-TTL   LAST-SCHEDULE   RECOVERABLE-TIME
Disabled      <none>            <none>   7d           <none>          <none>
Show cluster events: kbcli cluster list-events -n default pgcluster