
[BUG] Data lost after network partition of the PG cluster's primary pod

Open ahjing99 opened this issue 1 year ago • 0 comments

➜ ~ kbcli version
Kubernetes: v1.25.8-gke.1000
KubeBlocks: 0.6.0-alpha.21
kbcli: 0.6.0-alpha.21

  1. Enable chaos-mesh and create a PG cluster
kbcli addon enable chaos-mesh

kbcli cluster create pgcluster --termination-policy=WipeOut --cluster-definition=postgresql --set cpu=1,memory=1Gi,storage=1Gi,replicas=2 --enable-all-logs=true
  2. Before injecting the network partition fault
➜  ~ kbcli cluster describe pgcluster
Name: pgcluster	 Created Time: Jun 21,2023 13:35 UTC+0800
NAMESPACE   CLUSTER-DEFINITION   VERSION             STATUS    TERMINATION-POLICY
default     postgresql           postgresql-14.8.0   Running   WipeOut

Endpoints:
COMPONENT    MODE        INTERNAL                                              EXTERNAL
postgresql   ReadWrite   pgcluster-postgresql.default.svc.cluster.local:5432   <none>
                         pgcluster-postgresql.default.svc.cluster.local:6432

Topology:
COMPONENT    INSTANCE                 ROLE        STATUS    AZ              NODE                                                CREATED-TIME
postgresql   pgcluster-postgresql-0   secondary   Running   us-central1-c   gke-yjtest-default-pool-90acc18a-jsfb/10.128.0.58   Jun 21,2023 13:35 UTC+0800
postgresql   pgcluster-postgresql-1   primary     Running   us-central1-c   gke-yjtest-default-pool-90acc18a-pf8m/10.128.0.60   Jun 21,2023 13:35 UTC+0800

Resources Allocation:
COMPONENT    DEDICATED   CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE-SIZE   STORAGE-CLASS
postgresql   false       1 / 1                1Gi / 1Gi               data:1Gi       standard-rwo

Images:
COMPONENT    TYPE         IMAGE
postgresql   postgresql   registry.cn-hangzhou.aliyuncs.com/apecloud/spilo:14.8.0

Data Protection:
AUTO-BACKUP   BACKUP-SCHEDULE   TYPE     BACKUP-TTL   LAST-SCHEDULE   RECOVERABLE-TIME
Disabled      <none>            <none>   7d           <none>          <none>

Show cluster events: kbcli cluster list-events -n default pgcluster

  3. Keep updating the value every 5s

➜  ~ kbcli cluster connect pgcluster
Connect to instance pgcluster-postgresql-1: out of pgcluster-postgresql-1, pgcluster-postgresql-0
psql (14.8 (Ubuntu 14.8-1.pgdg22.04+1))
Type "help" for help.

postgres=# select * from tmp_table;
 id | value
----+-------
  1 |     5
(1 row)

postgres=# select * from tmp_table;
 id | value
----+-------
  1 |     6
(1 row)

postgres=# select * from tmp_table;
 id | value
----+-------
  1 |     7
(1 row)

postgres=# select * from tmp_table;
 id | value
----+-------
  1 |     8
(1 row)

postgres=# select * from tmp_table;
 id | value
----+-------
  1 |     9
(1 row)

postgres=# select * from tmp_table;
 id | value
----+-------
  1 |    10
(1 row)
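Step 3 does not show how the value was updated. A minimal shell sketch of such a writer follows, under stated assumptions: the `tmp_table` schema, the `PSQL` and `INTERVAL` variables, and the `init_table` and `bump_loop` helpers are hypothetical, and `psql` is assumed to reach the current primary through the standard libpq environment variables (`PGHOST`, `PGPORT`, `PGUSER`, `PGPASSWORD`, `PGDATABASE`). Logging every value the server acknowledged on the client side makes it possible to tell afterwards exactly which committed updates survived the failover.

```sh
#!/bin/sh
# Hypothetical reproduction writer, not part of kbcli or KubeBlocks.
# Assumes psql reaches the primary via the libpq environment variables
# (PGHOST, PGPORT, PGUSER, PGPASSWORD, PGDATABASE).
set -u

PSQL=${PSQL:-psql}        # command used to run SQL; overridable for testing
INTERVAL=${INTERVAL:-5}   # seconds between updates, as in the report

# Create the table and its single row (schema is an assumption).
init_table() {
  $PSQL -v ON_ERROR_STOP=1 -qtAc \
    "CREATE TABLE IF NOT EXISTS tmp_table (id int PRIMARY KEY, value int NOT NULL);
     INSERT INTO tmp_table VALUES (1, 0) ON CONFLICT (id) DO NOTHING;"
}

# bump_loop COUNT: run COUNT autocommit updates. For each one, print
# "<UTC timestamp> acked <value>" if psql returned the new value, or
# "<UTC timestamp> failed" if psql exited with an error.
bump_loop() {
  local count=$1 i=0 value
  while [ "$i" -lt "$count" ]; do
    if value=$($PSQL -v ON_ERROR_STOP=1 -qtAc \
        "UPDATE tmp_table SET value = value + 1 WHERE id = 1 RETURNING value"); then
      echo "$(date -u +%FT%TZ) acked $value"
    else
      echo "$(date -u +%FT%TZ) failed"
    fi
    i=$((i + 1))
    sleep "$INTERVAL"
  done
}
```

For example, `init_table && bump_loop 60 | tee writer.log` runs for about five minutes at the default 5 s interval, which covers the 3-minute partition. Each iteration opens a new connection, so setting `PGCONNECT_TIMEOUT` may help keep an attempt against the partitioned pod from stalling the loop.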
  4. Inject a network partition into the primary pod
➜  ~ kbcli fault network partition pgcluster-postgresql-1 --duration=3m
NetworkChaos network-chaos-46scj created
  5. pgcluster-postgresql-0 is promoted to primary, and the last two updates (values 9 and 10) are lost
➜  ~ kbcli cluster connect pgcluster
Connect to instance pgcluster-postgresql-0: out of pgcluster-postgresql-0, pgcluster-postgresql-1
psql (14.8 (Ubuntu 14.8-1.pgdg22.04+1))
Type "help" for help.

postgres=#  select * from tmp_table;
 id | value
----+-------
  1 |     8
(1 row)

➜  ~ kbcli cluster describe pgcluster
Name: pgcluster	 Created Time: Jun 21,2023 13:35 UTC+0800
NAMESPACE   CLUSTER-DEFINITION   VERSION             STATUS    TERMINATION-POLICY
default     postgresql           postgresql-14.8.0   Running   WipeOut

Endpoints:
COMPONENT    MODE        INTERNAL                                              EXTERNAL
postgresql   ReadWrite   pgcluster-postgresql.default.svc.cluster.local:5432   <none>
                         pgcluster-postgresql.default.svc.cluster.local:6432

Topology:
COMPONENT    INSTANCE                 ROLE        STATUS    AZ              NODE                                                CREATED-TIME
postgresql   pgcluster-postgresql-0   primary     Running   us-central1-c   gke-yjtest-default-pool-90acc18a-jsfb/10.128.0.58   Jun 21,2023 13:35 UTC+0800
postgresql   pgcluster-postgresql-1   secondary   Running   us-central1-c   gke-yjtest-default-pool-90acc18a-pf8m/10.128.0.60   Jun 21,2023 13:35 UTC+0800

Resources Allocation:
COMPONENT    DEDICATED   CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE-SIZE   STORAGE-CLASS
postgresql   false       1 / 1                1Gi / 1Gi               data:1Gi       standard-rwo

Images:
COMPONENT    TYPE         IMAGE
postgresql   postgresql   registry.cn-hangzhou.aliyuncs.com/apecloud/spilo:14.8.0

Data Protection:
AUTO-BACKUP   BACKUP-SCHEDULE   TYPE     BACKUP-TTL   LAST-SCHEDULE   RECOVERABLE-TIME
Disabled      <none>            <none>   7d           <none>          <none>

Show cluster events: kbcli cluster list-events -n default pgcluster
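Step 5 amounts to a lost-acknowledged-write check. A minimal sketch of that check follows, assuming the client kept a hypothetical log with one `<timestamp> acked <value>` line per update it saw succeed (and `<timestamp> failed` otherwise); the `lost_writes` helper and the log format are illustrative only, not part of kbcli or KubeBlocks. Because every update increments the same counter by 1, any acknowledged value greater than the value read back from the new primary is a committed write that the new primary does not contain.

```sh
#!/bin/sh
# lost_writes LOG OBSERVED (hypothetical helper)
# LOG:      client-side log, one "<timestamp> acked <value>" or
#           "<timestamp> failed" line per update attempt (assumed format).
# OBSERVED: value read from the new primary after failover.
# Prints every acknowledged value greater than OBSERVED, one per line.
lost_writes() {
  local log=$1 observed=$2
  awk -v seen="$observed" '$2 == "acked" && ($3 + 0) > (seen + 0) { print $3 }' "$log"
}
```

With the numbers from this report (last acknowledged value 10, value 8 read from pgcluster-postgresql-0 after the failover), `lost_writes writer.log 8` would print 9 and 10, assuming the log's last `acked` entry is 10.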

ahjing99 · Jun 21 '23 05:06