
[BUG] Data lost after the primary pod of a PG cluster is network-partitioned

Open ahjing99 opened this issue 1 year ago • 0 comments

➜ ~ kbcli version
Kubernetes: v1.25.8-gke.1000
KubeBlocks: 0.6.0-alpha.21
kbcli: 0.6.0-alpha.21

Enable chaos-mesh and create a PG cluster

kbcli addon enable chaos-mesh

kbcli cluster create pgcluster    --termination-policy=WipeOut    --cluster-definition=postgresql  --set cpu=1,memory=1Gi,storage=1Gi,replicas=2  --enable-all-logs=true

Before injecting the network partition fault

➜  ~ kbcli cluster describe pgcluster
Name: pgcluster	 Created Time: Jun 21,2023 13:35 UTC+0800
NAMESPACE   CLUSTER-DEFINITION   VERSION             STATUS    TERMINATION-POLICY
default     postgresql           postgresql-14.8.0   Running   WipeOut

Endpoints:
COMPONENT    MODE        INTERNAL                                              EXTERNAL
postgresql   ReadWrite   pgcluster-postgresql.default.svc.cluster.local:5432   <none>
                         pgcluster-postgresql.default.svc.cluster.local:6432

Topology:
COMPONENT    INSTANCE                 ROLE        STATUS    AZ              NODE                                                CREATED-TIME
postgresql   pgcluster-postgresql-0   secondary   Running   us-central1-c   gke-yjtest-default-pool-90acc18a-jsfb/10.128.0.58   Jun 21,2023 13:35 UTC+0800
postgresql   pgcluster-postgresql-1   primary     Running   us-central1-c   gke-yjtest-default-pool-90acc18a-pf8m/10.128.0.60   Jun 21,2023 13:35 UTC+0800

Resources Allocation:
COMPONENT    DEDICATED   CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE-SIZE   STORAGE-CLASS
postgresql   false       1 / 1                1Gi / 1Gi               data:1Gi       standard-rwo

Images:
COMPONENT    TYPE         IMAGE
postgresql   postgresql   registry.cn-hangzhou.aliyuncs.com/apecloud/spilo:14.8.0

Data Protection:
AUTO-BACKUP   BACKUP-SCHEDULE   TYPE     BACKUP-TTL   LAST-SCHEDULE   RECOVERABLE-TIME
Disabled      <none>            <none>   7d           <none>          <none>

Show cluster events: kbcli cluster list-events -n default pgcluster

Keep updating the value every 5s


➜  ~ kbcli cluster connect pgcluster
Connect to instance pgcluster-postgresql-1: out of pgcluster-postgresql-1, pgcluster-postgresql-0
psql (14.8 (Ubuntu 14.8-1.pgdg22.04+1))
Type "help" for help.

postgres=# select * from tmp_table;
 id | value
----+-------
  1 |     5
(1 row)

postgres=# select * from tmp_table;
 id | value
----+-------
  1 |     6
(1 row)

postgres=# select * from tmp_table;
 id | value
----+-------
  1 |     7
(1 row)

postgres=# select * from tmp_table;
 id | value
----+-------
  1 |     8
(1 row)

postgres=# select * from tmp_table;
 id | value
----+-------
  1 |     9
(1 row)

postgres=# select * from tmp_table;
 id | value
----+-------
  1 |    10
(1 row)
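
The issue does not show how the value was updated every 5s, so the following is only a sketch of one way to drive such a workload while recording which writes were acknowledged. The UPDATE statement, the `run_writer` function, and the `execute` callback are all hypothetical; `execute` stands in for whatever client call runs the statement against the ReadWrite endpoint, commits it, and returns the new value.

```python
import time
from typing import Callable, List

# Assumed statement: the issue only shows the SELECT side of tmp_table.
UPDATE_SQL = "UPDATE tmp_table SET value = value + 1 WHERE id = 1 RETURNING value"


def run_writer(
    execute: Callable[[str], int],
    rounds: int,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Run the update `rounds` times and return the acknowledged values.

    `execute` is a hypothetical callback: it must run the SQL, commit,
    and return the new value, or raise if the write was not acknowledged.
    """
    acked: List[int] = []
    for i in range(rounds):
        try:
            acked.append(execute(UPDATE_SQL))
        except Exception as exc:  # e.g. connection dropped during the partition
            print(f"write {i} was not acknowledged: {exc}")
        if i < rounds - 1:
            sleep(interval_s)
    return acked
```

Keeping the list of acknowledged values on the client side makes it possible to compare them later with what the new primary reports after failover.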

Inject a network partition into the primary pod

➜  ~ kbcli fault network partition pgcluster-postgresql-1 --duration=3m
NetworkChaos network-chaos-46scj created

The last two updates (9 and 10) are lost

➜  ~ kbcli cluster connect pgcluster
Connect to instance pgcluster-postgresql-0: out of pgcluster-postgresql-0, pgcluster-postgresql-1
psql (14.8 (Ubuntu 14.8-1.pgdg22.04+1))
Type "help" for help.

postgres=#  select * from tmp_table;
 id | value
----+-------
  1 |     8
(1 row)

➜  ~ kbcli cluster describe pgcluster
Name: pgcluster	 Created Time: Jun 21,2023 13:35 UTC+0800
NAMESPACE   CLUSTER-DEFINITION   VERSION             STATUS    TERMINATION-POLICY
default     postgresql           postgresql-14.8.0   Running   WipeOut

Endpoints:
COMPONENT    MODE        INTERNAL                                              EXTERNAL
postgresql   ReadWrite   pgcluster-postgresql.default.svc.cluster.local:5432   <none>
                         pgcluster-postgresql.default.svc.cluster.local:6432

Topology:
COMPONENT    INSTANCE                 ROLE        STATUS    AZ              NODE                                                CREATED-TIME
postgresql   pgcluster-postgresql-0   primary     Running   us-central1-c   gke-yjtest-default-pool-90acc18a-jsfb/10.128.0.58   Jun 21,2023 13:35 UTC+0800
postgresql   pgcluster-postgresql-1   secondary   Running   us-central1-c   gke-yjtest-default-pool-90acc18a-pf8m/10.128.0.60   Jun 21,2023 13:35 UTC+0800

Resources Allocation:
COMPONENT    DEDICATED   CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE-SIZE   STORAGE-CLASS
postgresql   false       1 / 1                1Gi / 1Gi               data:1Gi       standard-rwo

Images:
COMPONENT    TYPE         IMAGE
postgresql   postgresql   registry.cn-hangzhou.aliyuncs.com/apecloud/spilo:14.8.0

Data Protection:
AUTO-BACKUP   BACKUP-SCHEDULE   TYPE     BACKUP-TTL   LAST-SCHEDULE   RECOVERABLE-TIME
Disabled      <none>            <none>   7d           <none>          <none>

Show cluster events: kbcli cluster list-events -n default pgcluster
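
The loss reported here (the new primary shows 8 although 9 and 10 had been read from the old primary) can be checked mechanically. The sketch below assumes the value only ever increases by one per update, as in the session output in this issue; `lost_updates` is a hypothetical helper, not part of kbcli or KubeBlocks.

```python
from typing import Iterable, List


def lost_updates(acked_values: Iterable[int], value_after_failover: int) -> List[int]:
    """Return acknowledged values that the new primary no longer reflects.

    Assumes a monotonically increasing counter, so any acknowledged value
    greater than the one read after failover was lost.
    """
    return sorted(v for v in acked_values if v > value_after_failover)


# Values seen on the old primary before the partition, and the value
# read from pgcluster-postgresql-0 after it was promoted.
print(lost_updates([5, 6, 7, 8, 9, 10], 8))  # prints [9, 10]
```

An empty list would mean every acknowledged update survived the failover.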

Jun 21 '23 05:06 ahjing99
