kubeblocks
[BUG] postgresql cluster: scale-out after upgrade to v1 fails with ERROR: failed to bootstrap from leader
Describe the bug
kbcli version
Kubernetes: v1.30.4-vke.5
KubeBlocks: 1.0.0-beta.47,0.9.4-beta.20
kbcli: 1.0.0-beta.21
1. ERROR: failed to bootstrap from leader 'postgres-cluster-postgresql-0'
2. Primary and secondary data are out of sync.
To Reproduce
Steps to reproduce the behavior:
- upgrade to v1
echo yes|kbcli cluster upgrade-to-v1 postgres-cluster
Current spec (apps.kubeblocks.io/v1alpha1):
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  annotations:
    kubeblocks.io/crd-api-version: apps.kubeblocks.io/v1alpha1
    kubeblocks.io/reconcile: "2025-04-28T07:40:59.345764723Z"
  creationTimestamp: "2025-04-28T07:23:40Z"
  finalizers:
  - cluster.kubeblocks.io/finalizer
  generation: 2
  name: postgres-cluster
  namespace: default
  resourceVersion: "36299"
  uid: 714dbbce-0c2e-469e-88be-380ff0218720
spec:
  componentSpecs:
  - componentDef: postgresql-16
    name: postgresql
    replicas: 2
    resources:
      limits:
        cpu: 100m
        memory: 512Mi
      requests:
        cpu: 100m
        memory: 512Mi
    serviceVersion: 16.4.0
    switchPolicy:
      type: Noop
    updateStrategy: BestEffortParallel
    volumeClaimTemplates:
    - name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
  resources:
    cpu: "0"
    memory: "0"
  storage:
    size: "0"
  terminationPolicy: WipeOut
status: {}

Converted spec (apps.kubeblocks.io/v1):
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  annotations:
    kubeblocks.io/crd-api-version: apps.kubeblocks.io/v1
    kubeblocks.io/reconcile: "2025-04-28T07:40:59.345764723Z"
  creationTimestamp: "2025-04-28T07:23:40Z"
  finalizers:
  - cluster.kubeblocks.io/finalizer
  generation: 2
  name: postgres-cluster
  namespace: default
  resourceVersion: "36299"
  uid: 714dbbce-0c2e-469e-88be-380ff0218720
spec:
  componentSpecs:
  - componentDef: postgresql-16-1.0.0-alpha.0
    name: postgresql
    replicas: 2
    resources:
      limits:
        cpu: 100m
        memory: 512Mi
      requests:
        cpu: 100m
        memory: 512Mi
    serviceVersion: 16.4.0
    volumeClaimTemplates:
    - name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
  terminationPolicy: WipeOut
status: {}
Cluster postgres-cluster will be converted to v1 with output as yaml.
Please type 'Yes/yes' to confirm your operation: yes
postgres-cluster-postgresql-postgresql
Cluster postgres-cluster has converted successfully, you can view the spec:
kubectl get clusters.apps.kubeblocks.io postgres-cluster -n default -oyaml
kubectl get clusters.apps.kubeblocks.io postgres-cluster -n default -oyaml
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  annotations:
    kubeblocks.io/crd-api-version: apps.kubeblocks.io/v1
    kubeblocks.io/reconcile: "2025-04-28T07:40:59.345764723Z"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps.kubeblocks.io/v1alpha1","kind":"Cluster","metadata":{"annotations":{},"name":"postgres-cluster","namespace":"default"},"spec":{"componentSpecs":[{"componentDef":"postgresql-16","name":"postgresql","replicas":2,"resources":{"limits":{"cpu":"100m","memory":"0.5Gi"},"requests":{"cpu":"100m","memory":"0.5Gi"}},"switchPolicy":{"type":"Noop"},"updateStrategy":"BestEffortParallel","volumeClaimTemplates":[{"name":"data","spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"20Gi"}},"storageClassName":null}}]}],"terminationPolicy":"WipeOut"}}
  creationTimestamp: "2025-04-28T07:23:40Z"
  finalizers:
  - cluster.kubeblocks.io/finalizer
  generation: 3
  name: postgres-cluster
  namespace: default
  resourceVersion: "59059"
  uid: 714dbbce-0c2e-469e-88be-380ff0218720
spec:
  componentSpecs:
  - componentDef: postgresql-16-1.0.0-alpha.0
    name: postgresql
    replicas: 2
    resources:
      limits:
        cpu: 100m
        memory: 512Mi
      requests:
        cpu: 100m
        memory: 512Mi
    serviceVersion: 16.4.0
    volumeClaimTemplates:
    - name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
  terminationPolicy: WipeOut
status:
  components:
    postgresql:
      phase: Updating
  conditions:
  - lastTransitionTime: "2025-04-28T08:07:35Z"
    message: 'The operator has started the provisioning of Cluster: postgres-cluster'
    observedGeneration: 3
    reason: PreCheckSucceed
    status: "True"
    type: ProvisioningStarted
  - lastTransitionTime: "2025-04-28T07:23:40Z"
    message: Successfully applied for resources
    observedGeneration: 3
    reason: ApplyResourcesSucceed
    status: "True"
    type: ApplyResources
  - lastTransitionTime: "2025-04-28T07:35:02Z"
    message: all pods of components are ready, waiting for the probe detection successful
    reason: AllReplicasReady
    status: "True"
    type: ReplicasReady
  - lastTransitionTime: "2025-04-28T07:35:02Z"
    message: 'Cluster: postgres-cluster is ready, current phase is Running'
    reason: ClusterReady
    status: "True"
    type: Ready
  observedGeneration: 3
  phase: Updating
- scale-out
kbcli cluster scale-out postgres-cluster --auto-approve --force=true --components postgresql --replicas 1 --namespace default
- See error
➜ ~ kubectl get cluster postgres-cluster
NAME               CLUSTER-DEFINITION   TERMINATION-POLICY   STATUS     AGE
postgres-cluster                        WipeOut              Updating   142m
➜ ~ kubectl get pod -l app.kubernetes.io/instance=postgres-cluster
NAME                            READY   STATUS    RESTARTS   AGE
postgres-cluster-postgresql-0   5/5     Running   0          83m
postgres-cluster-postgresql-1   5/5     Running   0          80m
postgres-cluster-postgresql-2   4/5     Running   0          17m
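To pick out the unhealthy pod from a longer listing, the READY column can be filtered directly. This is a minimal sketch, assuming the default `kubectl get pod` table layout (READY is the second column); `not_fully_ready` is a hypothetical helper name, not a kubectl or kbcli feature.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubectl or kbcli): reads the table printed
# by `kubectl get pod` on stdin and prints the NAME of every pod whose READY
# column (second field, e.g. "4/5") shows fewer ready containers than it has.
not_fully_ready() {
  awk 'NR > 1 {                     # skip the header row
    split($2, ready, "/")           # ready[1] = ready count, ready[2] = total
    if (ready[1] + 0 < ready[2] + 0) print $1
  }'
}
```

Usage: `kubectl get pod -l app.kubernetes.io/instance=postgres-cluster | not_fully_ready`. With the listing above it should print only `postgres-cluster-postgresql-2`. To see which of its five containers is not ready, `kubectl get pod postgres-cluster-postgresql-2 -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.ready}{"\n"}{end}'` lists each container with its ready flag.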
`kbcli cluster list-instances postgres-cluster --namespace default`
NAME                            NAMESPACE   CLUSTER            COMPONENT    STATUS    ROLE        ACCESSMODE   AZ             CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE     NODE                      CREATED-TIME
postgres-cluster-postgresql-0   default     postgres-cluster   postgresql   Running   primary                  cn-beijing-b   100m / 100m          512Mi / 512Mi           data:20Gi   172.31.0.21/172.31.0.21   Apr 28,2025 16:23 UTC+0800
postgres-cluster-postgresql-1   default     postgres-cluster   postgresql   Running   secondary                cn-beijing-b   100m / 100m          512Mi / 512Mi           data:20Gi   172.31.0.43/172.31.0.43   Apr 28,2025 16:25 UTC+0800
postgres-cluster-postgresql-2   default     postgres-cluster   postgresql   Running   <none>                   cn-beijing-b   100m / 100m          512Mi / 512Mi           data:20Gi   172.31.0.7/172.31.0.7     Apr 28,2025 17:28 UTC+0800
Logs of the failing pod (postgres-cluster-postgresql-2):
➜ ~ kubectl logs postgres-cluster-postgresql-2
Defaulted container "postgresql" out of: postgresql, pgbouncer, exporter, kbagent, config-manager, pg-init-container (init), init-dbctl (init), init-kbagent (init), kbagent-worker (init)
2025-04-28 09:29:37,648 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?)
2025-04-28 09:29:39,651 - bootstrapping - INFO - Could not connect to 169.254.169.254, assuming local Docker setup
2025-04-28 09:29:39,652 - bootstrapping - INFO - No meta-data available for this provider
2025-04-28 09:29:39,652 - bootstrapping - INFO - Looks like you are running local
2025-04-28 09:29:39,746 - bootstrapping - INFO - kubeblocks generate local configuration:
bootstrap:
  dcs:
    check_timeline: true
    loop_wait: 10
    max_timelines_history: 0
    maximum_lag_on_failover: 1048576
    postgresql:
      parameters:
        archive_command: /bin/true
        archive_mode: 'on'
        autovacuum_analyze_scale_factor: '0.1'
        autovacuum_max_workers: '3'
        autovacuum_vacuum_scale_factor: '0.05'
        checkpoint_completion_target: '0.9'
        log_autovacuum_min_duration: '10000'
        log_checkpoints: 'True'
        log_connections: 'False'
        log_disconnections: 'False'
        log_min_duration_statement: '1000'
        log_statement: ddl
        log_temp_files: 128kB
        max_connections: '56'
        max_locks_per_transaction: '64'
        max_prepared_transactions: '100'
        max_replication_slots: '16'
        max_wal_senders: '64'
        max_worker_processes: '8'
        tcp_keepalives_idle: 45s
        tcp_keepalives_interval: 10s
        track_commit_timestamp: 'False'
        track_functions: pl
        wal_compression: 'True'
        wal_keep_size: '0'
        wal_level: replica
        wal_log_hints: 'False'
    retry_timeout: 10
    ttl: 30
  initdb:
  - auth-host: md5
  - auth-local: trust
postgresql:
  config_dir: /home/postgres/pgdata/conf
  custom_conf: /home/postgres/conf/postgresql.conf
  parameters:
    log_destination: csvlog
    log_directory: log
    log_filename: postgresql-%Y-%m-%d.log
    logging_collector: 'True'
    pg_stat_statements.track_utility: 'False'
    shared_buffers: 128MB
    shared_preload_libraries: pg_stat_statements,auto_explain,bg_mon,pgextwlist,pg_auth_mon,set_user,pg_cron,pg_stat_kcache,timescaledb,pgaudit
  pg_hba:
  - host all all 0.0.0.0/0 md5
  - host all all ::/0 md5
  - local all all trust
  - host all all 127.0.0.1/32 trust
  - host all all ::1/128 trust
  - local replication all trust
  - host replication all 0.0.0.0/0 md5
  - host replication all ::/0 md5
2025-04-28 09:29:39,849 - bootstrapping - INFO - Configuring pgbouncer
2025-04-28 09:29:39,849 - bootstrapping - INFO - No PGBOUNCER_CONFIGURATION was specified, skipping
2025-04-28 09:29:39,849 - bootstrapping - INFO - Configuring crontab
2025-04-28 09:29:39,849 - bootstrapping - INFO - Skipping creation of renice cron job due to lack of SYS_NICE capability
2025-04-28 09:29:39,850 - bootstrapping - INFO - Configuring patroni
2025-04-28 09:29:39,945 - bootstrapping - INFO - Writing to file /run/postgres.yml
2025-04-28 09:29:39,945 - bootstrapping - INFO - Configuring standby-cluster
2025-04-28 09:29:39,945 - bootstrapping - INFO - Configuring log
2025-04-28 09:29:39,945 - bootstrapping - INFO - Configuring wal-e
2025-04-28 09:29:39,945 - bootstrapping - INFO - Configuring pgqd
2025-04-28 09:29:39,945 - bootstrapping - INFO - Configuring certificate
2025-04-28 09:29:39,945 - bootstrapping - INFO - Generating ssl self-signed certificate
2025-04-28 09:29:41,052 - bootstrapping - INFO - Configuring bootstrap
2025-04-28 09:29:41,052 - bootstrapping - INFO - Configuring pam-oauth2
2025-04-28 09:29:41,052 - bootstrapping - INFO - No PAM_OAUTH2 configuration was specified, skipping
2025-04-28 09:29:43,658 INFO: Selected new K8s API server endpoint https://172.31.0.5:6443
2025-04-28 09:29:43,763 WARNING: postgresql parameter wal_keep_size=0 failed validation, defaulting to 128MB
2025-04-28 09:29:43,763 WARNING: postgresql parameter wal_log_hints=False failed validation, defaulting to on
2025-04-28 09:29:43,764 INFO: No PostgreSQL configuration items changed, nothing to reload.
2025-04-28 09:29:43,872 INFO: Lock owner: postgres-cluster-postgresql-0; I am postgres-cluster-postgresql-2
2025-04-28 09:29:43,944 INFO: trying to bootstrap from leader 'postgres-cluster-postgresql-0'
2025-04-28 09:29:43,945 ERROR: failed to bootstrap from leader 'postgres-cluster-postgresql-0'
2025-04-28 09:29:43,945 INFO: Removing data directory: /home/postgres/pgdata/pgroot/data
2025-04-28 09:29:47,039 INFO: Lock owner: postgres-cluster-postgresql-0; I am postgres-cluster-postgresql-2
2025-04-28 09:29:47,040 INFO: trying to bootstrap from leader 'postgres-cluster-postgresql-0'
2025-04-28 09:29:47,040 ERROR: failed to bootstrap from leader 'postgres-cluster-postgresql-0'
2025-04-28 09:29:47,040 INFO: Removing data directory: /home/postgres/pgdata/pgroot/data
2025-04-28 09:29:57,045 INFO: Lock owner: postgres-cluster-postgresql-0; I am postgres-cluster-postgresql-2
2025-04-28 09:29:57,045 INFO: trying to bootstrap from leader 'postgres-cluster-postgresql-0'
2025-04-28 09:29:57,046 ERROR: failed to bootstrap from leader 'postgres-cluster-postgresql-0'
2025-04-28 09:29:57,046 INFO: Removing data directory: /home/postgres/pgdata/pgroot/data
2025-04-28 09:30:07,044 INFO: Lock owner: postgres-cluster-postgresql-0; I am postgres-cluster-postgresql-2
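The failure repeats in a loop; after the first retry the attempts are roughly 10 seconds apart, which appears consistent with `loop_wait: 10` in the generated configuration. A minimal sketch for summarising how many bootstrap attempts failed and over what time span, assuming the Patroni log line format shown above; `summarize_bootstrap_failures` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads a Patroni log on stdin and prints how many
# "failed to bootstrap from leader" errors occurred, plus the timestamps of
# the first and last one (fields 1 and 2 of each log line).
summarize_bootstrap_failures() {
  awk '/ERROR: failed to bootstrap from leader/ {
    n++
    ts = $1 " " $2
    if (n == 1) first = ts
    last = ts
  }
  END { printf "failures=%d first=%s last=%s\n", n + 0, first, last }'
}
```

Usage: `kubectl logs postgres-cluster-postgresql-2 -c postgresql | summarize_bootstrap_failures`. For the excerpt above it should report three failures between 09:29:43,945 and 09:29:57,046.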
Primary and secondary data are out of sync.
On the primary (postgres-cluster-postgresql-0):
kubectl exec -it postgres-cluster-postgresql-0 -- bash
root@postgres-cluster-postgresql-0:/home/postgres# psql -Upostgres
postgres=# DROP TABLE IF EXISTS tmp_table; CREATE TABLE IF NOT EXISTS tmp_table (id INT PRIMARY KEY , value text); INSERT INTO tmp_table (id,value) VALUES (1,'vzivh');
DROP TABLE
CREATE TABLE
INSERT 0 1
postgres=# SELECT value FROM tmp_table WHERE id = 1;
value
-------
vzivh
(1 row)
postgres=# \dt
            List of relations
 Schema |     Name     | Type  |  Owner
--------+--------------+-------+----------
 public | postgres_log | table | postgres
 public | tmp_table    | table | postgres
(2 rows)
On the secondary (postgres-cluster-postgresql-1):
kubectl exec -it postgres-cluster-postgresql-1 -- bash
root@postgres-cluster-postgresql-1:/home/postgres# psql -Upostgres
psql (16.4 (Ubuntu 16.4-1.pgdg22.04+1))
Type "help" for help.
postgres=# SELECT value FROM tmp_table WHERE id = 1;
ERROR: relation "tmp_table" does not exist
LINE 1: SELECT value FROM tmp_table WHERE id = 1;
^
postgres=# SELECT value FROM tmp_table WHERE id = 1;
ERROR: relation "tmp_table" does not exist
LINE 1: SELECT value FROM tmp_table WHERE id = 1;
^
postgres=# \dt
            List of relations
 Schema |     Name     | Type  |  Owner
--------+--------------+-------+----------
 public | postgres_log | table | postgres
(1 row)
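Comparing the pods one at a time is tedious; the same read-only query can be run against every pod and the results printed side by side. This is a sketch under stated assumptions, not a KubeBlocks tool: `query_all_pods` is a hypothetical helper, and the database user `postgres`, container name `postgresql`, and namespace `default` are assumptions taken from the sessions above (override `NAMESPACE`/`KUBECTL` if they differ).

```shell
#!/bin/sh
# Sketch (hypothetical helper, not a kbcli/kubectl feature): run one read-only
# SQL query on several pods and print "<pod><TAB><result>" per pod so the
# primary and secondaries can be compared side by side.
# Assumptions: database user "postgres", container "postgresql",
# namespace "default" unless NAMESPACE is set.
KUBECTL=${KUBECTL:-kubectl}      # override to use another kubectl binary
NAMESPACE=${NAMESPACE:-default}

query_all_pods() {
  sql=$1
  shift
  for pod in "$@"; do
    # psql -A (unaligned) and -t (tuples only) return bare values.
    # Multi-row results are joined onto one line; errors are kept (2>&1).
    result=$("$KUBECTL" exec -n "$NAMESPACE" "$pod" -c postgresql -- \
      psql -U postgres -At -c "$sql" 2>&1 | tr '\n' ' ' | sed 's/ *$//')
    printf '%s\t%s\n' "$pod" "$result"
  done
}
```

Usage: `query_all_pods "SELECT to_regclass('public.tmp_table')" postgres-cluster-postgresql-0 postgres-cluster-postgresql-1` should print `tmp_table` for the primary and an empty value for a replica that never received the DDL. Running `SELECT pg_is_in_recovery()` on both pods, and `SELECT application_name, state FROM pg_stat_replication` on the primary alone, helps show whether postgres-cluster-postgresql-1 is actually streaming from the primary or has diverged.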