[BUG] postgresql cluster upgrade to v1 scale-out ERROR: failed to bootstrap from leader

Open JashBook opened this issue 8 months ago • 0 comments

Describe the bug

kbcli version
Kubernetes: v1.30.4-vke.5
KubeBlocks: 1.0.0-beta.47,0.9.4-beta.20
kbcli: 1.0.0-beta.21

1. ERROR: failed to bootstrap from leader 'postgres-cluster-postgresql-0'
2. Primary and secondary data are out of sync.

To Reproduce Steps to reproduce the behavior:

  1. upgrade to v1
echo yes|kbcli cluster upgrade-to-v1  postgres-cluster
┌──────────────────────────────────────────────────────────────┐    ┌─────────────────────────────────────────────────────────────┐
│apiVersion: apps.kubeblocks.io/v1alpha1                       │    │apiVersion: apps.kubeblocks.io/v1                            │
│kind: Cluster                                                 │    │kind: Cluster                                                │
│metadata:                                                     │    │metadata:                                                    │
│  annotations:                                                │    │  annotations:                                               │
│    kubeblocks.io/crd-api-version: apps.kubeblocks.io/v1alpha1│    │    kubeblocks.io/crd-api-version: apps.kubeblocks.io/v1     │
│    kubeblocks.io/reconcile: "2025-04-28T07:40:59.345764723Z" │    │    kubeblocks.io/reconcile: "2025-04-28T07:40:59.345764723Z"│
│  creationTimestamp: "2025-04-28T07:23:40Z"                   │    │  creationTimestamp: "2025-04-28T07:23:40Z"                  │
│  finalizers:                                                 │    │  finalizers:                                                │
│  - cluster.kubeblocks.io/finalizer                           │    │  - cluster.kubeblocks.io/finalizer                          │
│  generation: 2                                               │    │  generation: 2                                              │
│  name: postgres-cluster                                      │    │  name: postgres-cluster                                     │
│  namespace: default                                          │    │  namespace: default                                         │
│  resourceVersion: "36299"                                    │    │  resourceVersion: "36299"                                   │
│  uid: 714dbbce-0c2e-469e-88be-380ff0218720                   │    │  uid: 714dbbce-0c2e-469e-88be-380ff0218720                  │
│spec:                                                         │    │spec:                                                        │
│  componentSpecs:                                             │    │  componentSpecs:                                            │
│  - componentDef: postgresql-16                               │    │  - componentDef: postgresql-16-1.0.0-alpha.0                │
│    name: postgresql                                          │    │    name: postgresql                                         │
│    replicas: 2                                               │    │    replicas: 2                                              │
│    resources:                                                │    │    resources:                                               │
│      limits:                                                 │    │      limits:                                                │
│        cpu: 100m                                             │    │        cpu: 100m                                            │
│        memory: 512Mi                                         │    │        memory: 512Mi                                        │
│      requests:                                               │    │      requests:                                              │
│        cpu: 100m                                             │    │        cpu: 100m                                            │
│        memory: 512Mi                                         │    │        memory: 512Mi                                        │
│    serviceVersion: 16.4.0                                    │    │    serviceVersion: 16.4.0                                   │
│    switchPolicy:                                             │    │    volumeClaimTemplates:                                    │
│      type: Noop                                              │    │    - name: data                                             │
│    updateStrategy: BestEffortParallel                        │    │      spec:                                                  │
│    volumeClaimTemplates:                                     │    │        accessModes:                                         │
│    - name: data                                              │    │        - ReadWriteOnce                                      │
│      spec:                                                   │    │        resources:                                           │
│        accessModes:                                          │    │          requests:                                          │
│        - ReadWriteOnce                                       │    │            storage: 20Gi                                    │
│        resources:                                            │    │  terminationPolicy: WipeOut                                 │
│          requests:                                           │    │status: {}                                                   │
│            storage: 20Gi                                     │    │                                                             │
│  resources:                                                  │    └─────────────────────────────────────────────────────────────┘
│    cpu: "0"                                                  │
│    memory: "0"                                               │
│  storage:                                                    │
│    size: "0"                                                 │
│  terminationPolicy: WipeOut                                  │
│status: {}                                                    │
│                                                              │
└──────────────────────────────────────────────────────────────┘
Cluster postgres-cluster will be converted to v1 with output as yaml.
Please type 'Yes/yes' to confirm your operation: yes
postgres-cluster-postgresql-postgresql
Cluster postgres-cluster has converted successfully, you can view the spec:
	kubectl get clusters.apps.kubeblocks.io postgres-cluster -n default -oyaml

kubectl get clusters.apps.kubeblocks.io postgres-cluster -n default -oyaml
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  annotations:
    kubeblocks.io/crd-api-version: apps.kubeblocks.io/v1
    kubeblocks.io/reconcile: "2025-04-28T07:40:59.345764723Z"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps.kubeblocks.io/v1alpha1","kind":"Cluster","metadata":{"annotations":{},"name":"postgres-cluster","namespace":"default"},"spec":{"componentSpecs":[{"componentDef":"postgresql-16","name":"postgresql","replicas":2,"resources":{"limits":{"cpu":"100m","memory":"0.5Gi"},"requests":{"cpu":"100m","memory":"0.5Gi"}},"switchPolicy":{"type":"Noop"},"updateStrategy":"BestEffortParallel","volumeClaimTemplates":[{"name":"data","spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"20Gi"}},"storageClassName":null}}]}],"terminationPolicy":"WipeOut"}}
  creationTimestamp: "2025-04-28T07:23:40Z"
  finalizers:
  - cluster.kubeblocks.io/finalizer
  generation: 3
  name: postgres-cluster
  namespace: default
  resourceVersion: "59059"
  uid: 714dbbce-0c2e-469e-88be-380ff0218720
spec:
  componentSpecs:
  - componentDef: postgresql-16-1.0.0-alpha.0
    name: postgresql
    replicas: 2
    resources:
      limits:
        cpu: 100m
        memory: 512Mi
      requests:
        cpu: 100m
        memory: 512Mi
    serviceVersion: 16.4.0
    volumeClaimTemplates:
    - name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
  terminationPolicy: WipeOut
status:
  components:
    postgresql:
      phase: Updating
  conditions:
  - lastTransitionTime: "2025-04-28T08:07:35Z"
    message: 'The operator has started the provisioning of Cluster: postgres-cluster'
    observedGeneration: 3
    reason: PreCheckSucceed
    status: "True"
    type: ProvisioningStarted
  - lastTransitionTime: "2025-04-28T07:23:40Z"
    message: Successfully applied for resources
    observedGeneration: 3
    reason: ApplyResourcesSucceed
    status: "True"
    type: ApplyResources
  - lastTransitionTime: "2025-04-28T07:35:02Z"
    message: all pods of components are ready, waiting for the probe detection successful
    reason: AllReplicasReady
    status: "True"
    type: ReplicasReady
  - lastTransitionTime: "2025-04-28T07:35:02Z"
    message: 'Cluster: postgres-cluster is ready, current phase is Running'
    reason: ClusterReady
    status: "True"
    type: Ready
  observedGeneration: 3
  phase: Updating
  2. scale-out
kbcli cluster scale-out postgres-cluster --auto-approve --force=true --components postgresql --replicas 1  --namespace default
  3. See error
➜  ~ kubectl get cluster postgres-cluster 
NAME               CLUSTER-DEFINITION   TERMINATION-POLICY   STATUS     AGE
postgres-cluster                        WipeOut              Updating   142m
➜  ~ 
➜  ~ kubectl get pod -l app.kubernetes.io/instance=postgres-cluster  

NAME                            READY   STATUS    RESTARTS   AGE
postgres-cluster-postgresql-0   5/5     Running   0          83m
postgres-cluster-postgresql-1   5/5     Running   0          80m
postgres-cluster-postgresql-2   4/5     Running   0          17m

kbcli cluster list-instances postgres-cluster --namespace default
    
NAME                            NAMESPACE   CLUSTER            COMPONENT    STATUS    ROLE        ACCESSMODE   AZ             CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE     NODE                      CREATED-TIME                 
postgres-cluster-postgresql-0   default     postgres-cluster   postgresql   Running   primary                  cn-beijing-b   100m / 100m          512Mi / 512Mi           data:20Gi   172.31.0.21/172.31.0.21   Apr 28,2025 16:23 UTC+0800   
postgres-cluster-postgresql-1   default     postgres-cluster   postgresql   Running   secondary                cn-beijing-b   100m / 100m          512Mi / 512Mi           data:20Gi   172.31.0.43/172.31.0.43   Apr 28,2025 16:25 UTC+0800   
postgres-cluster-postgresql-2   default     postgres-cluster   postgresql   Running   <none>                   cn-beijing-b   100m / 100m          512Mi / 512Mi           data:20Gi   172.31.0.7/172.31.0.7     Apr 28,2025 17:28 UTC+0800

Logs from the failing pod

➜  ~ kubectl logs postgres-cluster-postgresql-2
Defaulted container "postgresql" out of: postgresql, pgbouncer, exporter, kbagent, config-manager, pg-init-container (init), init-dbctl (init), init-kbagent (init), kbagent-worker (init)
2025-04-28 09:29:37,648 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?)
2025-04-28 09:29:39,651 - bootstrapping - INFO - Could not connect to 169.254.169.254, assuming local Docker setup
2025-04-28 09:29:39,652 - bootstrapping - INFO - No meta-data available for this provider
2025-04-28 09:29:39,652 - bootstrapping - INFO - Looks like you are running local
2025-04-28 09:29:39,746 - bootstrapping - INFO - kubeblocks generate local configuration: 
bootstrap:
  dcs:
    check_timeline: true
    loop_wait: 10
    max_timelines_history: 0
    maximum_lag_on_failover: 1048576
    postgresql:
      parameters:
        archive_command: /bin/true
        archive_mode: 'on'
        autovacuum_analyze_scale_factor: '0.1'
        autovacuum_max_workers: '3'
        autovacuum_vacuum_scale_factor: '0.05'
        checkpoint_completion_target: '0.9'
        log_autovacuum_min_duration: '10000'
        log_checkpoints: 'True'
        log_connections: 'False'
        log_disconnections: 'False'
        log_min_duration_statement: '1000'
        log_statement: ddl
        log_temp_files: 128kB
        max_connections: '56'
        max_locks_per_transaction: '64'
        max_prepared_transactions: '100'
        max_replication_slots: '16'
        max_wal_senders: '64'
        max_worker_processes: '8'
        tcp_keepalives_idle: 45s
        tcp_keepalives_interval: 10s
        track_commit_timestamp: 'False'
        track_functions: pl
        wal_compression: 'True'
        wal_keep_size: '0'
        wal_level: replica
        wal_log_hints: 'False'
    retry_timeout: 10
    ttl: 30
  initdb:
  - auth-host: md5
  - auth-local: trust
postgresql:
  config_dir: /home/postgres/pgdata/conf
  custom_conf: /home/postgres/conf/postgresql.conf
  parameters:
    log_destination: csvlog
    log_directory: log
    log_filename: postgresql-%Y-%m-%d.log
    logging_collector: 'True'
    pg_stat_statements.track_utility: 'False'
    shared_buffers: 128MB
    shared_preload_libraries: pg_stat_statements,auto_explain,bg_mon,pgextwlist,pg_auth_mon,set_user,pg_cron,pg_stat_kcache,timescaledb,pgaudit
  pg_hba:
  - host     all             all             0.0.0.0/0                md5
  - host     all             all             ::/0                     md5
  - local    all             all                                     trust
  - host     all             all             127.0.0.1/32            trust
  - host     all             all             ::1/128                 trust
  - local     replication     all                                    trust
  - host      replication     all             0.0.0.0/0               md5
  - host      replication     all             ::/0                    md5

2025-04-28 09:29:39,849 - bootstrapping - INFO - Configuring pgbouncer
2025-04-28 09:29:39,849 - bootstrapping - INFO - No PGBOUNCER_CONFIGURATION was specified, skipping
2025-04-28 09:29:39,849 - bootstrapping - INFO - Configuring crontab
2025-04-28 09:29:39,849 - bootstrapping - INFO - Skipping creation of renice cron job due to lack of SYS_NICE capability
2025-04-28 09:29:39,850 - bootstrapping - INFO - Configuring patroni
2025-04-28 09:29:39,945 - bootstrapping - INFO - Writing to file /run/postgres.yml
2025-04-28 09:29:39,945 - bootstrapping - INFO - Configuring standby-cluster
2025-04-28 09:29:39,945 - bootstrapping - INFO - Configuring log
2025-04-28 09:29:39,945 - bootstrapping - INFO - Configuring wal-e
2025-04-28 09:29:39,945 - bootstrapping - INFO - Configuring pgqd
2025-04-28 09:29:39,945 - bootstrapping - INFO - Configuring certificate
2025-04-28 09:29:39,945 - bootstrapping - INFO - Generating ssl self-signed certificate
2025-04-28 09:29:41,052 - bootstrapping - INFO - Configuring bootstrap
2025-04-28 09:29:41,052 - bootstrapping - INFO - Configuring pam-oauth2
2025-04-28 09:29:41,052 - bootstrapping - INFO - No PAM_OAUTH2 configuration was specified, skipping
2025-04-28 09:29:43,658 INFO: Selected new K8s API server endpoint https://172.31.0.5:6443
2025-04-28 09:29:43,763 WARNING: postgresql parameter wal_keep_size=0 failed validation, defaulting to 128MB
2025-04-28 09:29:43,763 WARNING: postgresql parameter wal_log_hints=False failed validation, defaulting to on
2025-04-28 09:29:43,764 INFO: No PostgreSQL configuration items changed, nothing to reload.
2025-04-28 09:29:43,872 INFO: Lock owner: postgres-cluster-postgresql-0; I am postgres-cluster-postgresql-2
2025-04-28 09:29:43,944 INFO: trying to bootstrap from leader 'postgres-cluster-postgresql-0'
2025-04-28 09:29:43,945 ERROR: failed to bootstrap from leader 'postgres-cluster-postgresql-0'
2025-04-28 09:29:43,945 INFO: Removing data directory: /home/postgres/pgdata/pgroot/data
2025-04-28 09:29:47,039 INFO: Lock owner: postgres-cluster-postgresql-0; I am postgres-cluster-postgresql-2
2025-04-28 09:29:47,040 INFO: trying to bootstrap from leader 'postgres-cluster-postgresql-0'
2025-04-28 09:29:47,040 ERROR: failed to bootstrap from leader 'postgres-cluster-postgresql-0'
2025-04-28 09:29:47,040 INFO: Removing data directory: /home/postgres/pgdata/pgroot/data
2025-04-28 09:29:57,045 INFO: Lock owner: postgres-cluster-postgresql-0; I am postgres-cluster-postgresql-2
2025-04-28 09:29:57,045 INFO: trying to bootstrap from leader 'postgres-cluster-postgresql-0'
2025-04-28 09:29:57,046 ERROR: failed to bootstrap from leader 'postgres-cluster-postgresql-0'
2025-04-28 09:29:57,046 INFO: Removing data directory: /home/postgres/pgdata/pgroot/data
2025-04-28 09:30:07,044 INFO: Lock owner: postgres-cluster-postgresql-0; I am postgres-cluster-postgresql-2
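The log above repeats the same cycle (lock owner, bootstrap attempt, error, data directory removal) on every Patroni loop. As a rough aid for triage, the sketch below counts the failed bootstrap attempts per leader in a saved log. The function name `summarize_bootstrap_failures` is hypothetical, and the pattern only matches the exact message text shown in this report.

```shell
# Hypothetical helper (not part of kbcli or Patroni): reads a Patroni log on
# stdin and prints, per leader, how many bootstrap attempts failed.
# Assumes the message format seen in this report:
#   ERROR: failed to bootstrap from leader '<pod-name>'
summarize_bootstrap_failures() {
  grep -F "ERROR: failed to bootstrap from leader" \
    | sed "s/.*failed to bootstrap from leader '\([^']*\)'.*/\1/" \
    | sort | uniq -c \
    | awk '{ print $2 ": " $1 " failed attempt(s)" }'
}

# Usage, assuming kubectl access to the cluster from this report:
#   kubectl logs postgres-cluster-postgresql-2 -c postgresql | summarize_bootstrap_failures
```

The helper only summarizes what Patroni already logged; it does not explain why the base backup from the leader fails.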

Primary and secondary data are out of sync

primary

kubectl exec -it postgres-cluster-postgresql-0 -- bash
root@postgres-cluster-postgresql-0:/home/postgres# psql -Upostgresql
postgres=# DROP TABLE IF EXISTS tmp_table; CREATE TABLE IF NOT EXISTS tmp_table (id INT PRIMARY KEY , value text); INSERT INTO tmp_table (id,value) VALUES (1,'vzivh');
DROP TABLE
CREATE TABLE
INSERT 0 1
postgres=# SELECT value FROM tmp_table WHERE id = 1;
 value 
-------
 vzivh
(1 row)

postgres=# \dt
            List of relations
 Schema |     Name     | Type  |  Owner   
--------+--------------+-------+----------
 public | postgres_log | table | postgres
 public | tmp_table    | table | postgres
(2 rows)

secondary

kubectl exec -it postgres-cluster-postgresql-1 -- bash
root@postgres-cluster-postgresql-1:/home/postgres# psql -Upostgres
psql (16.4 (Ubuntu 16.4-1.pgdg22.04+1))
Type "help" for help.

postgres=# SELECT value FROM tmp_table WHERE id = 1;
ERROR:  relation "tmp_table" does not exist
LINE 1: SELECT value FROM tmp_table WHERE id = 1;
                          ^
postgres=# SELECT value FROM tmp_table WHERE id = 1;
ERROR:  relation "tmp_table" does not exist
LINE 1: SELECT value FROM tmp_table WHERE id = 1;
                          ^
postgres=# \dt
            List of relations
 Schema |     Name     | Type  |  Owner   
--------+--------------+-------+----------
 public | postgres_log | table | postgres
(1 row)
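The manual comparison above can be repeated without opening two interactive sessions. The following is a minimal sketch, assuming the container is named `postgresql` (as in the `kubectl logs` output) and that the `postgres` user can connect locally without a password. The function names `list_public_tables` and `tables_missing_on` are hypothetical, and table names containing whitespace are not handled.

```shell
# Hypothetical helper: list the tables in the public schema of one pod.
# Assumes container "postgresql" and local access as user "postgres".
list_public_tables() {
  kubectl exec "$1" -c postgresql -- \
    psql -U postgres -At \
    -c "SELECT tablename FROM pg_tables WHERE schemaname = 'public' ORDER BY tablename"
}

# Print every table that exists on the first pod but not on the second.
tables_missing_on() {
  primary_tables=$(list_public_tables "$1") || return 1
  replica_tables=$(list_public_tables "$2") || return 1
  for t in $primary_tables; do
    # -x matches the whole line, -F treats the table name as a literal string
    printf '%s\n' "$replica_tables" | grep -qxF -- "$t" || printf '%s\n' "$t"
  done
}

# Usage:
#   tables_missing_on postgres-cluster-postgresql-0 postgres-cluster-postgresql-1
```

For the state shown in this report, the call in the usage comment would be expected to list `tmp_table`, since it exists only on the primary.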

JashBook, Apr 28 '25 09:04