postgres-operator
postgres-operator copied to clipboard
postgresql cannot boot because of the database backup will not be usable without all the WAL segments
Please, answer some short questions which should help us to understand your problem / question better?
- Which image of the operator are you using? registry.opensource.zalan.do/acid/postgres-operator:v1.8.2
- Where do you run it - cloud or metal? Kubernetes or OpenShift? Bare Metal K8s
- Are you running Postgres Operator in production? no
- Type of issue? feature request
Some general remarks when posting a bug report:
- Please, check the operator, pod (Patroni) and postgresql logs first. When copy-pasting many log lines please do it in a separate GitHub gist together with your Postgres CRD and configuration manifest.
- If you feel this issue might be more related to the Spilo docker image or Patroni, consider opening issues in the respective repos.
Hi, I create a postgresql instance with follow config:
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
name: acid-test
spec:
teamId: "acid"
numberOfInstances: 2
enableMasterLoadBalancer: true
postgresql:
version: "14"
parameters:
wal_level: "archive"
# archive_command: 'sh /script/backup.sh'
volume:
size: 1Gi
additionalVolumes:
- name: backup-script
mountPath: /tmp
targetContainers:
- all
volumeSource:
configMap:
name: backup-script
env:
- name: AWS_ACCESS_KEY_ID
value: minioadmin
- name: AWS_SECRET_ACCESS_KEY
value: minioadmin
- name: PGUSER
valueFrom:
secretKeyRef:
key: username
name: postgres.acid-test.credentials.postgresql.acid.zalan.do
- name: PGPASSWORD
valueFrom:
secretKeyRef:
key: password
name: postgres.acid-test.credentials.postgresql.acid.zalan.do
- name: AWS_ENDPOINT
value: http://10.10.15.114:9001
- name: AWS_S3_FORCE_PATH_STYLE
value: "true"
- name: USE_WALG_BACKUP
value: "true"
- name: WALG_S3_PREFIX
value: s3://test
resources:
requests:
cpu: 10m
memory: 100Mi
limits:
cpu: 1000m
memory: 1Gi
after commit to k8s , it can't create , like this:
zuvakin@linux:~/projects/k8s/postgresql$ kubectl get pod -n test
NAME READY STATUS RESTARTS AGE
acid-test-1 0/1 Running 0 9m1s
acid-test-0 1/1 Running 0 9m11s
acid-test-1 can't boot nornally, and logs as below:
2022-07-26 07:51:21,425 - bootstrapping - INFO - Configuring standby-cluster
2022-07-26 07:51:21,425 - bootstrapping - INFO - Configuring patroni
2022-07-26 07:51:21,431 - bootstrapping - INFO - Writing to file /run/postgres.yml
2022-07-26 07:51:22,635 WARNING: Kubernetes RBAC doesn't allow GET access to the 'kubernetes' endpoint in the 'default' namespace. Disabling 'bypass_api_service'.
2022-07-26 07:51:22,644 INFO: No PostgreSQL configuration items changed, nothing to reload.
2022-07-26 07:51:22,646 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:51:22,657 INFO: trying to bootstrap from leader 'acid-test-0'
2022-07-26 07:51:24,798 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:51:24,811 INFO: bootstrap from leader 'acid-test-0' in progress
Can not find any backups
2022-07-26 07:51:24,832 ERROR: Error creating replica using method wal_e: envdir /run/etc/wal-e.d/env bash /scripts/wale_restore.sh exited with code=1
2022-07-26 07:51:34,837 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:51:34,837 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:51:44,788 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:51:44,788 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:51:54,795 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:51:54,795 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:52:04,789 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:52:04,789 INFO: bootstrap from leader 'acid-test-0' in progress
1024+0 records in
1024+0 records out
16777216 bytes (17 MB, 16 MiB) copied, 0.0341451 s, 491 MB/s
NOTICE: base backup done, waiting for required WAL segments to be archived
2022-07-26 07:52:14,832 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:52:14,832 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:52:24,795 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:52:24,795 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:52:34,793 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:52:34,793 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:52:44,795 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:52:44,796 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:52:54,787 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:52:54,787 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:53:04,791 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:53:04,791 INFO: bootstrap from leader 'acid-test-0' in progress
WARNING: still waiting for all required WAL segments to be archived (60 seconds elapsed)
HINT: Check that your archive_command is executing properly. You can safely cancel this backup, but the database backup will not be usable without all the WAL segments.
2022-07-26 07:53:14,835 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:53:14,836 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:53:24,788 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:53:24,788 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:53:34,832 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:53:34,832 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:53:44,792 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:53:44,793 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:53:54,825 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:53:54,825 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:54:04,783 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:54:04,783 INFO: bootstrap from leader 'acid-test-0' in progress
WARNING: still waiting for all required WAL segments to be archived (120 seconds elapsed)
HINT: Check that your archive_command is executing properly. You can safely cancel this backup, but the database backup will not be usable without all the WAL segments.
2022-07-26 07:54:14,799 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:54:14,799 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:54:24,831 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:54:24,831 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:54:34,800 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:54:34,800 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:54:44,798 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:54:44,798 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:54:54,828 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:54:54,828 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:55:04,798 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:55:04,798 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:55:14,801 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:55:14,801 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:55:24,824 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:55:24,824 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:55:34,781 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:55:34,781 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:55:44,784 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:55:44,785 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:55:54,828 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:55:54,828 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:56:04,795 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:56:04,795 INFO: bootstrap from leader 'acid-test-0' in progress
WARNING: still waiting for all required WAL segments to be archived (240 seconds elapsed)
HINT: Check that your archive_command is executing properly. You can safely cancel this backup, but the database backup will not be usable without all the WAL segments.
2022-07-26 07:56:14,789 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:56:14,789 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:56:24,825 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:56:24,825 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:56:34,797 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:56:34,797 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:56:44,793 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:56:44,793 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:56:54,827 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:56:54,827 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:57:04,793 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:57:04,793 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:57:14,794 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:57:14,794 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:57:24,790 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:57:24,791 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:57:34,836 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:57:34,836 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:57:44,793 INFO: Lock owner: acid-test-0; I am acid-test-1
it appear ' Check that your archive_command is executing properly. You can safely cancel this backup, but the database backup will not be usable without all the WAL segments'
what's wrong with it , does the CR correct?
It is not a problem of postgre-operator or spilo. They only operate with the configuration you provide. If something is broken - you have to figure out what and fix it.
@godzilla-s Just stumbling over it while checking for a different backup-related issue: does the configuration style mentioned in this other issue answer help? https://github.com/zalando/postgres-operator/issues/1965#issuecomment-1185387027
@godzilla-s you need to check logs of pod acid-test-0
This trouble is occured because acid-test-0 can't connect to S3 storage for make base backup and archive wal files.
And acid-test-1 is waiting for all wal files from S3 storage.
When I investigated this trouble I entered into pod acid-test-0 and run a script for making backup:
envdir "/run/etc/wal-e.d/env" /scripts/postgres_backup.sh "/home/postgres/pgdata/pgroot/data"
and checked an errors.
There is example of good logs of master pod
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
RESET
2022-08-09 10:03:53,205 INFO: no action. I am (id567-pgsql-92f50bc3-1), the leader with the lock
2022-08-09 10:04:03,246 INFO: no action. I am (id567-pgsql-92f50bc3-1), the leader with the lock
2022-08-09 10:04:13,204 INFO: no action. I am (id567-pgsql-92f50bc3-1), the leader with the lock
2022-08-09 10:04:23,228 INFO: no action. I am (id567-pgsql-92f50bc3-1), the leader with the lock
2022-08-09 10:04:33,187 INFO: no action. I am (id567-pgsql-92f50bc3-1), the leader with the lock
2022-08-09 10:04:37.074 - /scripts/postgres_backup.sh - I was called as: /scripts/postgres_backup.sh /home/postgres/pgdata/pgroot/data
2022-08-09 10:04:37.148 - /scripts/postgres_backup.sh - producing a new backup
INFO: 2022/08/09 10:04:37.254145 Calling pg_start_backup()
INFO: 2022/08/09 10:04:37.779208 Starting a new tar bundle
INFO: 2022/08/09 10:04:37.779243 Walking ...
INFO: 2022/08/09 10:04:37.780051 Starting part 1 ...
2022-08-09 10:04:38.075 38 LOG Starting pgqd 3.3
2022-08-09 10:04:38.075 38 LOG auto-detecting dbs ...
INFO: 2022/08/09 10:04:39.741977 Packing ...
INFO: 2022/08/09 10:04:39.744801 Finished writing part 1.
INFO: 2022/08/09 10:04:39.856157 Starting part 2 ...
INFO: 2022/08/09 10:04:39.856185 /global/pg_control
INFO: 2022/08/09 10:04:39.858861 Finished writing part 2.
INFO: 2022/08/09 10:04:39.858872 Calling pg_stop_backup()
INFO: 2022/08/09 10:04:41.166761 Starting part 3 ...
INFO: 2022/08/09 10:04:41.180452 backup_label
INFO: 2022/08/09 10:04:41.180472 tablespace_map
INFO: 2022/08/09 10:04:41.180534 Finished writing part 3.
INFO: 2022/08/09 10:04:41.210225 Wrote backup with name base_00000007000000000000001D
2022-08-09 10:04:43,251 INFO: no action. I am (id567-pgsql-92f50bc3-1), the leader with the lock
2022-08-09 10:04:53,205 INFO: no action. I am (id567-pgsql-92f50bc3-1), the leader with the lock
2022-08-09 10:05:03,185 INFO: no action. I am (id567-pgsql-92f50bc3-1), the leader with the lock
2022-08-09 10:05:08.104 38 LOG {ticks: 0, maint: 0, retry: 0}
2022-08-09 10:05:13,227 INFO: no action. I am (id567-pgsql-92f50bc3-1), the leader with the lock
leader's pod botstrap is finished only after S3 backup is done
something wrong with s3 server, thanks all
it worked now