postgres-operator
postgres-operator copied to clipboard
Single master fails to restart
I have a single member cluster (master only) deployed on opnstack, kubernetes.
- After successful installation when i try to edit any parameter in the postgresql CRD configuration, the pod restarts but the database does not start up. It continues to reject connections and repeatedly logs the same data (like in the pod logs below) The pg_log complains of missing history files.
- Also for a simple restart without any configuration changes I reduce the "numberOfInstances" in the postgresql config to 0 and again to 1. Is this the right approach or is there a better one?
Please, answer some short questions which should help us to understand your problem / question better?
- Which image of the operator are you using? e.g. registry.opensource.zalan.do/acid/postgres-operator:v1.8.2
- registry.opensource.zalan.do/acid/postgres-operator:v1.7.0
- Where do you run it - cloud or metal? Kubernetes or OpenShift? [AWS K8s | GCP ... | Bare Metal K8s]
- Kubernetes v1.21.5 on a openstack
- Are you running Postgres Operator in production? [yes | no]
- yes
- Type of issue? [Bug report, question, feature request, etc.]
- question
Some general remarks when posting a bug report:
- Please, check the operator, pod (Patroni) and postgresql logs first. When copy-pasting many log lines please do it in a separate
Operator logs
`time="2022-09-15T11:24:33Z" level=info msg="SYNC event has been queued" cluster-name=insights-gs-execute-staging/ai-postgres pkg=controller worker=7
time="2022-09-15T11:24:33Z" level=info msg="syncing of the cluster started" cluster-name=insights-gs-execute-staging/ai-postgres pkg=controller worker=7
time="2022-09-15T11:24:33Z" level=debug msg="team API is disabled" cluster-name=insights-gs-execute-staging/ai-postgres pkg=cluster
time="2022-09-15T11:24:33Z" level=debug msg="team API is disabled" cluster-name=insights-gs-execute-staging/ai-postgres pkg=cluster
time="2022-09-15T11:24:33Z" level=info msg="syncing secrets" cluster-name=insights-gs-execute-staging/ai-postgres pkg=cluster
time="2022-09-15T11:24:35Z" level=debug msg="secret insights-gs-execute-staging/zalando.ai-postgres.credentials.postgresql.acid.zalan.do already exists, fetching its password" cluster-name=insights-gs-execute-staging/ai-postgres pkg=cluster
time="2022-09-15T11:24:38Z" level=debug msg="secret insights-gs-execute-staging/standby.ai-postgres.credentials.postgresql.acid.zalan.do already exists, fetching its password" cluster-name=insights-gs-execute-staging/ai-postgres pkg=cluster
time="2022-09-15T11:24:41Z" level=debug msg="secret insights-gs-execute-staging/postgres.ai-postgres.credentials.postgresql.acid.zalan.do already exists, fetching its password" cluster-name=insights-gs-execute-staging/ai-postgres pkg=cluster
time="2022-09-15T11:24:41Z" level=debug msg="syncing master service" cluster-name=insights-gs-execute-staging/ai-postgres pkg=cluster
time="2022-09-15T11:24:45Z" level=debug msg="final load balancer source ranges as seen in a service spec (not necessarily applied): ["127.0.0.1/32"]" cluster-name=insights-gs-execute-staging/ai-postgres pkg=cluster
time="2022-09-15T11:24:45Z" level=debug msg="syncing replica service" cluster-name=insights-gs-execute-staging/ai-postgres pkg=cluster
time="2022-09-15T11:24:48Z" level=debug msg="final load balancer source ranges as seen in a service spec (not necessarily applied): ["127.0.0.1/32"]" cluster-name=insights-gs-execute-staging/ai-postgres pkg=cluster
time="2022-09-15T11:24:48Z" level=debug msg="syncing volumes using "pvc" storage resize mode" cluster-name=insights-gs-execute-staging/ai-postgres pkg=cluster
time="2022-09-15T11:24:49Z" level=info msg="volume claims do not require changes" cluster-name=insights-gs-execute-staging/ai-postgres pkg=cluster
time="2022-09-15T11:24:49Z" level=debug msg="syncing statefulsets" cluster-name=insights-gs-execute-staging/ai-postgres pkg=cluster
time="2022-09-15T11:24:54Z" level=info msg="Mount additional volumes: [{Name:empty MountPath:/opt/empty SubPath: TargetContainers:[all] VolumeSource:{HostPath:nil EmptyDir:&EmptyDirVolumeSource{Medium:,SizeLimit:
Pod logs
`2022-09-15 10:27:50,656 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?) 2022-09-15 10:27:52,047 - bootstrapping - INFO - Looks like your running openstack 2022-09-15 10:27:52,168 - bootstrapping - INFO - Configuring pgbouncer 2022-09-15 10:27:52,169 - bootstrapping - INFO - No PGBOUNCER_CONFIGURATION was specified, skipping 2022-09-15 10:27:52,169 - bootstrapping - INFO - Configuring wal-e 2022-09-15 10:27:52,169 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/WALE_S3_PREFIX 2022-09-15 10:27:52,170 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/WALG_S3_PREFIX 2022-09-15 10:27:52,170 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/AWS_ACCESS_KEY_ID 2022-09-15 10:27:52,170 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/AWS_SECRET_ACCESS_KEY 2022-09-15 10:27:52,170 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/WALE_S3_ENDPOINT 2022-09-15 10:27:52,170 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/AWS_ENDPOINT 2022-09-15 10:27:52,170 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/WALG_DISABLE_S3_SSE 2022-09-15 10:27:52,171 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/WALG_DOWNLOAD_CONCURRENCY 2022-09-15 10:27:52,171 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/WALG_UPLOAD_CONCURRENCY 2022-09-15 10:27:52,171 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/USE_WALG_RESTORE 2022-09-15 10:27:52,171 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/WALE_LOG_DESTINATION 2022-09-15 10:27:52,171 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/PGPORT 2022-09-15 10:27:52,171 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/BACKUP_NUM_TO_RETAIN 2022-09-15 10:27:52,187 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/TMPDIR 2022-09-15 10:27:52,187 - bootstrapping - INFO - Configuring crontab 2022-09-15 10:27:52,187 - bootstrapping - INFO - Skipping creation of renice cron job due to lack of SYS_NICE capability 2022-09-15 10:27:52,214 - bootstrapping - INFO - Configuring standby-cluster 2022-09-15 10:27:52,214 - bootstrapping - INFO - Configuring pgqd 2022-09-15 10:27:52,214 - bootstrapping - INFO - Configuring bootstrap 2022-09-15 10:27:52,214 - bootstrapping - INFO - Configuring patroni 2022-09-15 10:27:52,224 - bootstrapping - INFO - Writing to file /run/postgres.yml 2022-09-15 10:27:52,224 - bootstrapping - INFO - Configuring log 2022-09-15 10:27:52,224 - bootstrapping - INFO - Configuring certificate 2022-09-15 10:27:52,224 - bootstrapping - INFO - Generating ssl self-signed certificate 2022-09-15 10:27:52,290 - bootstrapping - INFO - Configuring pam-oauth2 2022-09-15 10:27:52,291 - bootstrapping - INFO - Writing to file /etc/pam.d/postgresql 2022-09-15 10:27:54,130 WARNING: Kubernetes RBAC doesn't allow GET access to the 'kubernetes' endpoint in the 'default' namespace. Disabling 'bypass_api_service'. 2022-09-15 10:27:54,191 INFO: No PostgreSQL configuration items changed, nothing to reload. 2022-09-15 10:27:54,264 WARNING: Postgresql is not running. 2022-09-15 10:27:54,265 INFO: Lock owner: None; I am ai-postgres-0 2022-09-15 10:27:54,276 INFO: pg_controldata: pg_control version number: 1100 Catalog version number: 201809051 Database system identifier: 6925770682722574383 Database cluster state: shut down in recovery pg_control last modified: Thu Sep 15 06:04:08 2022 Latest checkpoint location: DC0F/31755350 Latest checkpoint's REDO location: DC0F/31755350 Latest checkpoint's REDO WAL file: 000000020000DC0F00000031 Latest checkpoint's TimeLineID: 2 Latest checkpoint's PrevTimeLineID: 2 Latest checkpoint's full_page_writes: on Latest checkpoint's NextXID: 0:6236252 Latest checkpoint's NextOID: 1821843 Latest checkpoint's NextMultiXactId: 1 Latest checkpoint's NextMultiOffset: 0 Latest checkpoint's oldestXID: 561 Latest checkpoint's oldestXID's DB: 1 Latest checkpoint's oldestActiveXID: 0 Latest checkpoint's oldestMultiXid: 1 Latest checkpoint's oldestMulti's DB: 1 Latest checkpoint's oldestCommitTsXid: 0 Latest checkpoint's newestCommitTsXid: 0 Time of latest checkpoint: Fri Sep 2 07:14:51 2022 Fake LSN counter for unlogged rels: 0/1 Minimum recovery ending location: 0/0 Min recovery ending loc's timeline: 0 Backup start location: 0/0 Backup end location: 0/0 End-of-backup record required: no wal_level setting: logical wal_log_hints setting: on max_connections setting: 200 max_worker_processes setting: 8 max_prepared_xacts setting: 0 max_locks_per_xact setting: 64 track_commit_timestamp setting: off Maximum data alignment: 8 Database block size: 8192 Blocks per segment of large relation: 131072 WAL block size: 8192 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum columns in an index: 32 Maximum size of a TOAST chunk: 1996 Size of a large-object chunk: 2048 Date/time type storage: 64-bit integers Float4 argument passing: by value Float8 argument passing: by value Data page checksum version: 0 Mock authentication nonce: e1f4672423ecf0ef7f60b88c1eb5650579a43f927807d8be1d19881b5387f934
2022-09-15 10:27:54,329 INFO: Lock owner: None; I am ai-postgres-0 2022-09-15 10:27:54,438 INFO: starting as a secondary 2022-09-15 10:27:54,502 WARNING: Removing unexpected parameter=cron_database_name value=postgres from the config 2022-09-15 10:27:55.020 GMT [71] LOG: skipping missing configuration file "/var/pv/data/user.conf" 2022-09-15 10:27:55.030 GMT [71] LOG: skipping missing configuration file "/var/pv/data/user.conf" 2022-09-15 10:27:55,079 INFO: postmaster pid=71 /var/run/postgresql:5432 - no response 2022-09-15 10:27:55 UTC [71]: [3-1] 6322fe2b.47 0 LOG: listening on IPv4 address "0.0.0.0", port 5432 2022-09-15 10:27:55 UTC [71]: [4-1] 6322fe2b.47 0 LOG: listening on IPv6 address "::", port 5432 2022-09-15 10:27:55 UTC [71]: [5-1] 6322fe2b.47 0 LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432" 2022-09-15 10:27:56 UTC [71]: [6-1] 6322fe2b.47 0 LOG: redirecting log output to logging collector process 2022-09-15 10:27:56 UTC [71]: [7-1] 6322fe2b.47 0 HINT: Future log output will appear in directory "../pg_log". /var/run/postgresql:5432 - rejecting connections /var/run/postgresql:5432 - rejecting connections /var/run/postgresql:5432 - rejecting connections /var/run/postgresql:5432 - no response 2022-09-15 10:28:01,024 WARNING: Postgresql is not running. 2022-09-15 10:28:01,024 INFO: Lock owner: None; I am ai-postgres-0 `
pg_log
ERROR: 2022/09/15 00:00:35.585523 Archive '00000005.history' does not exist. 2022-09-15 00:00:44 UTC [25600]: [8-1] 63226b28.6400 0 LOG: ending log output to stderr 2022-09-15 00:00:44 UTC [25600]: [9-1] 63226b28.6400 0 HINT: Future log output will go to log destination "csvlog". ERROR: 2022/09/15 00:00:47.141790 Archive '00000005.history' does not exist. 2022-09-15 00:00:54 UTC [25729]: [8-1] 63226b32.6481 0 LOG: ending log output to stderr 2022-09-15 00:00:54 UTC [25729]: [9-1] 63226b32.6481 0 HINT: Future log output will go to log destination "csvlog". ERROR: 2022/09/15 00:00:58.157736 Archive '00000005.history' does not exist. 2022-09-15 00:01:05 UTC [25859]: [8-1] 63226b3d.6503 0 LOG: ending log output to stderr 2022-09-15 00:01:05 UTC [25859]: [9-1] 63226b3d.6503 0 HINT: Future log output will go to log destination "csvlog". ERROR: 2022/09/15 00:01:08.793950 Archive '00000005.history' does not exist.
postgres configuration
`bootstrap: dcs: loop_wait: 10 maximum_lag_on_failover: 33554432 postgresql: parameters: archive_mode: 'true' archive_timeout: '160' autovacuum_analyze_scale_factor: '0.02' autovacuum_max_workers: '12' autovacuum_vacuum_cost_delay: 5ms autovacuum_vacuum_scale_factor: '0.05' checkpoint_completion_target: '0.9' checkpoint_timeout: 1d cron_database_name: postgres default_statistics_target: '500' effective_cache_size: 24GB effective_io_concurrency: '200' hot_standby: 'on' log_autovacuum_min_duration: 0 log_checkpoints: 'on' log_connections: 'on' log_disconnections: 'on' log_line_prefix: '%t [%p]: [%l-1] %c %x %d %u %a %h ' log_lock_waits: 'on' log_min_duration_statement: 500 log_statement: all log_temp_files: 0 maintenance_work_mem: 2GB max_connections: '200' max_parallel_maintenance_workers: '4' max_parallel_workers: '8' max_parallel_workers_per_gather: '4' max_replication_slots: 10 max_wal_senders: '90' max_wal_size: 16GB max_worker_processes: '8' min_wal_size: 4GB random_page_cost: '1.1' tcp_keepalives_idle: 900 tcp_keepalives_interval: 100 track_functions: all vacuum_cost_limit: '2400' wal_buffers: 16MB wal_compression: 'on' wal_keep_segments: '32' wal_level: logical wal_log_hints: 'on' work_mem: 209kB use_pg_rewind: true use_slots: true retry_timeout: 10 ttl: 30 initdb:
- auth-host: md5
- auth-local: trust
- data-checksums
- encoding: UTF8
- locale: en_US.UTF-8
post_init: /scripts/post_init.sh "zalandos"
users:
zalandos:
options:
- CREATEDB
- NOLOGIN password: '' kubernetes: bypass_api_service: true labels: application: spilo pod_ip: 10.42.14.232 ports:
- name: postgresql port: 5432 role_label: spilo-role scope_label: cluster-name use_endpoints: true namespace: insights-gs-execute-staging postgresql: authentication: replication: password: XXXX username: standby superuser: password: XXXX username: postgres basebackup_fast_xlog: command: /scripts/basebackup.sh retries: 2 bin_dir: /usr/lib/postgresql/11/bin callbacks: on_role_change: /scripts/on_role_change.sh zalandos true connect_address: 10.42.14.232:5432 create_replica_method:
- wal_e
- basebackup_fast_xlog data_dir: /home/postgres/pgdata/pgroot/data listen: '*:5432' name: ai-postgres-0 parameters: archive_command: 'envdir "/run/etc/wal-e.d/env" wal-g wal-push "%p" && find /home/postgres/pgdata/pgroot/data/pg_wal/ -type f -mtime +1 -delete ' bg_mon.history_buckets: 120 bg_mon.listen_address: 0.0.0.0 extwlist.custom_path: /scripts extwlist.extensions: btree_gin,btree_gist,citext,hstore,intarray,ltree,pgcrypto,pgq,pg_trgm,postgres_fdw,tablefunc,uuid-ossp,hypopg,timescaledb,pg_partman log_destination: csvlog log_directory: ../pg_log log_file_mode: '0644' log_filename: postgresql-%u.log log_rotation_age: 1d log_truncate_on_rotation: 'on' logging_collector: 'on' pg_stat_statements.track_utility: 'off' shared_buffers: 8GB shared_preload_libraries: pg_cron ssl: 'on' ssl_cert_file: /run/certs/server.crt ssl_key_file: /run/certs/server.key pg_hba:
- local all all trust
- hostssl all +zalandos 127.0.0.1/32 pam
- host all all 127.0.0.1/32 md5
- hostssl all +zalandos ::1/128 pam
- host all all ::1/128 md5
- local replication standby trust
- hostssl replication standby all md5
- hostnossl all all all reject
- hostssl all +zalandos all pam
- hostssl all all all md5 pgpass: /run/postgresql/pgpass recovery_conf: restore_command: envdir "/run/etc/wal-e.d/env" /scripts/restore_command.sh "%f" "%p" use_unix_socket: true use_unix_socket_repl: true wal_e: command: envdir /run/etc/wal-e.d/env bash /scripts/wale_restore.sh no_master: 1 retries: 2 threshold_backup_size_percentage: 30 threshold_megabytes: 102400 restapi: connect_address: 10.42.14.232:8008 listen: :8008 scope: ai-postgres `