postgres-operator

No space left on device

Open arezk84 opened this issue 2 years ago • 10 comments

I am running version 4.7.4 and cannot get the cluster working again.

When I checked the logs, I found the error "No space left on device". I tried resizing the PVC, but nothing changed and the pod is still not ready.
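
To check whether the resize actually reached the filesystem inside the pod, free space and free inodes on the data volume can be inspected. This is only a minimal sketch: it assumes a Python interpreter is usable in the container (the tracebacks in the logs suggest Python 3.6 is present) and that the volume is mounted at /pgdata, as implied by PATRONI_POSTGRESQL_DATA_DIR in the logs. The function name `space_report` is made up for illustration.

```python
import os


def space_report(path):
    """Return size and free-space figures for the filesystem holding `path`."""
    st = os.statvfs(path)
    return {
        # Total size of the filesystem as the kernel sees it.
        "total_bytes": st.f_frsize * st.f_blocks,
        # Space available to unprivileged processes.
        "free_bytes": st.f_frsize * st.f_bavail,
        # "No space left on device" can also mean inodes ran out.
        "free_inodes": st.f_favail,
    }


if __name__ == "__main__":
    # /pgdata is an assumption taken from the data directory in the logs;
    # fall back to the current directory if that mount does not exist.
    path = "/pgdata" if os.path.exists("/pgdata") else "."
    print(space_report(path))
```

If total_bytes still matches the old PVC size, the filesystem has probably not been expanded yet. If free_bytes is non-zero but free_inodes is 0, the volume has run out of inodes rather than blocks.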

Environment

  • Platform: (Kubernetes)
  • Platform Version: (4.7.4)
  • PGO Image Tag: (centos8)
  • Postgres Version: (13)
  • Storage: (oci Oracle cloud)

Here are the full logs. Any help would be appreciated. Thanks.

NWRAP_ERROR(1) - nwrap_files_cache_reload: Unable to open '/tmp/nss_wrapper/postgres/passwd' readonly -1:No such file or directory
NWRAP_ERROR(1) - nwrap_files_getpwuid: Error loading passwd file
nss_wrapper: user exists
nss_wrapper: group exists
nss_wrapper: environment configured
Thu Jun 23 09:50:30 UTC 2022 INFO: postgres-ha pre-bootstrap starting...
Thu Jun 23 09:50:30 UTC 2022 INFO: pgBackRest auto-config disabled
Thu Jun 23 09:50:30 UTC 2022 INFO: PGHA_PGBACKREST_LOCAL_S3_STORAGE, PGHA_PGBACKREST_LOCAL_GCS_STORAGE and PGHA_PGBACKREST_INITIALIZE will be ignored if provided
Thu Jun 23 09:50:30 UTC 2022 INFO: Defaults have been set for the following postgres-ha auto-configuration env vars: PGHA_DEFAULT_CONFIG, PGHA_BASE_BOOTSTRAP_CONFIG, PGHA_BASE_PG_CONFIG
Thu Jun 23 09:50:30 UTC 2022 INFO: Defaults have been set for the following postgres-ha env vars: PGHA_PATRONI_PORT
Thu Jun 23 09:50:30 UTC 2022 INFO: Defaults have been set for the following Patroni env vars: PATRONI_NAME, PATRONI_RESTAPI_LISTEN, PATRONI_RESTAPI_CONNECT_ADDRESS, PATRONI_POSTGRESQL_LISTEN, PATRONI_POSTGRESQL_CONNECT_ADDRESS
Thu Jun 23 09:50:30 UTC 2022 INFO: Setting postgres-ha configuration for database user credentials
Thu Jun 23 09:50:30 UTC 2022 INFO: Setting 'pguser' credentials using file system
Thu Jun 23 09:50:30 UTC 2022 INFO: Setting 'superuser' credentials using file system
Thu Jun 23 09:50:30 UTC 2022 INFO: Setting 'replicator' credentials using file system
Thu Jun 23 09:50:30 UTC 2022 INFO: Applying base bootstrap config to postgres-ha configuration
Thu Jun 23 09:50:30 UTC 2022 INFO: Applying base postgres config to postgres-ha configuration
Thu Jun 23 09:50:30 UTC 2022 INFO: Applying pgbackrest config to postgres-ha configuration
Thu Jun 23 09:50:30 UTC 2022 INFO: Applying standard (non-TLS) remote connection configuration to pg_hba.conf
Thu Jun 23 09:50:30 UTC 2022 INFO: Custom postgres-ha configuration file not detected
Thu Jun 23 09:50:30 UTC 2022 INFO: Finished building postgres-ha configuration file '/tmp/postgres-ha-bootstrap.yaml'
Thu Jun 23 09:50:30 UTC 2022 INFO: postgres-ha pre-bootstrap complete!  The following configuration will be utilized to initialize
******************************
postgres-ha (PGHA) env vars:
******************************
PGHA_BASE_PG_CONFIG=true
PGHA_PATRONI_PORT=8009
PGHA_PG_PORT=5432
PGHA_PGBACKREST_LOCAL_S3_STORAGE=false
PGHA_SYNC_REPLICATION=false
PGHA_USER=postgres
PGHA_DEFAULT_CONFIG=true
PGHA_PASSWORD_TYPE=
PGHA_REPLICA_REINIT_ON_START_FAIL=true
PGHA_PGBACKREST=true
PGHA_BASE_BOOTSTRAP_CONFIG=true
PGHA_STANDBY=false
PGHA_PGBACKREST_LOCAL_GCS_STORAGE=false
PGHA_TLS_ONLY=false
PGHA_TLS_ENABLED=false
PGHA_DATABASE=productionets
******************************
Patroni env vars:
******************************
PATRONI_POSTGRESQL_CONNECT_ADDRESS=10.244.2.193:5432
PATRONI_POSTGRESQL_LISTEN=0.0.0.0:5432
PATRONI_NAME=productionets-tvhl-69d96fc85b-p29vz
PATRONI_SCOPE=productionets
PATRONI_RESTAPI_LISTEN=0.0.0.0:8009
PATRONI_POSTGRESQL_DATA_DIR=/pgdata/productionets-tvhl
PATRONI_RESTAPI_CONNECT_ADDRESS=10.244.2.193:8009
PATRONI_LOG_LEVEL=INFO
PATRONI_KUBERNETES_LABELS={vendor: "crunchydata"}
PATRONI_KUBERNETES_SCOPE_LABEL=crunchy-pgha-scope
PATRONI_KUBERNETES_NAMESPACE=pgo
******************************
Patroni bootstrap method: existing_init
******************************
Patroni configuration file:
******************************
bootstrap:
  method: existing_init
  pgbackrest_init:
    command: '/opt/crunchy/bin/postgres-ha/pgbackrest/pgbackrest-create-replica.sh
      primary'
    keep_existing_recovery_conf: true
  existing_init:
    command: '/opt/crunchy/bin/postgres-ha/bootstrap/create-from-existing.sh'
    keep_existing_recovery_conf: true
  dcs:
    postgresql:
      parameters:
        jit: off
        unix_socket_directories: /tmp
        wal_level: logical
        archive_mode: on
        archive_command: 'source /opt/crunchy/bin/postgres-ha/pgbackrest/pgbackrest-set-env.sh
          && pgbackrest archive-push "%p"'
      use_slots: false
      recovery_conf:
        restore_command: 'source /opt/crunchy/bin/postgres-ha/pgbackrest/pgbackrest-set-env.sh
          && pgbackrest archive-get %f "%p"'
  post_bootstrap: /opt/crunchy/bin/postgres-ha/bootstrap/post-bootstrap.sh
postgresql:
  use_unix_socket: true
  pgpass: /tmp/.pgpass
  create_replica_methods:
  - pgbackrest
  - basebackup
  pgbackrest:
    command: '/opt/crunchy/bin/postgres-ha/pgbackrest/pgbackrest-create-replica.sh
      replica'
    keep_data: true
    no_params: true
  pgbackrest_standby:
    command: '/opt/crunchy/bin/postgres-ha/pgbackrest/pgbackrest-create-replica.sh
      standby'
    keep_data: true
    no_params: true
    no_master: 1
  remove_data_directory_on_rewind_failure: true
  callbacks:
    on_role_change: /opt/crunchy/bin/postgres-ha/callbacks/pgha-on-role-change.sh
  pg_hba:
  - local all postgres peer
  - host replication primaryuser 0.0.0.0/0 md5
  - host all primaryuser 0.0.0.0/0 reject
  - host all all 0.0.0.0/0 md5
Thu Jun 23 09:50:30 UTC 2022 INFO: Applying SSHD..
Thu Jun 23 09:50:30 UTC 2022 INFO: nss_wrapper: ssh configured
Thu Jun 23 09:50:30 UTC 2022 INFO: Checking for SSH Host Keys in /sshd..
Thu Jun 23 09:50:30 UTC 2022 INFO: Checking for authorized_keys in /sshd
Thu Jun 23 09:50:30 UTC 2022 INFO: Checking for sshd_config in /sshd
Thu Jun 23 09:50:30 UTC 2022 INFO: Starting SSHD..
WARNING: 'UsePAM no' is not supported in Fedora and may cause several problems.
Thu Jun 23 09:50:30 UTC 2022 INFO: Starting background process to monitor Patroni initization and restart the database if needed
Thu Jun 23 09:50:30 UTC 2022 INFO: Now removing "pause" key from patroni.dynamic.json configuration file if present
sed: couldn't flush /pgdata/productionets-tvhl/sedaIEdzn: No space left on device
Thu Jun 23 09:50:30 UTC 2022 INFO: Initializing cluster bootstrap with command: '/usr/local/bin/patroni /tmp/postgres-ha-bootstrap.yaml'
Thu Jun 23 09:50:30 UTC 2022 INFO: Running Patroni as PID 1
2022-06-23 09:50:30,978 INFO: No PostgreSQL configuration items changed, nothing to reload.
2022-06-23 09:50:30,982 INFO: Reaped pid=146, exit status=0
2022-06-23 09:50:30,987 INFO: Reaped pid=149, exit status=0
2022-06-23 09:50:30,987 WARNING: Postgresql is not running.
2022-06-23 09:50:30,987 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:50:30,990 INFO: Reaped pid=150, exit status=0
2022-06-23 09:50:30,990 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202007201
  Database system identifier: 7041905253012836517
  Database cluster state: shutting down
  pg_control last modified: Sun Jun 19 22:17:39 2022
  Latest checkpoint location: 2EC/96002538
  Latest checkpoint's REDO location: 2EC/960024C8
  Latest checkpoint's REDO WAL file: 00000005000002EC00000096
  Latest checkpoint's TimeLineID: 5
  Latest checkpoint's PrevTimeLineID: 5
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:1262502
  Latest checkpoint's NextOID: 158750
  Latest checkpoint's NextMultiXactId: 1
  Latest checkpoint's NextMultiOffset: 0
  Latest checkpoint's oldestXID: 479
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 1262502
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Sun Jun 19 22:15:20 2022
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 0/0
  Min recovery ending loc's timeline: 0
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: logical
  wal_log_hints setting: on
  max_connections setting: 1000
  max_worker_processes setting: 8
  max_wal_senders setting: 6
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 1
  Mock authentication nonce: 25f3f1c8c3224e1eb1227873333dc419daf100535cea290a3bb9fa5de81b8e62

2022-06-23 09:50:31,032 INFO: doing crash recovery in a single user mode
2022-06-23 09:50:31,036 ERROR: Exception when saving file: /pgdata/productionets-tvhl/patroni.dynamic.json
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/patroni/config.py", line 167, in save_cache
    json.dump(self.dynamic_configuration, f)
OSError: [Errno 28] No space left on device
2022-06-23 09:50:31,046 ERROR: Crash recovery finished with code=1
2022-06-23 09:50:31,046 INFO:  stdout=
2022-06-23 09:50:31,046 INFO:  stderr=2022-06-23 09:50:31.045 GMT [152] FATAL:  could not write lock file "postmaster.pid": No space left on device

2022-06-23 09:50:41,485 WARNING: Postgresql is not running.
2022-06-23 09:50:41,485 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:50:41,488 INFO: Reaped pid=193, exit status=0
2022-06-23 09:50:41,488 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202007201
  Database system identifier: 7041905253012836517
  Database cluster state: shutting down
  pg_control last modified: Sun Jun 19 22:17:39 2022
  Latest checkpoint location: 2EC/96002538
  Latest checkpoint's REDO location: 2EC/960024C8
  Latest checkpoint's REDO WAL file: 00000005000002EC00000096
  Latest checkpoint's TimeLineID: 5
  Latest checkpoint's PrevTimeLineID: 5
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:1262502
  Latest checkpoint's NextOID: 158750
  Latest checkpoint's NextMultiXactId: 1
  Latest checkpoint's NextMultiOffset: 0
  Latest checkpoint's oldestXID: 479
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 1262502
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Sun Jun 19 22:15:20 2022
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 0/0
  Min recovery ending loc's timeline: 0
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: logical
  wal_log_hints setting: on
  max_connections setting: 1000
  max_worker_processes setting: 8
  max_wal_senders setting: 6
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 1
  Mock authentication nonce: 25f3f1c8c3224e1eb1227873333dc419daf100535cea290a3bb9fa5de81b8e62

2022-06-23 09:50:41,489 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:50:41,495 INFO: Reaped pid=195, exit status=0
2022-06-23 09:50:41,495 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:50:41,496 INFO: starting as a secondary
2022-06-23 09:50:41,498 ERROR: Exception when saving file: /pgdata/productionets-tvhl/patroni.dynamic.json
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/patroni/config.py", line 167, in save_cache
    json.dump(self.dynamic_configuration, f)
OSError: [Errno 28] No space left on device
2022-06-23 09:50:41,501 ERROR: Exception during execution of long running task restarting after failure
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/patroni/async_executor.py", line 97, in run
    wakeup = func(*args) if args else func()
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 871, in follow
    self.start(timeout=timeout, block_callbacks=change_role, role=role)
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 539, in start
    self.config.write_postgresql_conf(configuration)
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 419, in write_postgresql_conf
    self._sanitize_auto_conf()
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 240, in __exit__
    self._fd.close()
OSError: [Errno 28] No space left on device
/tmp:5432 - no response
2022-06-23 09:50:41,508 ERROR: unable to create backup copies of configuration files
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 366, in save_configuration_files
    shutil.copy(config_file, backup_file)
  File "/usr/lib64/python3.6/shutil.py", line 245, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib64/python3.6/shutil.py", line 122, in copyfile
    copyfileobj(fsrc, fdst)
  File "/usr/lib64/python3.6/shutil.py", line 82, in copyfileobj
    fdst.write(buf)
OSError: [Errno 28] No space left on device
2022-06-23 09:50:41,509 WARNING: Postgresql is not running.
2022-06-23 09:50:41,509 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:50:41,511 INFO: Reaped pid=199, exit status=0
2022-06-23 09:50:41,512 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202007201
  Database system identifier: 7041905253012836517
  Database cluster state: shutting down
  pg_control last modified: Sun Jun 19 22:17:39 2022
  Latest checkpoint location: 2EC/96002538
  Latest checkpoint's REDO location: 2EC/960024C8
  Latest checkpoint's REDO WAL file: 00000005000002EC00000096
  Latest checkpoint's TimeLineID: 5
  Latest checkpoint's PrevTimeLineID: 5
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:1262502
  Latest checkpoint's NextOID: 158750
  Latest checkpoint's NextMultiXactId: 1
  Latest checkpoint's NextMultiOffset: 0
  Latest checkpoint's oldestXID: 479
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 1262502
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Sun Jun 19 22:15:20 2022
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 0/0
  Min recovery ending loc's timeline: 0
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: logical
  wal_log_hints setting: on
  max_connections setting: 1000
  max_worker_processes setting: 8
  max_wal_senders setting: 6
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 1
  Mock authentication nonce: 25f3f1c8c3224e1eb1227873333dc419daf100535cea290a3bb9fa5de81b8e62

2022-06-23 09:50:41,577 INFO: doing crash recovery in a single user mode
2022-06-23 09:50:41,580 ERROR: Exception when saving file: /pgdata/productionets-tvhl/patroni.dynamic.json
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/patroni/config.py", line 167, in save_cache
    json.dump(self.dynamic_configuration, f)
OSError: [Errno 28] No space left on device
2022-06-23 09:50:41,590 ERROR: Crash recovery finished with code=1
2022-06-23 09:50:41,590 INFO:  stdout=
2022-06-23 09:50:41,590 INFO:  stderr=2022-06-23 09:50:41.589 GMT [201] FATAL:  could not write lock file "postmaster.pid": No space left on device

2022-06-23 09:50:52,003 WARNING: Postgresql is not running.
2022-06-23 09:50:52,003 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:50:52,007 INFO: Reaped pid=254, exit status=0
2022-06-23 09:50:52,007 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202007201
  Database system identifier: 7041905253012836517
  Database cluster state: shutting down
  pg_control last modified: Sun Jun 19 22:17:39 2022
  Latest checkpoint location: 2EC/96002538
  Latest checkpoint's REDO location: 2EC/960024C8
  Latest checkpoint's REDO WAL file: 00000005000002EC00000096
  Latest checkpoint's TimeLineID: 5
  Latest checkpoint's PrevTimeLineID: 5
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:1262502
  Latest checkpoint's NextOID: 158750
  Latest checkpoint's NextMultiXactId: 1
  Latest checkpoint's NextMultiOffset: 0
  Latest checkpoint's oldestXID: 479
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 1262502
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Sun Jun 19 22:15:20 2022
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 0/0
  Min recovery ending loc's timeline: 0
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: logical
  wal_log_hints setting: on
  max_connections setting: 1000
  max_worker_processes setting: 8
  max_wal_senders setting: 6
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 1
  Mock authentication nonce: 25f3f1c8c3224e1eb1227873333dc419daf100535cea290a3bb9fa5de81b8e62

2022-06-23 09:50:52,007 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:50:52,014 INFO: Reaped pid=256, exit status=0
2022-06-23 09:50:52,014 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:50:52,015 INFO: starting as a secondary
2022-06-23 09:50:52,016 ERROR: Exception when saving file: /pgdata/productionets-tvhl/patroni.dynamic.json
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/patroni/config.py", line 167, in save_cache
    json.dump(self.dynamic_configuration, f)
OSError: [Errno 28] No space left on device
2022-06-23 09:50:52,020 ERROR: Exception during execution of long running task restarting after failure
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/patroni/async_executor.py", line 97, in run
    wakeup = func(*args) if args else func()
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 871, in follow
    self.start(timeout=timeout, block_callbacks=change_role, role=role)
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 539, in start
    self.config.write_postgresql_conf(configuration)
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 419, in write_postgresql_conf
    self._sanitize_auto_conf()
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 240, in __exit__
    self._fd.close()
OSError: [Errno 28] No space left on device
/tmp:5432 - no response
2022-06-23 09:50:52,025 ERROR: unable to create backup copies of configuration files
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 366, in save_configuration_files
    shutil.copy(config_file, backup_file)
  File "/usr/lib64/python3.6/shutil.py", line 245, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib64/python3.6/shutil.py", line 122, in copyfile
    copyfileobj(fsrc, fdst)
  File "/usr/lib64/python3.6/shutil.py", line 82, in copyfileobj
    fdst.write(buf)
OSError: [Errno 28] No space left on device
2022-06-23 09:50:52,026 WARNING: Postgresql is not running.
2022-06-23 09:50:52,026 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:50:52,028 INFO: Reaped pid=260, exit status=0
2022-06-23 09:50:52,029 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202007201
  Database system identifier: 7041905253012836517
  Database cluster state: shutting down
  pg_control last modified: Sun Jun 19 22:17:39 2022
  Latest checkpoint location: 2EC/96002538
  Latest checkpoint's REDO location: 2EC/960024C8
  Latest checkpoint's REDO WAL file: 00000005000002EC00000096
  Latest checkpoint's TimeLineID: 5
  Latest checkpoint's PrevTimeLineID: 5
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:1262502
  Latest checkpoint's NextOID: 158750
  Latest checkpoint's NextMultiXactId: 1
  Latest checkpoint's NextMultiOffset: 0
  Latest checkpoint's oldestXID: 479
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 1262502
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Sun Jun 19 22:15:20 2022
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 0/0
  Min recovery ending loc's timeline: 0
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: logical
  wal_log_hints setting: on
  max_connections setting: 1000
  max_worker_processes setting: 8
  max_wal_senders setting: 6
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 1
  Mock authentication nonce: 25f3f1c8c3224e1eb1227873333dc419daf100535cea290a3bb9fa5de81b8e62

2022-06-23 09:50:52,029 INFO: doing crash recovery in a single user mode
2022-06-23 09:50:52,030 ERROR: Exception when saving file: /pgdata/productionets-tvhl/patroni.dynamic.json
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/patroni/config.py", line 167, in save_cache
    json.dump(self.dynamic_configuration, f)
OSError: [Errno 28] No space left on device
2022-06-23 09:50:52,038 ERROR: Crash recovery finished with code=1
2022-06-23 09:50:52,039 INFO:  stdout=
2022-06-23 09:50:52,039 INFO:  stderr=2022-06-23 09:50:52.038 GMT [262] FATAL:  could not write lock file "postmaster.pid": No space left on device

2022-06-23 09:51:02,521 WARNING: Postgresql is not running.
2022-06-23 09:51:02,522 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:51:02,525 INFO: Reaped pid=312, exit status=0
2022-06-23 09:51:02,525 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202007201
  Database system identifier: 7041905253012836517
  Database cluster state: shutting down
  pg_control last modified: Sun Jun 19 22:17:39 2022
  Latest checkpoint location: 2EC/96002538
  Latest checkpoint's REDO location: 2EC/960024C8
  Latest checkpoint's REDO WAL file: 00000005000002EC00000096
  Latest checkpoint's TimeLineID: 5
  Latest checkpoint's PrevTimeLineID: 5
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:1262502
  Latest checkpoint's NextOID: 158750
  Latest checkpoint's NextMultiXactId: 1
  Latest checkpoint's NextMultiOffset: 0
  Latest checkpoint's oldestXID: 479
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 1262502
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Sun Jun 19 22:15:20 2022
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 0/0
  Min recovery ending loc's timeline: 0
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: logical
  wal_log_hints setting: on
  max_connections setting: 1000
  max_worker_processes setting: 8
  max_wal_senders setting: 6
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 1
  Mock authentication nonce: 25f3f1c8c3224e1eb1227873333dc419daf100535cea290a3bb9fa5de81b8e62

2022-06-23 09:51:02,525 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:51:02,532 INFO: Reaped pid=314, exit status=0
2022-06-23 09:51:02,532 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:51:02,533 INFO: starting as a secondary
2022-06-23 09:51:02,534 ERROR: Exception when saving file: /pgdata/productionets-tvhl/patroni.dynamic.json
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/patroni/config.py", line 167, in save_cache
    json.dump(self.dynamic_configuration, f)
OSError: [Errno 28] No space left on device
2022-06-23 09:51:02,538 ERROR: Exception during execution of long running task restarting after failure
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/patroni/async_executor.py", line 97, in run
    wakeup = func(*args) if args else func()
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 871, in follow
    self.start(timeout=timeout, block_callbacks=change_role, role=role)
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 539, in start
    self.config.write_postgresql_conf(configuration)
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 419, in write_postgresql_conf
    self._sanitize_auto_conf()
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 240, in __exit__
    self._fd.close()
OSError: [Errno 28] No space left on device
/tmp:5432 - no response
2022-06-23 09:51:02,543 ERROR: unable to create backup copies of configuration files
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 366, in save_configuration_files
    shutil.copy(config_file, backup_file)
  File "/usr/lib64/python3.6/shutil.py", line 245, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib64/python3.6/shutil.py", line 122, in copyfile
    copyfileobj(fsrc, fdst)
  File "/usr/lib64/python3.6/shutil.py", line 82, in copyfileobj
    fdst.write(buf)
OSError: [Errno 28] No space left on device
2022-06-23 09:51:02,543 WARNING: Postgresql is not running.
2022-06-23 09:51:02,543 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:51:02,545 INFO: Reaped pid=318, exit status=0
2022-06-23 09:51:02,546 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202007201
  Database system identifier: 7041905253012836517
  Database cluster state: shutting down
  pg_control last modified: Sun Jun 19 22:17:39 2022
  Latest checkpoint location: 2EC/96002538
  Latest checkpoint's REDO location: 2EC/960024C8
  Latest checkpoint's REDO WAL file: 00000005000002EC00000096
  Latest checkpoint's TimeLineID: 5
  Latest checkpoint's PrevTimeLineID: 5
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:1262502
  Latest checkpoint's NextOID: 158750
  Latest checkpoint's NextMultiXactId: 1
  Latest checkpoint's NextMultiOffset: 0
  Latest checkpoint's oldestXID: 479
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 1262502
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Sun Jun 19 22:15:20 2022
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 0/0
  Min recovery ending loc's timeline: 0
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: logical
  wal_log_hints setting: on
  max_connections setting: 1000
  max_worker_processes setting: 8
  max_wal_senders setting: 6
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 1
  Mock authentication nonce: 25f3f1c8c3224e1eb1227873333dc419daf100535cea290a3bb9fa5de81b8e62

2022-06-23 09:51:02,546 INFO: doing crash recovery in a single user mode
2022-06-23 09:51:02,547 ERROR: Exception when saving file: /pgdata/productionets-tvhl/patroni.dynamic.json
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/patroni/config.py", line 167, in save_cache
    json.dump(self.dynamic_configuration, f)
OSError: [Errno 28] No space left on device
2022-06-23 09:51:02,555 ERROR: Crash recovery finished with code=1
2022-06-23 09:51:02,555 INFO:  stdout=
2022-06-23 09:51:02,556 INFO:  stderr=2022-06-23 09:51:02.555 GMT [320] FATAL:  could not write lock file "postmaster.pid": No space left on device

2022-06-23 09:51:13,039 WARNING: Postgresql is not running.
2022-06-23 09:51:13,039 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:51:13,042 INFO: Reaped pid=415, exit status=0
2022-06-23 09:51:13,042 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202007201
  Database system identifier: 7041905253012836517
  Database cluster state: shutting down
  pg_control last modified: Sun Jun 19 22:17:39 2022
  Latest checkpoint location: 2EC/96002538
  Latest checkpoint's REDO location: 2EC/960024C8
  Latest checkpoint's REDO WAL file: 00000005000002EC00000096
  Latest checkpoint's TimeLineID: 5
  Latest checkpoint's PrevTimeLineID: 5
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:1262502
  Latest checkpoint's NextOID: 158750
  Latest checkpoint's NextMultiXactId: 1
  Latest checkpoint's NextMultiOffset: 0
  Latest checkpoint's oldestXID: 479
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 1262502
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Sun Jun 19 22:15:20 2022
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 0/0
  Min recovery ending loc's timeline: 0
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: logical
  wal_log_hints setting: on
  max_connections setting: 1000
  max_worker_processes setting: 8
  max_wal_senders setting: 6
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 1
  Mock authentication nonce: 25f3f1c8c3224e1eb1227873333dc419daf100535cea290a3bb9fa5de81b8e62

2022-06-23 09:51:13,043 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:51:13,049 INFO: Reaped pid=417, exit status=0
2022-06-23 09:51:13,049 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:51:13,049 INFO: starting as a secondary
2022-06-23 09:51:13,051 ERROR: Exception when saving file: /pgdata/productionets-tvhl/patroni.dynamic.json
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/patroni/config.py", line 167, in save_cache
    json.dump(self.dynamic_configuration, f)
OSError: [Errno 28] No space left on device
2022-06-23 09:51:13,055 ERROR: Exception during execution of long running task restarting after failure
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/patroni/async_executor.py", line 97, in run
    wakeup = func(*args) if args else func()
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 871, in follow
    self.start(timeout=timeout, block_callbacks=change_role, role=role)
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 539, in start
    self.config.write_postgresql_conf(configuration)
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 419, in write_postgresql_conf
    self._sanitize_auto_conf()
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 240, in __exit__
    self._fd.close()
OSError: [Errno 28] No space left on device
/tmp:5432 - no response
2022-06-23 09:51:13,060 ERROR: unable to create backup copies of configuration files
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 366, in save_configuration_files
    shutil.copy(config_file, backup_file)
  File "/usr/lib64/python3.6/shutil.py", line 245, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib64/python3.6/shutil.py", line 122, in copyfile
    copyfileobj(fsrc, fdst)
  File "/usr/lib64/python3.6/shutil.py", line 82, in copyfileobj
    fdst.write(buf)
OSError: [Errno 28] No space left on device
2022-06-23 09:51:13,060 WARNING: Postgresql is not running.
2022-06-23 09:51:13,060 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:51:13,062 INFO: Reaped pid=421, exit status=0
2022-06-23 09:51:13,063 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202007201
  Database system identifier: 7041905253012836517
  Database cluster state: shutting down
  pg_control last modified: Sun Jun 19 22:17:39 2022
  Latest checkpoint location: 2EC/96002538
  Latest checkpoint's REDO location: 2EC/960024C8
  Latest checkpoint's REDO WAL file: 00000005000002EC00000096
  Latest checkpoint's TimeLineID: 5
  Latest checkpoint's PrevTimeLineID: 5
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:1262502
  Latest checkpoint's NextOID: 158750
  Latest checkpoint's NextMultiXactId: 1
  Latest checkpoint's NextMultiOffset: 0
  Latest checkpoint's oldestXID: 479
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 1262502
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Sun Jun 19 22:15:20 2022
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 0/0
  Min recovery ending loc's timeline: 0
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: logical
  wal_log_hints setting: on
  max_connections setting: 1000
  max_worker_processes setting: 8
  max_wal_senders setting: 6
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 1
  Mock authentication nonce: 25f3f1c8c3224e1eb1227873333dc419daf100535cea290a3bb9fa5de81b8e62

2022-06-23 09:51:13,063 INFO: doing crash recovery in a single user mode
2022-06-23 09:51:13,064 ERROR: Exception when saving file: /pgdata/productionets-tvhl/patroni.dynamic.json
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/patroni/config.py", line 167, in save_cache
    json.dump(self.dynamic_configuration, f)
OSError: [Errno 28] No space left on device
2022-06-23 09:51:13,072 ERROR: Crash recovery finished with code=1
2022-06-23 09:51:13,073 INFO:  stdout=
2022-06-23 09:51:13,073 INFO:  stderr=2022-06-23 09:51:13.072 GMT [423] FATAL:  could not write lock file "postmaster.pid": No space left on device
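
Every write under `/pgdata` in the log above fails with `[Errno 28]`, so a first check from inside the pod is how much room the mounted filesystem actually reports, in bytes and in inodes (ENOSPC is also returned when a filesystem runs out of inodes). The following is a minimal sketch, not part of PGO or Patroni; the function name `volume_headroom` is hypothetical, and `/pgdata` is simply the mount path that appears in the log.

```python
import os


def volume_headroom(path):
    """Return free/total bytes and inodes for the filesystem holding `path`."""
    st = os.statvfs(path)  # Unix-only; reports what the mounted filesystem sees
    return {
        "bytes_free": st.f_bavail * st.f_frsize,  # space available to non-root users
        "bytes_total": st.f_blocks * st.f_frsize,
        "inodes_free": st.f_favail,
        "inodes_total": st.f_files,
    }


if __name__ == "__main__":
    # "/pgdata" is the path seen in the log; fall back to "/" on other machines.
    target = "/pgdata" if os.path.isdir("/pgdata") else "/"
    for key, value in volume_headroom(target).items():
        print(f"{key}: {value}")
```

If `bytes_total` still shows the old size after the PVC was resized, one possible explanation is that the filesystem inside the volume has not been grown yet, which would match the report that resizing changed nothing.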

arezk84 avatar Jun 23 '22 10:06 arezk84

I just had the same thing happen. A database I spun up with the Crunchy operator, with 10G of attached storage, ran out of PVC space, and I had to expand the PVC. I don't have very much data in any of the tables. Has anyone else seen a fairly empty Crunchy PostgreSQL database take up over 10G?

deanpeterson avatar Jun 23 '22 15:06 deanpeterson

I am currently facing a similar issue. I have a very small database (~60MB), but it has already filled a 1GB volume. Almost all of the data is WAL files, and I don't know why there are so many of them. It is a very simple single-node Postgres database.

bash-4.4$ du -h -d 1
60M	./pg14
16K	./lost+found
1009M	./pg14_wal
9.4M	./pgbackrest
1.1G	.

lukeelten avatar Jul 22 '22 06:07 lukeelten

Well for what it's worth I have just got hit by this also.

pseymournutanix avatar Jul 28 '22 13:07 pseymournutanix

I fixed the problem by changing the following parameters:

  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          max_wal_size: 128MB
          wal_buffers: 2MB
          wal_recycle: off
          wal_init_zero: off
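
As a rough sanity check on these values, a size setting can be converted into a number of WAL segments. This is a sketch under the assumption that the default 16 MiB segment size is in use; `parse_size` and `segments_for` are hypothetical helpers, not PostgreSQL or PGO functions. Note that `max_wal_size` is a soft limit: PostgreSQL can exceed it, for instance when WAL archiving is failing.

```python
# PostgreSQL memory/size units are powers of 1024.
UNITS = {"kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
WAL_SEGMENT_BYTES = 16 * 1024**2  # default segment size; assumed unchanged


def parse_size(value):
    """Convert a PostgreSQL-style size such as '128MB' into bytes."""
    for unit, factor in UNITS.items():
        if value.endswith(unit):
            return int(value[: -len(unit)]) * factor
    raise ValueError(f"unsupported size: {value!r}")


def segments_for(value, segment_bytes=WAL_SEGMENT_BYTES):
    """Number of whole WAL segments that fit in the given size."""
    return parse_size(value) // segment_bytes


print(segments_for("128MB"))  # 8
print(segments_for("1GB"))    # 64
```

The second line uses PostgreSQL's default `max_wal_size` of 1GB, which on its own is about the size of a 1GB volume; this may be why the small database above filled its disk.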

lukeelten avatar Jul 29 '22 14:07 lukeelten

Just my experience so far on this, as we also faced the pg14_wal folder consuming all the disk given to it, even though the DB holds only 300MB.

There are certain parameters that we tried to change and were unable to, because the reconcile loop (or what looks like it) calls Patroni to change them back. For instance, we tried to change wal_level from logical to replica in 2 different ways:

  • We tried changing it in patroni.dynamicConfiguration.postgresql.parameters in the CR, expecting the operator to pick it up and restart the cluster. That didn't happen.
  • We then tried doing it with patronictl edit-config. If you leave pgo in debug mode, you can see that as soon as you edit it with patronictl, pgo reports that the change got rolled back.

time="2022-08-12T19:38:15Z" level=debug msg="replaced configuration" file="internal/patroni/api.go:149" func=patroni.Executor.ReplaceConfiguration name=postgres namespace=XXX reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster stderr= stdout="--- \n+++ \n@@ -33,7 +33,7 @@\n unix_socket_directories: /tmp/postgres\n wal_buffers: 16MB\n wal_keep_size: 0MB\n**- wal_level: replica**\n**+ wal_level: logical**\n wal_log_hints: 'off'\n work_mem: 4MB\n pg_hba:\nConfiguration changed\n" version=5.1.1-0

After looking around, we ended up finding https://github.com/CrunchyData/postgres-operator/issues/3055 and https://github.com/CrunchyData/postgres-operator/issues/3002, but had to dig into the code to see that the wal_level setting is mandatory: https://github.com/CrunchyData/postgres-operator/blob/2e18aef93dd2d6dee065ad00c959dc9fabc6da79/internal/postgres/parameters.go#L33-L38. Is it documented somewhere?

We also tried to change wal_log_hints, but it looks like it is also a prerequisite for something: https://github.com/CrunchyData/postgres-operator/blob/7241a02ad4785fbafa7c1b61de9111c1c9030120/internal/patroni/config.go#L602

The same seems to be valid for wal_keep_size
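
The rollback seen in the debug log is consistent with mandatory values being applied on top of whatever the user supplies. The sketch below only illustrates that observed behaviour; it is not the operator's Go code, and `effective_parameters` is a hypothetical name. The only mandatory value used is the `wal_level: logical` visible in the log.

```python
def effective_parameters(user, mandatory):
    """Merge settings so that mandatory values always win over user values."""
    merged = dict(user)
    merged.update(mandatory)
    # Report which user-supplied settings were replaced by a mandatory value.
    overridden = sorted(
        key for key in mandatory if key in user and user[key] != mandatory[key]
    )
    return merged, overridden


user = {"wal_level": "replica", "work_mem": "4MB"}
mandatory = {"wal_level": "logical"}  # value seen in the debug log above
merged, overridden = effective_parameters(user, mandatory)
print(merged["wal_level"], overridden)  # logical ['wal_level']
```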

Edit: a few days later, it turned out that our backups had issues: WAL files were not being archived and then deleted, which was causing the problem. After the backup issue was fixed, 5 backup jobs failed with a timeout while archiving WAL, but each run cleaned up a lot of the WAL files. The last one was successful, and pg14_wal went down from 35GB to 17MB.
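
A quick way to spot this situation is to look at the `archive_status` directory inside the WAL directory: PostgreSQL leaves a `<segment>.ready` marker for every segment that still has to be archived and renames it to `.done` once archiving succeeds. The following is a minimal sketch; `pending_archive_segments` is a hypothetical helper, and the WAL path passed to it (for example `/pgdata/pg14_wal`) is an assumption about the volume layout.

```python
import os


def pending_archive_segments(wal_dir):
    """Return the sorted WAL file names still marked '.ready' (not yet archived)."""
    status_dir = os.path.join(wal_dir, "archive_status")
    if not os.path.isdir(status_dir):
        return []
    suffix = ".ready"
    return sorted(
        name[: -len(suffix)]
        for name in os.listdir(status_dir)
        if name.endswith(suffix)
    )
```

A list that keeps growing suggests the archive command is failing, in which case PostgreSQL keeps the segments on disk regardless of `max_wal_size`.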

rmiguelac avatar Aug 12 '22 20:08 rmiguelac

We also hit this issue during the import of a ~60GB database. At its peak (looking at the Grafana dashboards), the WAL reached 80GB in size. The WAL might have ended up more bloated than expected, since we ran the import multiple times after it crashed on hitting "out of disk".

dbackeus avatar Oct 03 '22 15:10 dbackeus

This is also a problem in pgo v5. Setting wal_level to replica is ignored, but it should be possible, as mentioned here: https://github.com/CrunchyData/postgres-operator/issues/3055#issuecomment-1147947945

ThommyH avatar Oct 18 '22 13:10 ThommyH

I am not sure if I have understood your problem correctly, but you have to set pgBackRest retention management AND schedule at least one backup to get automatic archive retention management working.

  pgBackRestConfig:
    global:
      repo1-retention-full: "31"
      repo1-retention-full-type: time
    repos:
    - name: repo1
      volume:
        volumeClaimSpec:
          accessModes:
          - "ReadWriteOnce"
          resources:
            requests:
              storage: 10Gi
      schedules:
        full: "0 1 * * *"
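
The two conditions can be checked mechanically against a parsed copy of the section above. This is only a sketch: `retention_will_run` is a hypothetical helper, and it assumes the section has already been loaded into a dict with the same `global` and `repos` keys as in the snippet.

```python
def retention_will_run(pgbackrest_spec):
    """True if a full-backup retention value is set and some repo has a schedule."""
    global_opts = pgbackrest_spec.get("global", {})
    # Matches keys such as "repo1-retention-full" but not "repo1-retention-full-type".
    has_retention = any(key.endswith("-retention-full") for key in global_opts)
    has_schedule = any(
        bool(repo.get("schedules")) for repo in pgbackrest_spec.get("repos", [])
    )
    return has_retention and has_schedule


spec = {
    "global": {"repo1-retention-full": "31", "repo1-retention-full-type": "time"},
    "repos": [{"name": "repo1", "schedules": {"full": "0 1 * * *"}}],
}
print(retention_will_run(spec))  # True
```

The reasoning behind checking both is that pgBackRest expires old backups and their archived WAL after a backup completes, so a retention setting without any scheduled backup never gets a chance to run.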

mzwettler2 avatar Oct 24 '22 09:10 mzwettler2

Peanut gallery here. Couldn't some of the issues reported here be a side effect of not running a VACUUM pass often enough on the databases? Cheers.

Ref: https://www.postgresql.org/docs/current/routine-vacuuming.html

boldandbusted avatar Oct 27 '22 15:10 boldandbusted

Possibly related: https://github.com/CrunchyData/postgres-operator/issues/2531

mausch avatar Nov 27 '23 16:11 mausch

Considering that the initial issue is for an older version of Crunchy Postgres for Kubernetes (v4.7), and that the root causes of the various issues described in this thread are related to pgBackRest and/or Postgres configuration & tuning (rather than anything with CPK itself), I am going to proceed with closing this.

If anyone is still running into similar issues, please feel free to submit a new GitHub issue. You can also continue the conversation in the PGO project community Discord server.

Thanks!

andrewlecuyer avatar Mar 06 '24 14:03 andrewlecuyer

the root cause for the various issues described in this thread are related to pgBackRest and/Postgres configuration & tuning (rather than anything with CPK itself)

Maybe the action here then is improving the docs around backups, perhaps adding this as a warning somewhere in a FAQ or something like that?

mausch avatar Mar 06 '24 15:03 mausch