postgres-operator
No space left on device
I am running version 4.7.4 and cannot get the cluster working again.
The logs show "No space left on device". I tried resizing the PVC, but nothing changed and the pod is still not ready.
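For reference, here is a minimal sketch of how free space and free inodes on the data volume could be checked from inside the pod. It assumes Python is available in the container (the tracebacks in the logs come from Python 3.6) and that the volume is mounted under /pgdata, as PATRONI_POSTGRESQL_DATA_DIR suggests. The helper names volume_usage and is_full are made up for illustration. The inode check is there because "No space left on device" (errno 28) can also be raised when a filesystem runs out of inodes while blocks are still free.

```python
import os


def volume_usage(path):
    """Return byte and inode usage for the filesystem that holds `path`."""
    st = os.statvfs(path)
    return {
        "bytes_total": st.f_blocks * st.f_frsize,
        # f_bavail counts blocks available to unprivileged users
        "bytes_free": st.f_bavail * st.f_frsize,
        "inodes_total": st.f_files,
        "inodes_free": st.f_favail,
    }


def is_full(usage, min_free_bytes=1):
    """True when either free bytes or free inodes have run out."""
    no_bytes = usage["bytes_free"] < min_free_bytes
    # Some filesystems report 0 total inodes (no fixed inode table);
    # the inode check is skipped in that case.
    no_inodes = usage["inodes_total"] > 0 and usage["inodes_free"] == 0
    return no_bytes or no_inodes
```

Calling volume_usage("/pgdata") in the pod and comparing bytes_total with the requested PVC size would show whether the resize actually reached the filesystem. If the total still matches the old size, the filesystem may not have been expanded yet, which would explain why nothing changed.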
Environment
- Platform: Kubernetes
- Platform Version: 4.7.4
- PGO Image Tag: centos8
- Postgres Version: 13
- Storage: oci (Oracle Cloud)
Here are the full logs. Any help would be appreciated. Thanks.
NWRAP_ERROR(1) - nwrap_files_cache_reload: Unable to open '/tmp/nss_wrapper/postgres/passwd' readonly -1:No such file or directory
NWRAP_ERROR(1) - nwrap_files_getpwuid: Error loading passwd file
nss_wrapper: user exists
nss_wrapper: group exists
nss_wrapper: environment configured
Thu Jun 23 09:50:30 UTC 2022 INFO: postgres-ha pre-bootstrap starting...
Thu Jun 23 09:50:30 UTC 2022 INFO: pgBackRest auto-config disabled
Thu Jun 23 09:50:30 UTC 2022 INFO: PGHA_PGBACKREST_LOCAL_S3_STORAGE, PGHA_PGBACKREST_LOCAL_GCS_STORAGE and PGHA_PGBACKREST_INITIALIZE will be ignored if provided
Thu Jun 23 09:50:30 UTC 2022 INFO: Defaults have been set for the following postgres-ha auto-configuration env vars: PGHA_DEFAULT_CONFIG, PGHA_BASE_BOOTSTRAP_CONFIG, PGHA_BASE_PG_CONFIG
Thu Jun 23 09:50:30 UTC 2022 INFO: Defaults have been set for the following postgres-ha env vars: PGHA_PATRONI_PORT
Thu Jun 23 09:50:30 UTC 2022 INFO: Defaults have been set for the following Patroni env vars: PATRONI_NAME, PATRONI_RESTAPI_LISTEN, PATRONI_RESTAPI_CONNECT_ADDRESS, PATRONI_POSTGRESQL_LISTEN, PATRONI_POSTGRESQL_CONNECT_ADDRESS
Thu Jun 23 09:50:30 UTC 2022 INFO: Setting postgres-ha configuration for database user credentials
Thu Jun 23 09:50:30 UTC 2022 INFO: Setting 'pguser' credentials using file system
Thu Jun 23 09:50:30 UTC 2022 INFO: Setting 'superuser' credentials using file system
Thu Jun 23 09:50:30 UTC 2022 INFO: Setting 'replicator' credentials using file system
Thu Jun 23 09:50:30 UTC 2022 INFO: Applying base bootstrap config to postgres-ha configuration
Thu Jun 23 09:50:30 UTC 2022 INFO: Applying base postgres config to postgres-ha configuration
Thu Jun 23 09:50:30 UTC 2022 INFO: Applying pgbackrest config to postgres-ha configuration
Thu Jun 23 09:50:30 UTC 2022 INFO: Applying standard (non-TLS) remote connection configuration to pg_hba.conf
Thu Jun 23 09:50:30 UTC 2022 INFO: Custom postgres-ha configuration file not detected
Thu Jun 23 09:50:30 UTC 2022 INFO: Finished building postgres-ha configuration file '/tmp/postgres-ha-bootstrap.yaml'
Thu Jun 23 09:50:30 UTC 2022 INFO: postgres-ha pre-bootstrap complete! The following configuration will be utilized to initialize
******************************
postgres-ha (PGHA) env vars:
******************************
PGHA_BASE_PG_CONFIG=true
PGHA_PATRONI_PORT=8009
PGHA_PG_PORT=5432
PGHA_PGBACKREST_LOCAL_S3_STORAGE=false
PGHA_SYNC_REPLICATION=false
PGHA_USER=postgres
PGHA_DEFAULT_CONFIG=true
PGHA_PASSWORD_TYPE=
PGHA_REPLICA_REINIT_ON_START_FAIL=true
PGHA_PGBACKREST=true
PGHA_BASE_BOOTSTRAP_CONFIG=true
PGHA_STANDBY=false
PGHA_PGBACKREST_LOCAL_GCS_STORAGE=false
PGHA_TLS_ONLY=false
PGHA_TLS_ENABLED=false
PGHA_DATABASE=productionets
******************************
Patroni env vars:
******************************
PATRONI_POSTGRESQL_CONNECT_ADDRESS=10.244.2.193:5432
PATRONI_POSTGRESQL_LISTEN=0.0.0.0:5432
PATRONI_NAME=productionets-tvhl-69d96fc85b-p29vz
PATRONI_SCOPE=productionets
PATRONI_RESTAPI_LISTEN=0.0.0.0:8009
PATRONI_POSTGRESQL_DATA_DIR=/pgdata/productionets-tvhl
PATRONI_RESTAPI_CONNECT_ADDRESS=10.244.2.193:8009
PATRONI_LOG_LEVEL=INFO
PATRONI_KUBERNETES_LABELS={vendor: "crunchydata"}
PATRONI_KUBERNETES_SCOPE_LABEL=crunchy-pgha-scope
PATRONI_KUBERNETES_NAMESPACE=pgo
******************************
Patroni bootstrap method: existing_init
******************************
Patroni configuration file:
******************************
bootstrap:
  method: existing_init
  pgbackrest_init:
    command: '/opt/crunchy/bin/postgres-ha/pgbackrest/pgbackrest-create-replica.sh
      primary'
    keep_existing_recovery_conf: true
  existing_init:
    command: '/opt/crunchy/bin/postgres-ha/bootstrap/create-from-existing.sh'
    keep_existing_recovery_conf: true
  dcs:
    postgresql:
      parameters:
        jit: off
        unix_socket_directories: /tmp
        wal_level: logical
        archive_mode: on
        archive_command: 'source /opt/crunchy/bin/postgres-ha/pgbackrest/pgbackrest-set-env.sh
          && pgbackrest archive-push "%p"'
      use_slots: false
      recovery_conf:
        restore_command: 'source /opt/crunchy/bin/postgres-ha/pgbackrest/pgbackrest-set-env.sh
          && pgbackrest archive-get %f "%p"'
  post_bootstrap: /opt/crunchy/bin/postgres-ha/bootstrap/post-bootstrap.sh
postgresql:
  use_unix_socket: true
  pgpass: /tmp/.pgpass
  create_replica_methods:
  - pgbackrest
  - basebackup
  pgbackrest:
    command: '/opt/crunchy/bin/postgres-ha/pgbackrest/pgbackrest-create-replica.sh
      replica'
    keep_data: true
    no_params: true
  pgbackrest_standby:
    command: '/opt/crunchy/bin/postgres-ha/pgbackrest/pgbackrest-create-replica.sh
      standby'
    keep_data: true
    no_params: true
    no_master: 1
  remove_data_directory_on_rewind_failure: true
  callbacks:
    on_role_change: /opt/crunchy/bin/postgres-ha/callbacks/pgha-on-role-change.sh
  pg_hba:
  - local all postgres peer
  - host replication primaryuser 0.0.0.0/0 md5
  - host all primaryuser 0.0.0.0/0 reject
  - host all all 0.0.0.0/0 md5
Thu Jun 23 09:50:30 UTC 2022 INFO: Applying SSHD..
Thu Jun 23 09:50:30 UTC 2022 INFO: nss_wrapper: ssh configured
Thu Jun 23 09:50:30 UTC 2022 INFO: Checking for SSH Host Keys in /sshd..
Thu Jun 23 09:50:30 UTC 2022 INFO: Checking for authorized_keys in /sshd
Thu Jun 23 09:50:30 UTC 2022 INFO: Checking for sshd_config in /sshd
Thu Jun 23 09:50:30 UTC 2022 INFO: Starting SSHD..
WARNING: 'UsePAM no' is not supported in Fedora and may cause several problems.
Thu Jun 23 09:50:30 UTC 2022 INFO: Starting background process to monitor Patroni initization and restart the database if needed
Thu Jun 23 09:50:30 UTC 2022 INFO: Now removing "pause" key from patroni.dynamic.json configuration file if present
sed: couldn't flush /pgdata/productionets-tvhl/sedaIEdzn: No space left on device
Thu Jun 23 09:50:30 UTC 2022 INFO: Initializing cluster bootstrap with command: '/usr/local/bin/patroni /tmp/postgres-ha-bootstrap.yaml'
Thu Jun 23 09:50:30 UTC 2022 INFO: Running Patroni as PID 1
2022-06-23 09:50:30,978 INFO: No PostgreSQL configuration items changed, nothing to reload.
2022-06-23 09:50:30,982 INFO: Reaped pid=146, exit status=0
2022-06-23 09:50:30,987 INFO: Reaped pid=149, exit status=0
2022-06-23 09:50:30,987 WARNING: Postgresql is not running.
2022-06-23 09:50:30,987 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:50:30,990 INFO: Reaped pid=150, exit status=0
2022-06-23 09:50:30,990 INFO: pg_controldata:
pg_control version number: 1300
Catalog version number: 202007201
Database system identifier: 7041905253012836517
Database cluster state: shutting down
pg_control last modified: Sun Jun 19 22:17:39 2022
Latest checkpoint location: 2EC/96002538
Latest checkpoint's REDO location: 2EC/960024C8
Latest checkpoint's REDO WAL file: 00000005000002EC00000096
Latest checkpoint's TimeLineID: 5
Latest checkpoint's PrevTimeLineID: 5
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:1262502
Latest checkpoint's NextOID: 158750
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 479
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 1262502
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid: 0
Latest checkpoint's newestCommitTsXid: 0
Time of latest checkpoint: Sun Jun 19 22:15:20 2022
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: logical
wal_log_hints setting: on
max_connections setting: 1000
max_worker_processes setting: 8
max_wal_senders setting: 6
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float8 argument passing: by value
Data page checksum version: 1
Mock authentication nonce: 25f3f1c8c3224e1eb1227873333dc419daf100535cea290a3bb9fa5de81b8e62
2022-06-23 09:50:31,032 INFO: doing crash recovery in a single user mode
2022-06-23 09:50:31,036 ERROR: Exception when saving file: /pgdata/productionets-tvhl/patroni.dynamic.json
OSError: [Errno 28] No space left on device
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/patroni/config.py", line 167, in save_cache
json.dump(self.dynamic_configuration, f)
OSError: [Errno 28] No space left on device
2022-06-23 09:50:31,046 ERROR: Crash recovery finished with code=1
2022-06-23 09:50:31,046 INFO: stdout=
2022-06-23 09:50:31,046 INFO: stderr=2022-06-23 09:50:31.045 GMT [152] FATAL: could not write lock file "postmaster.pid": No space left on device
2022-06-23 09:50:41,485 WARNING: Postgresql is not running.
2022-06-23 09:50:41,485 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:50:41,488 INFO: Reaped pid=193, exit status=0
2022-06-23 09:50:41,488 INFO: pg_controldata:
pg_control version number: 1300
Catalog version number: 202007201
Database system identifier: 7041905253012836517
Database cluster state: shutting down
pg_control last modified: Sun Jun 19 22:17:39 2022
Latest checkpoint location: 2EC/96002538
Latest checkpoint's REDO location: 2EC/960024C8
Latest checkpoint's REDO WAL file: 00000005000002EC00000096
Latest checkpoint's TimeLineID: 5
Latest checkpoint's PrevTimeLineID: 5
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:1262502
Latest checkpoint's NextOID: 158750
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 479
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 1262502
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid: 0
Latest checkpoint's newestCommitTsXid: 0
Time of latest checkpoint: Sun Jun 19 22:15:20 2022
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: logical
wal_log_hints setting: on
max_connections setting: 1000
max_worker_processes setting: 8
max_wal_senders setting: 6
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float8 argument passing: by value
Data page checksum version: 1
Mock authentication nonce: 25f3f1c8c3224e1eb1227873333dc419daf100535cea290a3bb9fa5de81b8e62
2022-06-23 09:50:41,489 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:50:41,495 INFO: Reaped pid=195, exit status=0
2022-06-23 09:50:41,495 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:50:41,496 INFO: starting as a secondary
2022-06-23 09:50:41,498 ERROR: Exception when saving file: /pgdata/productionets-tvhl/patroni.dynamic.json
OSError: [Errno 28] No space left on device
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/patroni/config.py", line 167, in save_cache
json.dump(self.dynamic_configuration, f)
OSError: [Errno 28] No space left on device
2022-06-23 09:50:41,501 ERROR: Exception during execution of long running task restarting after failure
OSError: [Errno 28] No space left on device
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/patroni/async_executor.py", line 97, in run
wakeup = func(*args) if args else func()
File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 871, in follow
self.start(timeout=timeout, block_callbacks=change_role, role=role)
File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 539, in start
self.config.write_postgresql_conf(configuration)
File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 419, in write_postgresql_conf
self._sanitize_auto_conf()
File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 240, in __exit__
self._fd.close()
OSError: [Errno 28] No space left on device
/tmp:5432 - no response
2022-06-23 09:50:41,508 ERROR: unable to create backup copies of configuration files
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 366, in save_configuration_files
shutil.copy(config_file, backup_file)
File "/usr/lib64/python3.6/shutil.py", line 245, in copy
copyfile(src, dst, follow_symlinks=follow_symlinks)
File "/usr/lib64/python3.6/shutil.py", line 122, in copyfile
copyfileobj(fsrc, fdst)
File "/usr/lib64/python3.6/shutil.py", line 82, in copyfileobj
fdst.write(buf)
OSError: [Errno 28] No space left on device
2022-06-23 09:50:41,509 WARNING: Postgresql is not running.
2022-06-23 09:50:41,509 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:50:41,511 INFO: Reaped pid=199, exit status=0
2022-06-23 09:50:41,512 INFO: pg_controldata:
pg_control version number: 1300
Catalog version number: 202007201
Database system identifier: 7041905253012836517
Database cluster state: shutting down
pg_control last modified: Sun Jun 19 22:17:39 2022
Latest checkpoint location: 2EC/96002538
Latest checkpoint's REDO location: 2EC/960024C8
Latest checkpoint's REDO WAL file: 00000005000002EC00000096
Latest checkpoint's TimeLineID: 5
Latest checkpoint's PrevTimeLineID: 5
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:1262502
Latest checkpoint's NextOID: 158750
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 479
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 1262502
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid: 0
Latest checkpoint's newestCommitTsXid: 0
Time of latest checkpoint: Sun Jun 19 22:15:20 2022
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: logical
wal_log_hints setting: on
max_connections setting: 1000
max_worker_processes setting: 8
max_wal_senders setting: 6
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float8 argument passing: by value
Data page checksum version: 1
Mock authentication nonce: 25f3f1c8c3224e1eb1227873333dc419daf100535cea290a3bb9fa5de81b8e62
2022-06-23 09:50:41,577 INFO: doing crash recovery in a single user mode
2022-06-23 09:50:41,580 ERROR: Exception when saving file: /pgdata/productionets-tvhl/patroni.dynamic.json
OSError: [Errno 28] No space left on device
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/patroni/config.py", line 167, in save_cache
json.dump(self.dynamic_configuration, f)
OSError: [Errno 28] No space left on device
2022-06-23 09:50:41,590 ERROR: Crash recovery finished with code=1
2022-06-23 09:50:41,590 INFO: stdout=
2022-06-23 09:50:41,590 INFO: stderr=2022-06-23 09:50:41.589 GMT [201] FATAL: could not write lock file "postmaster.pid": No space left on device
2022-06-23 09:50:52,003 WARNING: Postgresql is not running.
2022-06-23 09:50:52,003 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:50:52,007 INFO: Reaped pid=254, exit status=0
2022-06-23 09:50:52,007 INFO: pg_controldata:
pg_control version number: 1300
Catalog version number: 202007201
Database system identifier: 7041905253012836517
Database cluster state: shutting down
pg_control last modified: Sun Jun 19 22:17:39 2022
Latest checkpoint location: 2EC/96002538
Latest checkpoint's REDO location: 2EC/960024C8
Latest checkpoint's REDO WAL file: 00000005000002EC00000096
Latest checkpoint's TimeLineID: 5
Latest checkpoint's PrevTimeLineID: 5
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:1262502
Latest checkpoint's NextOID: 158750
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 479
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 1262502
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid: 0
Latest checkpoint's newestCommitTsXid: 0
Time of latest checkpoint: Sun Jun 19 22:15:20 2022
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: logical
wal_log_hints setting: on
max_connections setting: 1000
max_worker_processes setting: 8
max_wal_senders setting: 6
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float8 argument passing: by value
Data page checksum version: 1
Mock authentication nonce: 25f3f1c8c3224e1eb1227873333dc419daf100535cea290a3bb9fa5de81b8e62
2022-06-23 09:50:52,007 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:50:52,014 INFO: Reaped pid=256, exit status=0
2022-06-23 09:50:52,014 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:50:52,015 INFO: starting as a secondary
2022-06-23 09:50:52,016 ERROR: Exception when saving file: /pgdata/productionets-tvhl/patroni.dynamic.json
OSError: [Errno 28] No space left on device
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/patroni/config.py", line 167, in save_cache
json.dump(self.dynamic_configuration, f)
OSError: [Errno 28] No space left on device
2022-06-23 09:50:52,020 ERROR: Exception during execution of long running task restarting after failure
OSError: [Errno 28] No space left on device
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/patroni/async_executor.py", line 97, in run
wakeup = func(*args) if args else func()
File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 871, in follow
self.start(timeout=timeout, block_callbacks=change_role, role=role)
File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 539, in start
self.config.write_postgresql_conf(configuration)
File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 419, in write_postgresql_conf
self._sanitize_auto_conf()
File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 240, in __exit__
self._fd.close()
OSError: [Errno 28] No space left on device
/tmp:5432 - no response
2022-06-23 09:50:52,025 ERROR: unable to create backup copies of configuration files
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 366, in save_configuration_files
shutil.copy(config_file, backup_file)
File "/usr/lib64/python3.6/shutil.py", line 245, in copy
copyfile(src, dst, follow_symlinks=follow_symlinks)
File "/usr/lib64/python3.6/shutil.py", line 122, in copyfile
copyfileobj(fsrc, fdst)
File "/usr/lib64/python3.6/shutil.py", line 82, in copyfileobj
fdst.write(buf)
OSError: [Errno 28] No space left on device
2022-06-23 09:50:52,026 WARNING: Postgresql is not running.
2022-06-23 09:50:52,026 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:50:52,028 INFO: Reaped pid=260, exit status=0
2022-06-23 09:50:52,029 INFO: pg_controldata:
pg_control version number: 1300
Catalog version number: 202007201
Database system identifier: 7041905253012836517
Database cluster state: shutting down
pg_control last modified: Sun Jun 19 22:17:39 2022
Latest checkpoint location: 2EC/96002538
Latest checkpoint's REDO location: 2EC/960024C8
Latest checkpoint's REDO WAL file: 00000005000002EC00000096
Latest checkpoint's TimeLineID: 5
Latest checkpoint's PrevTimeLineID: 5
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:1262502
Latest checkpoint's NextOID: 158750
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 479
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 1262502
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid: 0
Latest checkpoint's newestCommitTsXid: 0
Time of latest checkpoint: Sun Jun 19 22:15:20 2022
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: logical
wal_log_hints setting: on
max_connections setting: 1000
max_worker_processes setting: 8
max_wal_senders setting: 6
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float8 argument passing: by value
Data page checksum version: 1
Mock authentication nonce: 25f3f1c8c3224e1eb1227873333dc419daf100535cea290a3bb9fa5de81b8e62
2022-06-23 09:50:52,029 INFO: doing crash recovery in a single user mode
2022-06-23 09:50:52,030 ERROR: Exception when saving file: /pgdata/productionets-tvhl/patroni.dynamic.json
OSError: [Errno 28] No space left on device
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/patroni/config.py", line 167, in save_cache
json.dump(self.dynamic_configuration, f)
OSError: [Errno 28] No space left on device
2022-06-23 09:50:52,038 ERROR: Crash recovery finished with code=1
2022-06-23 09:50:52,039 INFO: stdout=
2022-06-23 09:50:52,039 INFO: stderr=2022-06-23 09:50:52.038 GMT [262] FATAL: could not write lock file "postmaster.pid": No space left on device
2022-06-23 09:51:02,521 WARNING: Postgresql is not running.
2022-06-23 09:51:02,522 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:51:02,525 INFO: Reaped pid=312, exit status=0
2022-06-23 09:51:02,525 INFO: pg_controldata:
pg_control version number: 1300
Catalog version number: 202007201
Database system identifier: 7041905253012836517
Database cluster state: shutting down
pg_control last modified: Sun Jun 19 22:17:39 2022
Latest checkpoint location: 2EC/96002538
Latest checkpoint's REDO location: 2EC/960024C8
Latest checkpoint's REDO WAL file: 00000005000002EC00000096
Latest checkpoint's TimeLineID: 5
Latest checkpoint's PrevTimeLineID: 5
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:1262502
Latest checkpoint's NextOID: 158750
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 479
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 1262502
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid: 0
Latest checkpoint's newestCommitTsXid: 0
Time of latest checkpoint: Sun Jun 19 22:15:20 2022
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: logical
wal_log_hints setting: on
max_connections setting: 1000
max_worker_processes setting: 8
max_wal_senders setting: 6
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float8 argument passing: by value
Data page checksum version: 1
Mock authentication nonce: 25f3f1c8c3224e1eb1227873333dc419daf100535cea290a3bb9fa5de81b8e62
2022-06-23 09:51:02,525 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:51:02,532 INFO: Reaped pid=314, exit status=0
2022-06-23 09:51:02,532 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:51:02,533 INFO: starting as a secondary
2022-06-23 09:51:02,534 ERROR: Exception when saving file: /pgdata/productionets-tvhl/patroni.dynamic.json
OSError: [Errno 28] No space left on device
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/patroni/config.py", line 167, in save_cache
json.dump(self.dynamic_configuration, f)
OSError: [Errno 28] No space left on device
2022-06-23 09:51:02,538 ERROR: Exception during execution of long running task restarting after failure
OSError: [Errno 28] No space left on device
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/patroni/async_executor.py", line 97, in run
wakeup = func(*args) if args else func()
File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 871, in follow
self.start(timeout=timeout, block_callbacks=change_role, role=role)
File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 539, in start
self.config.write_postgresql_conf(configuration)
File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 419, in write_postgresql_conf
self._sanitize_auto_conf()
File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 240, in __exit__
self._fd.close()
OSError: [Errno 28] No space left on device
/tmp:5432 - no response
2022-06-23 09:51:02,543 ERROR: unable to create backup copies of configuration files
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 366, in save_configuration_files
shutil.copy(config_file, backup_file)
File "/usr/lib64/python3.6/shutil.py", line 245, in copy
copyfile(src, dst, follow_symlinks=follow_symlinks)
File "/usr/lib64/python3.6/shutil.py", line 122, in copyfile
copyfileobj(fsrc, fdst)
File "/usr/lib64/python3.6/shutil.py", line 82, in copyfileobj
fdst.write(buf)
OSError: [Errno 28] No space left on device
2022-06-23 09:51:02,543 WARNING: Postgresql is not running.
2022-06-23 09:51:02,543 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:51:02,545 INFO: Reaped pid=318, exit status=0
2022-06-23 09:51:02,546 INFO: pg_controldata:
pg_control version number: 1300
Catalog version number: 202007201
Database system identifier: 7041905253012836517
Database cluster state: shutting down
pg_control last modified: Sun Jun 19 22:17:39 2022
Latest checkpoint location: 2EC/96002538
Latest checkpoint's REDO location: 2EC/960024C8
Latest checkpoint's REDO WAL file: 00000005000002EC00000096
Latest checkpoint's TimeLineID: 5
Latest checkpoint's PrevTimeLineID: 5
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:1262502
Latest checkpoint's NextOID: 158750
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 479
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 1262502
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid: 0
Latest checkpoint's newestCommitTsXid: 0
Time of latest checkpoint: Sun Jun 19 22:15:20 2022
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: logical
wal_log_hints setting: on
max_connections setting: 1000
max_worker_processes setting: 8
max_wal_senders setting: 6
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float8 argument passing: by value
Data page checksum version: 1
Mock authentication nonce: 25f3f1c8c3224e1eb1227873333dc419daf100535cea290a3bb9fa5de81b8e62
2022-06-23 09:51:02,546 INFO: doing crash recovery in a single user mode
2022-06-23 09:51:02,547 ERROR: Exception when saving file: /pgdata/productionets-tvhl/patroni.dynamic.json
OSError: [Errno 28] No space left on device
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/patroni/config.py", line 167, in save_cache
json.dump(self.dynamic_configuration, f)
OSError: [Errno 28] No space left on device
2022-06-23 09:51:02,555 ERROR: Crash recovery finished with code=1
2022-06-23 09:51:02,555 INFO: stdout=
2022-06-23 09:51:02,556 INFO: stderr=2022-06-23 09:51:02.555 GMT [320] FATAL: could not write lock file "postmaster.pid": No space left on device
2022-06-23 09:51:13,039 WARNING: Postgresql is not running.
2022-06-23 09:51:13,039 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:51:13,042 INFO: Reaped pid=415, exit status=0
2022-06-23 09:51:13,042 INFO: pg_controldata:
pg_control version number: 1300
Catalog version number: 202007201
Database system identifier: 7041905253012836517
Database cluster state: shutting down
pg_control last modified: Sun Jun 19 22:17:39 2022
Latest checkpoint location: 2EC/96002538
Latest checkpoint's REDO location: 2EC/960024C8
Latest checkpoint's REDO WAL file: 00000005000002EC00000096
Latest checkpoint's TimeLineID: 5
Latest checkpoint's PrevTimeLineID: 5
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:1262502
Latest checkpoint's NextOID: 158750
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 479
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 1262502
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid: 0
Latest checkpoint's newestCommitTsXid: 0
Time of latest checkpoint: Sun Jun 19 22:15:20 2022
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: logical
wal_log_hints setting: on
max_connections setting: 1000
max_worker_processes setting: 8
max_wal_senders setting: 6
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float8 argument passing: by value
Data page checksum version: 1
Mock authentication nonce: 25f3f1c8c3224e1eb1227873333dc419daf100535cea290a3bb9fa5de81b8e62
2022-06-23 09:51:13,043 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:51:13,049 INFO: Reaped pid=417, exit status=0
2022-06-23 09:51:13,049 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:51:13,049 INFO: starting as a secondary
2022-06-23 09:51:13,051 ERROR: Exception when saving file: /pgdata/productionets-tvhl/patroni.dynamic.json
OSError: [Errno 28] No space left on device
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/patroni/config.py", line 167, in save_cache
json.dump(self.dynamic_configuration, f)
OSError: [Errno 28] No space left on device
2022-06-23 09:51:13,055 ERROR: Exception during execution of long running task restarting after failure
OSError: [Errno 28] No space left on device
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/patroni/async_executor.py", line 97, in run
wakeup = func(*args) if args else func()
File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 871, in follow
self.start(timeout=timeout, block_callbacks=change_role, role=role)
File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 539, in start
self.config.write_postgresql_conf(configuration)
File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 419, in write_postgresql_conf
self._sanitize_auto_conf()
File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 240, in __exit__
self._fd.close()
OSError: [Errno 28] No space left on device
/tmp:5432 - no response
2022-06-23 09:51:13,060 ERROR: unable to create backup copies of configuration files
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/config.py", line 366, in save_configuration_files
shutil.copy(config_file, backup_file)
File "/usr/lib64/python3.6/shutil.py", line 245, in copy
copyfile(src, dst, follow_symlinks=follow_symlinks)
File "/usr/lib64/python3.6/shutil.py", line 122, in copyfile
copyfileobj(fsrc, fdst)
File "/usr/lib64/python3.6/shutil.py", line 82, in copyfileobj
fdst.write(buf)
OSError: [Errno 28] No space left on device
2022-06-23 09:51:13,060 WARNING: Postgresql is not running.
2022-06-23 09:51:13,060 INFO: Lock owner: None; I am productionets-tvhl-69d96fc85b-p29vz
2022-06-23 09:51:13,062 INFO: Reaped pid=421, exit status=0
2022-06-23 09:51:13,063 INFO: pg_controldata:
pg_control version number: 1300
Catalog version number: 202007201
Database system identifier: 7041905253012836517
Database cluster state: shutting down
pg_control last modified: Sun Jun 19 22:17:39 2022
Latest checkpoint location: 2EC/96002538
Latest checkpoint's REDO location: 2EC/960024C8
Latest checkpoint's REDO WAL file: 00000005000002EC00000096
Latest checkpoint's TimeLineID: 5
Latest checkpoint's PrevTimeLineID: 5
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:1262502
Latest checkpoint's NextOID: 158750
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 479
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 1262502
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid: 0
Latest checkpoint's newestCommitTsXid: 0
Time of latest checkpoint: Sun Jun 19 22:15:20 2022
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: logical
wal_log_hints setting: on
max_connections setting: 1000
max_worker_processes setting: 8
max_wal_senders setting: 6
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float8 argument passing: by value
Data page checksum version: 1
Mock authentication nonce: 25f3f1c8c3224e1eb1227873333dc419daf100535cea290a3bb9fa5de81b8e62
2022-06-23 09:51:13,063 INFO: doing crash recovery in a single user mode
2022-06-23 09:51:13,064 ERROR: Exception when saving file: /pgdata/productionets-tvhl/patroni.dynamic.json
OSError: [Errno 28] No space left on device
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/patroni/config.py", line 167, in save_cache
json.dump(self.dynamic_configuration, f)
OSError: [Errno 28] No space left on device
2022-06-23 09:51:13,072 ERROR: Crash recovery finished with code=1
2022-06-23 09:51:13,073 INFO: stdout=
2022-06-23 09:51:13,073 INFO: stderr=2022-06-23 09:51:13.072 GMT [423] FATAL: could not write lock file "postmaster.pid": No space left on device
I just had the same thing happen. A Crunchy database I spun up with the Crunchy operator, with 10G of attached storage, ran out of PVC space and I had to expand the PVC. I don't have very much data in any of the tables. Has anyone else seen a fairly empty Crunchy PostgreSQL database take up over 10G?
I am currently facing a similar issue. I have a very small database (~60MB), but it has already filled a 1GB volume. Almost all of the space is taken by WAL files, and I don't know why there are so many of them. It is a very simple single-node Postgres database.
bash-4.4$ du -h -d 1
60M ./pg14
16K ./lost+found
1009M ./pg14_wal
9.4M ./pgbackrest
1.1G .
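To put these numbers in perspective: the pg_controldata output earlier in the thread reports 16777216 bytes per WAL segment, so 1009M of WAL is roughly 63 retained segments. Below is a minimal Python sketch of that check. The function names are illustrative and not part of PGO, and it assumes it can read the WAL directory directly.

```python
import os

# From the pg_controldata output above: "Bytes per WAL segment: 16777216"
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def wal_usage(wal_dir):
    """Return (total_bytes, segment_count) for the files directly in wal_dir."""
    total = 0
    segments = 0
    for entry in os.scandir(wal_dir):
        if not entry.is_file():
            continue  # skip subdirectories such as archive_status
        total += entry.stat().st_size
        # WAL segment file names are 24 uppercase hexadecimal characters,
        # e.g. 00000005000002EC00000096
        name = entry.name
        if len(name) == 24 and all(c in "0123456789ABCDEF" for c in name):
            segments += 1
    return total, segments


def approx_segments(total_bytes):
    """Estimate how many full WAL segments fit into total_bytes."""
    return total_bytes // WAL_SEGMENT_BYTES
```

For example, approx_segments(1009 * 1024 * 1024) gives 63, far more than a 60MB database would normally keep around, which suggests something is preventing the segments from being removed.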
Well for what it's worth I have just got hit by this also.
I fixed the problem by changing the following parameters:
patroni:
  dynamicConfiguration:
    postgresql:
      parameters:
        max_wal_size: 128MB
        wal_buffers: 2MB
        wal_recycle: off
        wal_init_zero: off
Just my experience so far on this, as we also faced the pg14_wal folder consuming all the disk given to it, even though the DB holds only 300MB.
There are certain parameters that we tried to change and were unable to, because the reconcile loop (or what looks like it) calls Patroni to change them back. For instance, we tried to change wal_level from logical to replica in two different ways:
- We tried changing it in patroni.dynamicConfiguration.postgresql.parameters in the CR, expecting the operator to pick it up and restart the cluster. That didn't happen.
- We then tried doing it with patronictl edit-config. If you leave PGO in debug mode, you can see that as soon as you edit it with patronictl, PGO reports that the change got rolled back:
time="2022-08-12T19:38:15Z" level=debug msg="replaced configuration" file="internal/patroni/api.go:149" func=patroni.Executor.ReplaceConfiguration name=postgres namespace=XXX reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster stderr= stdout="--- \n+++ \n@@ -33,7 +33,7 @@\n unix_socket_directories: /tmp/postgres\n wal_buffers: 16MB\n wal_keep_size: 0MB\n**- wal_level: replica**\n**+ wal_level: logical**\n wal_log_hints: 'off'\n work_mem: 4MB\n pg_hba:\nConfiguration changed\n" version=5.1.1-0
After looking around, we ended up finding https://github.com/CrunchyData/postgres-operator/issues/3055 and https://github.com/CrunchyData/postgres-operator/issues/3002, but had to dig into the code to see that it is mandatory: https://github.com/CrunchyData/postgres-operator/blob/2e18aef93dd2d6dee065ad00c959dc9fabc6da79/internal/postgres/parameters.go#L33-L38. Is it documented somewhere?
We also tried to change wal_log_hints, but it looks like it is also a prerequisite for something: https://github.com/CrunchyData/postgres-operator/blob/7241a02ad4785fbafa7c1b61de9111c1c9030120/internal/patroni/config.go#L602
The same seems to be true for wal_keep_size.
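The rollback seen in the debug log is consistent with the operator layering its own required parameters on top of whatever the user supplies. Here is a simplified Python sketch of that precedence. It is not the operator's actual code: the function name is hypothetical, and the only mandatory value shown is the wal_level: logical that the debug log restored.

```python
def effective_parameters(user_params, mandatory_params):
    """Merge parameters so that mandatory values always win over user values."""
    merged = dict(user_params)       # start from what the user asked for
    merged.update(mandatory_params)  # mandatory entries overwrite any conflicts
    return merged


# Assumption: wal_level=logical is the enforced value, as seen in the log above.
user = {"wal_level": "replica", "max_wal_size": "128MB"}
mandatory = {"wal_level": "logical"}

print(effective_parameters(user, mandatory))
# {'wal_level': 'logical', 'max_wal_size': '128MB'}
```

Under this model, non-conflicting settings such as max_wal_size survive, while any attempt to set wal_level is silently replaced on the next reconcile.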
Edit: a few days later, it turned out that our backup had issues, so WAL files were not being consumed and then deleted, which was causing the problem. After the backup issue was fixed, 5 backup jobs failed with a timeout while archiving WAL, but each run cleaned up a lot of the WAL files. The last one was successful and pg14_wal went down from 35GB to 17M.
We also hit this issue during the import of a ~60GB database. At its peak (looking at the Grafana dashboards), the WAL reached 80GB in size. It might have ended up more bloated than expected because we ran the import multiple times after it crashed on hitting "out of disk".
This is also a problem in PGO v5. Setting wal_level to replica is ignored, but it should be possible, as mentioned here: https://github.com/CrunchyData/postgres-operator/issues/3055#issuecomment-1147947945
I am not sure if I have understood your problem correctly, but you have to configure pgBackRest retention AND schedule at least one backup to get automatic archive retention management working.
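A backlog like this can be spotted before the disk fills up. PostgreSQL marks each WAL segment that is waiting to be archived with a <segment>.ready file in the archive_status subdirectory of the WAL directory, and renames it to .done once archiving succeeds. Here is a minimal Python sketch, assuming the WAL directory path is passed in (the function name is illustrative):

```python
import os


def pending_archive_segments(wal_dir):
    """List WAL segments that are still waiting to be archived.

    A segment is pending when archive_status/<segment>.ready exists.
    """
    status_dir = os.path.join(wal_dir, "archive_status")
    suffix = ".ready"
    return sorted(
        name[: -len(suffix)]
        for name in os.listdir(status_dir)
        if name.endswith(suffix)
    )
```

A list that keeps growing between checks suggests that archiving (here, pgBackRest) is failing and that the WAL directory will not shrink until it is fixed.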
pgBackRestConfig:
  global:
    repo1-retention-full: "31"
    repo1-retention-full-type: time
  repos:
  - name: repo1
    volume:
      volumeClaimSpec:
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 10Gi
    schedules:
      full: "0 1 * * *"
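The two conditions above (a retention setting and at least one scheduled backup) can be checked mechanically. This is a hedged Python sketch that takes the configuration as an already-parsed dictionary. The key names follow the snippet above, while the function name and the problem messages are made up for illustration.

```python
def retention_problems(pgbackrest):
    """Return a list of problems; an empty list means both conditions hold."""
    problems = []

    # Condition 1: some repoN-retention-full option is set under "global".
    global_opts = pgbackrest.get("global", {})
    if not any(key.endswith("-retention-full") for key in global_opts):
        problems.append("no repoN-retention-full option set")

    # Condition 2: at least one repo has a backup schedule.
    repos = pgbackrest.get("repos", [])
    if not any(repo.get("schedules") for repo in repos):
        problems.append("no backup schedule on any repo")

    return problems


config = {
    "global": {"repo1-retention-full": "31", "repo1-retention-full-type": "time"},
    "repos": [{"name": "repo1", "schedules": {"full": "0 1 * * *"}}],
}
print(retention_problems(config))  # []
```

A configuration that defines the repo volume but omits the schedule would return the second problem, which matches the situation where WAL keeps accumulating because no backup ever triggers expiry.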
Peanut gallery here. Couldn't some of the issues reported here be a side effect of not running a VACUUM pass often enough on the databases? Cheers.
Ref: https://www.postgresql.org/docs/current/routine-vacuuming.html
Possibly related: https://github.com/CrunchyData/postgres-operator/issues/2531
Considering that the initial issue is for an older version of Crunchy Postgres for Kubernetes (v4.7), and that the root causes of the various issues described in this thread are related to pgBackRest and/or Postgres configuration and tuning (rather than anything with CPK itself), I am going to proceed with closing this.
If anyone is still running into similar issues, please feel free to submit a new GitHub issue. You can also continue the conversation in the PGO project community Discord server.
Thanks!
> the root cause for the various issues described in this thread are related to pgBackRest and/Postgres configuration & tuning (rather than anything with CPK itself)
Maybe the action here then is improving the docs around backups, perhaps adding this as a warning somewhere in a FAQ or something like that?