[BUG] Chaos-Mesh IO latency fault causes pg startup errors
Describe the bug A clear and concise description of what the bug is.
kbcli version
Kubernetes: v1.30.4-vke.4
KubeBlocks: 0.9.5-beta.8
kbcli: 0.9.5-beta.7
To Reproduce Steps to reproduce the behavior:
- create pg cluster
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
name: postgres-fuhcpe
namespace: default
spec:
clusterDefinitionRef: postgresql
topology: replication
terminationPolicy: Delete
componentSpecs:
- name: postgresql
serviceVersion: 12.14.0
replicas: 2
resources:
requests:
cpu: 100m
memory: 0.5Gi
limits:
cpu: 100m
memory: 0.5Gi
volumeClaimTemplates:
- name: data
spec:
storageClassName:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 20Gi
kbcli cluster list postgres-fuhcpe --show-labels --namespace default
NAME NAMESPACE CLUSTER-DEFINITION VERSION TERMINATION-POLICY STATUS CREATED-TIME LABELS
postgres-fuhcpe default postgresql Delete Running Jul 31,2025 17:09 UTC+0800 clusterdefinition.kubeblocks.io/name=postgresql,clusterversion.kubeblocks.io/name=
kbcli cluster list-instances postgres-fuhcpe --namespace default
NAME NAMESPACE CLUSTER COMPONENT STATUS ROLE ACCESSMODE AZ CPU(REQUEST/LIMIT) MEMORY(REQUEST/LIMIT) STORAGE NODE CREATED-TIME
postgres-fuhcpe-postgresql-0 default postgres-fuhcpe postgresql Running secondary <none> cn-guangzhou-b 100m / 100m 512Mi / 512Mi data:20Gi 192.168.0.88/192.168.0.88 Jul 31,2025 17:09 UTC+0800
postgres-fuhcpe-postgresql-1 default postgres-fuhcpe postgresql Running primary <none> cn-guangzhou-b 100m / 100m 512Mi / 512Mi data:20Gi 192.168.0.167/192.168.0.167 Jul 31,2025 17:09 UTC+0800
- install chaos-mesh
helm repo add chaos-mesh https://charts.chaos-mesh.org/
helm upgrade --install chaos-mesh chaos-mesh/chaos-mesh -n chaos-mesh --create-namespace --version 2.7.2 --set chaosDaemon.runtime=containerd --set chaosDaemon.socketPath=/run/containerd/containerd.sock
- inject IO latency fault
apiVersion: chaos-mesh.org/v1alpha1
kind: IOChaos
metadata:
name: test-chaos-mesh-latencyover-postgres-fuhcpe
namespace: default
spec:
action: latency
mode: all
selector:
namespaces:
- default
labelSelectors:
apps.kubeblocks.io/pod-name: postgres-fuhcpe-postgresql-1
volumePath: /home/postgres/pgdata
path: '/home/postgres/pgdata/**/*'
delay: '2000ms'
percent: 100
duration: 2m
- See error
kubectl get cluster postgres-fuhcpe
NAME CLUSTER-DEFINITION VERSION TERMINATION-POLICY STATUS AGE
postgres-fuhcpe postgresql Delete Updating 33m
kbcli cluster list-instances postgres-fuhcpe
NAME NAMESPACE CLUSTER COMPONENT STATUS ROLE ACCESSMODE AZ CPU(REQUEST/LIMIT) MEMORY(REQUEST/LIMIT) STORAGE NODE CREATED-TIME
postgres-fuhcpe-postgresql-0 default postgres-fuhcpe postgresql Running primary <none> cn-guangzhou-b 100m / 100m 512Mi / 512Mi data:20Gi 192.168.0.88/192.168.0.88 Jul 31,2025 17:09 UTC+0800
postgres-fuhcpe-postgresql-1 default postgres-fuhcpe postgresql Running <none> <none> cn-guangzhou-b 100m / 100m 512Mi / 512Mi data:20Gi 192.168.0.167/192.168.0.167 Jul 31,2025 17:09 UTC+0800
logs error pod
kubectl logs postgres-fuhcpe-postgresql-1
Defaulted container "postgresql" out of: postgresql, pgbouncer, exporter, lorry, config-manager, pg-init-container (init), init-lorry (init)
2025-07-31 09:10:42,005 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?)
2025-07-31 09:10:44,008 - bootstrapping - INFO - Could not connect to 169.254.169.254, assuming local Docker setup
2025-07-31 09:10:44,008 - bootstrapping - INFO - No meta-data available for this provider
2025-07-31 09:10:44,009 - bootstrapping - INFO - Looks like you are running local
2025-07-31 09:10:44,099 - bootstrapping - INFO - kubeblocks generate local configuration:
bootstrap:
dcs:
check_timeline: true
loop_wait: 10
max_timelines_history: 0
maximum_lag_on_failover: 1048576
postgresql:
parameters:
archive_command: /bin/true
archive_mode: 'on'
autovacuum_analyze_scale_factor: '0.1'
autovacuum_max_workers: '3'
autovacuum_vacuum_scale_factor: '0.05'
checkpoint_completion_target: '0.9'
log_autovacuum_min_duration: '10000'
log_checkpoints: 'True'
log_connections: 'False'
log_disconnections: 'False'
log_min_duration_statement: '1000'
log_statement: ddl
log_temp_files: 128kB
max_connections: '56'
max_locks_per_transaction: '64'
max_prepared_transactions: '100'
max_replication_slots: '16'
max_wal_senders: '64'
max_worker_processes: '8'
tcp_keepalives_idle: 45s
tcp_keepalives_interval: 10s
track_commit_timestamp: 'False'
track_functions: pl
wal_compression: 'True'
wal_keep_segments: '0'
wal_level: replica
wal_log_hints: 'False'
retry_timeout: 10
ttl: 30
initdb:
- auth-host: md5
- auth-local: trust
postgresql:
config_dir: /home/postgres/pgdata/conf
custom_conf: /home/postgres/conf/postgresql.conf
parameters:
pg_stat_statements.track_utility: 'False'
shared_buffers: 128MB
shared_preload_libraries: pg_stat_statements,auto_explain,bg_mon,pgextwlist,pg_auth_mon,set_user,pg_cron,pg_stat_kcache,timescaledb,pgaudit
pg_hba:
- host all all 0.0.0.0/0 md5
- host all all ::/0 md5
- local all all trust
- host all all 127.0.0.1/32 trust
- host all all ::1/128 trust
- local replication all trust
- host replication all 0.0.0.0/0 md5
- host replication all ::/0 md5
2025-07-31 09:10:44,200 - bootstrapping - INFO - Configuring pam-oauth2
2025-07-31 09:10:44,200 - bootstrapping - INFO - No PAM_OAUTH2 configuration was specified, skipping
2025-07-31 09:10:44,200 - bootstrapping - INFO - Configuring patroni
2025-07-31 09:10:44,297 - bootstrapping - INFO - Writing to file /run/postgres.yml
2025-07-31 09:10:44,298 - bootstrapping - INFO - Configuring wal-e
2025-07-31 09:10:44,298 - bootstrapping - INFO - Configuring certificate
2025-07-31 09:10:44,298 - bootstrapping - INFO - Generating ssl self-signed certificate
2025-07-31 09:10:47,799 - bootstrapping - INFO - Configuring pgqd
2025-07-31 09:10:47,799 - bootstrapping - INFO - Configuring standby-cluster
2025-07-31 09:10:47,799 - bootstrapping - INFO - Configuring log
2025-07-31 09:10:47,799 - bootstrapping - INFO - Configuring pgbouncer
2025-07-31 09:10:47,799 - bootstrapping - INFO - No PGBOUNCER_CONFIGURATION was specified, skipping
2025-07-31 09:10:47,799 - bootstrapping - INFO - Configuring bootstrap
2025-07-31 09:10:47,799 - bootstrapping - INFO - Configuring crontab
2025-07-31 09:10:47,799 - bootstrapping - INFO - Skipping creation of renice cron job due to lack of SYS_NICE capability
2025-07-31 09:10:50,106 INFO: Selected new K8s API server endpoint https://192.168.0.2:6443
2025-07-31 09:10:50,210 INFO: No PostgreSQL configuration items changed, nothing to reload.
2025-07-31 09:10:50,211 INFO: Lock owner: None; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:10:50,341 INFO: trying to bootstrap a new cluster
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /home/postgres/pgdata/pgroot/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok
Success. You can now start the database server using:
/usr/lib/postgresql/12/bin/pg_ctl -D /home/postgres/pgdata/pgroot/data -l logfile start
2025-07-31 09:10:55,898 INFO: postmaster pid=133
/var/run/postgresql:5432 - no response
2025-07-31 09:10:55 GMT [133]: [1-1] 688b331f.85 0 LOG: Auto detecting pg_stat_kcache.linux_hz parameter...
2025-07-31 09:10:55 GMT [133]: [2-1] 688b331f.85 0 LOG: pg_stat_kcache.linux_hz is set to 1000000
2025-07-31 09:10:55 GMT [133]: [3-1] 688b331f.85 0 LOG: pgaudit extension initialized
2025-07-31 09:10:55 GMT [133]: [4-1] 688b331f.85 0 LOG: starting PostgreSQL 12.18 (Ubuntu 12.18-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2025-07-31 09:10:55 GMT [133]: [5-1] 688b331f.85 0 LOG: listening on IPv4 address "0.0.0.0", port 5432
2025-07-31 09:10:55 GMT [133]: [6-1] 688b331f.85 0 LOG: listening on IPv6 address "::", port 5432
2025-07-31 09:10:55 GMT [133]: [7-1] 688b331f.85 0 LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2025-07-31 09:10:56 GMT [133]: [8-1] 688b331f.85 0 LOG: redirecting log output to logging collector process
2025-07-31 09:10:56 GMT [133]: [9-1] 688b331f.85 0 HINT: Future log output will appear in directory "../pg_log".
/var/run/postgresql:5432 - accepting connections
/var/run/postgresql:5432 - accepting connections
2025-07-31 09:10:56,908 INFO: establishing a new patroni connection to the postgres cluster
2025-07-31 09:10:57,034 INFO: running post_bootstrap
DO
GRANT ROLE
DO
DO
CREATE EXTENSION
NOTICE: version "1.1" of extension "pg_auth_mon" is already installed
ALTER EXTENSION
GRANT
CREATE EXTENSION
DO
NOTICE: version "1.5" of extension "pg_cron" is already installed
ALTER EXTENSION
ALTER POLICY
REVOKE
GRANT
REVOKE
GRANT
ALTER POLICY
REVOKE
GRANT
CREATE FUNCTION
REVOKE
GRANT
REVOKE
GRANT
CREATE EXTENSION
DO
CREATE TABLE
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
RESET
SET
NOTICE: schema "zmon_utils" does not exist, skipping
DROP SCHEMA
DO
NOTICE: language "plpythonu" does not exist, skipping
DROP LANGUAGE
NOTICE: function plpython_call_handler() does not exist, skipping
DROP FUNCTION
NOTICE: function plpython_inline_handler(internal) does not exist, skipping
DROP FUNCTION
NOTICE: function plpython_validator(oid) does not exist, skipping
DROP FUNCTION
CREATE SCHEMA
GRANT
SET
CREATE TYPE
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
REVOKE
CREATE VIEW
REVOKE
GRANT
GRANT
You are now connected to database "postgres" as user "postgres".
CREATE SCHEMA
GRANT
SET
CREATE FUNCTION
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
GRANT
RESET
CREATE EXTENSION
CREATE EXTENSION
CREATE EXTENSION
NOTICE: version "3.0" of extension "set_user" is already installed
ALTER EXTENSION
GRANT
GRANT
GRANT
CREATE SCHEMA
GRANT
SET
CREATE FUNCTION
CREATE VIEW
CREATE FUNCTION
CREATE VIEW
CREATE FUNCTION
CREATE VIEW
REVOKE
GRANT
REVOKE
GRANT
RESET
You are now connected to database "template1" as user "postgres".
CREATE SCHEMA
GRANT
SET
CREATE FUNCTION
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
GRANT
RESET
CREATE EXTENSION
CREATE EXTENSION
CREATE EXTENSION
NOTICE: version "3.0" of extension "set_user" is already installed
ALTER EXTENSION
GRANT
GRANT
GRANT
CREATE SCHEMA
GRANT
SET
CREATE FUNCTION
CREATE VIEW
CREATE FUNCTION
CREATE VIEW
CREATE FUNCTION
CREATE VIEW
REVOKE
GRANT
REVOKE
GRANT
RESET
2025-07-31 09:11:02,002 WARNING: Could not activate Linux watchdog device: "Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'"
2025-07-31 09:11:02,214 INFO: initialized a new cluster
2025-07-31 09:11:02,224 INFO: no action. I am (postgres-fuhcpe-postgresql-1), the leader with the lock
2025-07-31 09:11:12,761 INFO: no action. I am (postgres-fuhcpe-postgresql-1), the leader with the lock
2025-07-31 09:11:22,760 INFO: no action. I am (postgres-fuhcpe-postgresql-1), the leader with the lock
2025-07-31 09:11:32,803 INFO: no action. I am (postgres-fuhcpe-postgresql-1), the leader with the lock
2025-07-31 09:11:42,798 INFO: no action. I am (postgres-fuhcpe-postgresql-1), the leader with the lock
2025-07-31 09:11:49.013 UTC [72] LOG Starting pgqd 3.5
2025-07-31 09:11:49.013 UTC [72] LOG auto-detecting dbs ...
2025-07-31 09:11:52,723 INFO: no action. I am (postgres-fuhcpe-postgresql-1), the leader with the lock
2025-07-31 09:12:02,760 INFO: no action. I am (postgres-fuhcpe-postgresql-1), the leader with the lock
2025-07-31 09:12:12,800 INFO: no action. I am (postgres-fuhcpe-postgresql-1), the leader with the lock
2025-07-31 09:12:19.097 UTC [72] LOG {ticks: 0, maint: 0, retry: 0}
2025-07-31 09:12:23,102 INFO: no action. I am (postgres-fuhcpe-postgresql-1), the leader with the lock
2025-07-31 09:12:32,997 INFO: no action. I am (postgres-fuhcpe-postgresql-1), the leader with the lock
2025-07-31 09:12:42,899 INFO: no action. I am (postgres-fuhcpe-postgresql-1), the leader with the lock
2025-07-31 09:12:49.097 UTC [72] LOG {ticks: 0, maint: 0, retry: 0}
2025-07-31 09:12:52,900 INFO: no action. I am (postgres-fuhcpe-postgresql-1), the leader with the lock
2025-07-31 09:13:03,498 INFO: no action. I am (postgres-fuhcpe-postgresql-1), the leader with the lock
2025-07-31 09:13:12,899 INFO: no action. I am (postgres-fuhcpe-postgresql-1), the leader with the lock
2025-07-31 09:13:19.127 UTC [72] LOG {ticks: 0, maint: 0, retry: 0}
2025-07-31 09:13:22,762 INFO: no action. I am (postgres-fuhcpe-postgresql-1), the leader with the lock
2025-07-31 09:13:30,199 ERROR: get_postgresql_status
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 673, in get_postgresql_status
row = self.query(stmt.format(postgresql.wal_name, postgresql.lsn_name), retry=retry)[0]
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 655, in query
return self.server.query(sql, *params)
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 749, in query
raise e
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 745, in query
cursor.execute(sql, params)
psycopg2.errors.QueryCanceled: canceling statement due to statement timeout
2025-07-31 09:13:36,205 ERROR: get_postgresql_status
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 673, in get_postgresql_status
row = self.query(stmt.format(postgresql.wal_name, postgresql.lsn_name), retry=retry)[0]
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 655, in query
return self.server.query(sql, *params)
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 749, in query
raise e
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 745, in query
cursor.execute(sql, params)
psycopg2.errors.QueryCanceled: canceling statement due to statement timeout
2025-07-31 09:13:42,209 ERROR: get_postgresql_status
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 673, in get_postgresql_status
row = self.query(stmt.format(postgresql.wal_name, postgresql.lsn_name), retry=retry)[0]
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 655, in query
return self.server.query(sql, *params)
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 749, in query
raise e
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 745, in query
cursor.execute(sql, params)
psycopg2.errors.QueryCanceled: canceling statement due to statement timeout
2025-07-31 09:13:48,213 ERROR: get_postgresql_status
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 673, in get_postgresql_status
row = self.query(stmt.format(postgresql.wal_name, postgresql.lsn_name), retry=retry)[0]
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 655, in query
return self.server.query(sql, *params)
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 749, in query
raise e
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 745, in query
cursor.execute(sql, params)
psycopg2.errors.QueryCanceled: canceling statement due to statement timeout
2025-07-31 09:13:49.128 UTC [72] LOG {ticks: 0, maint: 0, retry: 0}
2025-07-31 09:13:54,216 ERROR: get_postgresql_status
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 673, in get_postgresql_status
row = self.query(stmt.format(postgresql.wal_name, postgresql.lsn_name), retry=retry)[0]
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 655, in query
return self.server.query(sql, *params)
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 749, in query
raise e
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 745, in query
cursor.execute(sql, params)
psycopg2.errors.QueryCanceled: canceling statement due to statement timeout
...
2025-07-31 09:15:19.218 UTC [72] LOG {ticks: 0, maint: 0, retry: 0}
2025-07-31 09:15:23,062 ERROR: get_postgresql_status
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 673, in get_postgresql_status
row = self.query(stmt.format(postgresql.wal_name, postgresql.lsn_name), retry=retry)[0]
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 655, in query
return self.server.query(sql, *params)
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 749, in query
raise e
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 745, in query
cursor.execute(sql, params)
psycopg2.errors.QueryCanceled: canceling statement due to statement timeout
2025-07-31 09:15:33,123 ERROR: get_postgresql_status
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 673, in get_postgresql_status
row = self.query(stmt.format(postgresql.wal_name, postgresql.lsn_name), retry=retry)[0]
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 657, in query
return retry(self.server.query, sql, *params)
File "/usr/local/lib/python3.10/dist-packages/patroni/utils.py", line 334, in __call__
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 749, in query
raise e
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 745, in query
cursor.execute(sql, params)
psycopg2.errors.QueryCanceled: canceling statement due to statement timeout
2025-07-31 09:13:50,729 INFO: Lock owner: postgres-fuhcpe-postgresql-1; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:15:33,123 ERROR: failed to update leader lock
2025-07-31 09:15:33,124 INFO: Demoting self (immediate-nolock)
2025-07-31 09:15:33,124 WARNING: Exception happened during processing of request from ::ffff:192.168.0.239:59420
2025-07-31 09:15:33,125 WARNING: Traceback (most recent call last):
File "/usr/lib/python3.10/socketserver.py", line 683, in process_request_thread
self.finish_request(request, client_address)
File "/usr/lib/python3.10/socketserver.py", line 360, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/lib/python3.10/socketserver.py", line 747, in __init__
self.handle()
File "/usr/lib/python3.10/http/server.py", line 433, in handle
self.handle_one_request()
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 721, in handle_one_request
BaseHTTPRequestHandler.handle_one_request(self)
File "/usr/lib/python3.10/http/server.py", line 421, in handle_one_request
method()
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 210, in do_GET_patroni
self._write_status_response(200, response)
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 90, in _write_status_response
self._write_json_response(status_code, response)
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 49, in _write_json_response
self._write_response(status_code, json.dumps(response, default=str), content_type='application/json')
File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 45, in _write_response
self.end_headers()
File "/usr/lib/python3.10/http/server.py", line 535, in end_headers
self.flush_headers()
File "/usr/lib/python3.10/http/server.py", line 539, in flush_headers
self.wfile.write(b"".join(self._headers_buffer))
File "/usr/lib/python3.10/socketserver.py", line 826, in write
self._sock.sendall(b)
BrokenPipeError: [Errno 32] Broken pipe
2025-07-31 09:15:34,136 INFO: demoted self because failed to update leader lock in DCS
2025-07-31 09:15:34,136 ERROR: Exception during execution of long running task starting after demotion
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/patroni/async_executor.py", line 97, in run
wakeup = func(*args) if args else func()
File "/usr/local/lib/python3.10/dist-packages/patroni/postgresql/__init__.py", line 954, in follow
self.config.write_recovery_conf(recovery_params)
File "/usr/local/lib/python3.10/dist-packages/patroni/postgresql/config.py", line 777, in write_recovery_conf
open(self._standby_signal, 'w').close()
OSError: [Errno 107] Transport endpoint is not connected: '/home/postgres/pgdata/pgroot/data/standby.signal'
2025-07-31 09:15:34,137 WARNING: Loop time exceeded, rescheduling immediately.
2025-07-31 09:15:34,138 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:15:34,299 INFO: trying to bootstrap from leader 'postgres-fuhcpe-postgresql-0'
mkdir: cannot create directory ‘/home/postgres/pgdata’: Transport endpoint is not connected
rm: cannot remove '/home/postgres/pgdata/pgroot/data': Transport endpoint is not connected
rm: cannot remove '/home/postgres/pgdata/pgroot/wal_fast/*': Transport endpoint is not connected
/scripts/basebackup.sh: line 89: /home/postgres/pgdata/pgroot/wal_fast/receivewal.lock: Transport endpoint is not connected
flock: 9: Bad file descriptor
cat: /home/postgres/pgdata/pgroot/wal_fast/receivewal.pid: Transport endpoint is not connected
2025-07-31 09:15:34,400 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:15:34,899 INFO: bootstrap from leader 'postgres-fuhcpe-postgresql-0' in progress
pg_basebackup: error: could not access directory "/home/postgres/pgdata/pgroot/data": Transport endpoint is not connected
2025-07-31 09:15:38,102 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:15:38,102 INFO: bootstrap from leader 'postgres-fuhcpe-postgresql-0' in progress
rm: cannot remove '/home/postgres/pgdata/pgroot/data': Transport endpoint is not connected
pg_basebackup: error: could not access directory "/home/postgres/pgdata/pgroot/data": Transport endpoint is not connected
2025-07-31 09:15:48,104 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:15:48,104 INFO: bootstrap from leader 'postgres-fuhcpe-postgresql-0' in progress
2025-07-31 09:15:49.234 UTC [72] LOG {ticks: 0, maint: 0, retry: 0}
2025-07-31 09:15:58,105 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:15:58,105 INFO: bootstrap from leader 'postgres-fuhcpe-postgresql-0' in progress
rm: cannot remove '/home/postgres/pgdata/pgroot/data': Transport endpoint is not connected
pg_basebackup: error: could not access directory "/home/postgres/pgdata/pgroot/data": Transport endpoint is not connected
2025-07-31 09:16:05,500 ERROR: Error creating replica using method basebackup_fast_xlog: /scripts/basebackup.sh exited with code=1
2025-07-31 09:16:05,500 ERROR: failed to bootstrap from leader 'postgres-fuhcpe-postgresql-0'
2025-07-31 09:16:05,500 INFO: Removing data directory: /home/postgres/pgdata/pgroot/data
2025-07-31 09:16:08,104 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:16:08,141 INFO: trying to bootstrap from leader 'postgres-fuhcpe-postgresql-0'
mkdir: cannot create directory ‘/home/postgres/pgdata’: Transport endpoint is not connected
rm: cannot remove '/home/postgres/pgdata/pgroot/data': Transport endpoint is not connected
rm: cannot remove '/home/postgres/pgdata/pgroot/wal_fast/*': Transport endpoint is not connected
/scripts/basebackup.sh: line 89: /home/postgres/pgdata/pgroot/wal_fast/receivewal.lock: Transport endpoint is not connected
flock: 9: Bad file descriptor
cat: /home/postgres/pgdata/pgroot/wal_fast/receivewal.pid: Transport endpoint is not connected
pg_basebackup: error: could not access directory "/home/postgres/pgdata/pgroot/data": Transport endpoint is not connected
2025-07-31 09:16:18,105 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:16:18,143 INFO: bootstrap from leader 'postgres-fuhcpe-postgresql-0' in progress
rm: cannot remove '/home/postgres/pgdata/pgroot/data': Transport endpoint is not connected
pg_basebackup: error: could not access directory "/home/postgres/pgdata/pgroot/data": Transport endpoint is not connected
2025-07-31 09:16:19.264 UTC [72] LOG {ticks: 0, maint: 0, retry: 0}
2025-07-31 09:16:23.088 UTC [72] ERROR connection error: PQconnectStart
2025-07-31 09:16:23.088 UTC [72] ERROR libpq: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
Is the server running locally and accepting connections on that socket?
2025-07-31 09:16:23.088 UTC [72] WARNING [executions_loop]: default timeout
2025-07-31 09:16:28,105 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:16:28,105 INFO: bootstrap from leader 'postgres-fuhcpe-postgresql-0' in progress
2025-07-31 09:16:33.134 UTC [72] ERROR connection error: PQconnectStart
2025-07-31 09:16:33.134 UTC [72] ERROR libpq: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
Is the server running locally and accepting connections on that socket?
2025-07-31 09:16:33.134 UTC [72] WARNING [postgres]: default timeout
2025-07-31 09:16:33.134 UTC [72] ERROR connection error: PQconnectStart
2025-07-31 09:16:33.134 UTC [72] ERROR libpq: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
Is the server running locally and accepting connections on that socket?
2025-07-31 09:16:38,104 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:16:38,105 INFO: bootstrap from leader 'postgres-fuhcpe-postgresql-0' in progress
rm: cannot remove '/home/postgres/pgdata/pgroot/data': Transport endpoint is not connected
pg_basebackup: error: could not access directory "/home/postgres/pgdata/pgroot/data": Transport endpoint is not connected
2025-07-31 09:16:39,008 ERROR: Error creating replica using method basebackup_fast_xlog: /scripts/basebackup.sh exited with code=1
2025-07-31 09:16:39,008 ERROR: failed to bootstrap from leader 'postgres-fuhcpe-postgresql-0'
2025-07-31 09:16:39,008 INFO: Removing data directory: /home/postgres/pgdata/pgroot/data
2025-07-31 09:16:48,105 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:16:48,143 INFO: trying to bootstrap from leader 'postgres-fuhcpe-postgresql-0'
mkdir: cannot create directory ‘/home/postgres/pgdata’: Transport endpoint is not connected
rm: cannot remove '/home/postgres/pgdata/pgroot/data': Transport endpoint is not connected
rm: cannot remove '/home/postgres/pgdata/pgroot/wal_fast/*': Transport endpoint is not connected
/scripts/basebackup.sh: line 89: /home/postgres/pgdata/pgroot/wal_fast/receivewal.lock: Transport endpoint is not connected
flock: 9: Bad file descriptor
cat: /home/postgres/pgdata/pgroot/wal_fast/receivewal.pid: Transport endpoint is not connected
pg_basebackup: error: could not access directory "/home/postgres/pgdata/pgroot/data": Transport endpoint is not connected
2025-07-31 09:16:49.280 UTC [72] LOG {ticks: 0, maint: 0, retry: 0}
2025-07-31 09:16:58,105 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:16:58,146 INFO: bootstrap from leader 'postgres-fuhcpe-postgresql-0' in progress
rm: cannot remove '/home/postgres/pgdata/pgroot/data': Transport endpoint is not connected
pg_basebackup: error: could not access directory "/home/postgres/pgdata/pgroot/data": Transport endpoint is not connected
2025-07-31 09:17:08,105 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:17:08,105 INFO: bootstrap from leader 'postgres-fuhcpe-postgresql-0' in progress
2025-07-31 09:17:18,105 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:17:18,105 INFO: bootstrap from leader 'postgres-fuhcpe-postgresql-0' in progress
rm: cannot remove '/home/postgres/pgdata/pgroot/data': Transport endpoint is not connected
pg_basebackup: error: could not access directory "/home/postgres/pgdata/pgroot/data": Transport endpoint is not connected
2025-07-31 09:17:18,997 ERROR: Error creating replica using method basebackup_fast_xlog: /scripts/basebackup.sh exited with code=1
2025-07-31 09:17:18,997 ERROR: failed to bootstrap from leader 'postgres-fuhcpe-postgresql-0'
2025-07-31 09:17:18,997 INFO: Removing data directory: /home/postgres/pgdata/pgroot/data
2025-07-31 09:17:19.310 UTC [72] LOG {ticks: 0, maint: 0, retry: 0}
2025-07-31 09:17:23.091 UTC [72] ERROR connection error: PQconnectStart
2025-07-31 09:17:23.091 UTC [72] ERROR libpq: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
Is the server running locally and accepting connections on that socket?
2025-07-31 09:17:23.091 UTC [72] WARNING [executions_loop]: default timeout
2025-07-31 09:17:28,104 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:17:28,141 INFO: trying to bootstrap from leader 'postgres-fuhcpe-postgresql-0'
mkdir: cannot create directory ‘/home/postgres/pgdata’: Transport endpoint is not connected
rm: cannot remove '/home/postgres/pgdata/pgroot/data': Transport endpoint is not connected
rm: cannot remove '/home/postgres/pgdata/pgroot/wal_fast/*': Transport endpoint is not connected
/scripts/basebackup.sh: line 89: /home/postgres/pgdata/pgroot/wal_fast/receivewal.lock: Transport endpoint is not connected
flock: 9: Bad file descriptor
...
flock: 9: Bad file descriptor
cat: /home/postgres/pgdata/pgroot/wal_fast/receivewal.pid: Transport endpoint is not connected
pg_basebackup: error: could not access directory "/home/postgres/pgdata/pgroot/data": Transport endpoint is not connected
2025-07-31 09:42:18,104 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:42:18,144 INFO: bootstrap from leader 'postgres-fuhcpe-postgresql-0' in progress
rm: cannot remove '/home/postgres/pgdata/pgroot/data': Transport endpoint is not connected
pg_basebackup: error: could not access directory "/home/postgres/pgdata/pgroot/data": Transport endpoint is not connected
2025-07-31 09:42:20.469 UTC [72] LOG {ticks: 0, maint: 0, retry: 0}
2025-07-31 09:42:23.170 UTC [72] ERROR connection error: PQconnectStart
2025-07-31 09:42:23.170 UTC [72] ERROR libpq: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
Is the server running locally and accepting connections on that socket?
2025-07-31 09:42:23.170 UTC [72] WARNING [executions_loop]: default timeout
2025-07-31 09:42:28,104 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:42:28,104 INFO: bootstrap from leader 'postgres-fuhcpe-postgresql-0' in progress
2025-07-31 09:42:33.404 UTC [72] ERROR connection error: PQconnectStart
2025-07-31 09:42:33.404 UTC [72] ERROR libpq: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
Is the server running locally and accepting connections on that socket?
2025-07-31 09:42:33.404 UTC [72] WARNING [postgres]: default timeout
2025-07-31 09:42:33.404 UTC [72] ERROR connection error: PQconnectStart
2025-07-31 09:42:33.404 UTC [72] ERROR libpq: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
Is the server running locally and accepting connections on that socket?
2025-07-31 09:42:38,105 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:42:38,105 INFO: bootstrap from leader 'postgres-fuhcpe-postgresql-0' in progress
rm: cannot remove '/home/postgres/pgdata/pgroot/data': Transport endpoint is not connected
pg_basebackup: error: could not access directory "/home/postgres/pgdata/pgroot/data": Transport endpoint is not connected
2025-07-31 09:42:38,997 ERROR: Error creating replica using method basebackup_fast_xlog: /scripts/basebackup.sh exited with code=1
2025-07-31 09:42:38,997 ERROR: failed to bootstrap from leader 'postgres-fuhcpe-postgresql-0'
2025-07-31 09:42:38,997 INFO: Removing data directory: /home/postgres/pgdata/pgroot/data
2025-07-31 09:42:48,104 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:42:48,148 INFO: trying to bootstrap from leader 'postgres-fuhcpe-postgresql-0'
mkdir: cannot create directory ‘/home/postgres/pgdata’: Transport endpoint is not connected
rm: cannot remove '/home/postgres/pgdata/pgroot/data': Transport endpoint is not connected
rm: cannot remove '/home/postgres/pgdata/pgroot/wal_fast/*': Transport endpoint is not connected
/scripts/basebackup.sh: line 89: /home/postgres/pgdata/pgroot/wal_fast/receivewal.lock: Transport endpoint is not connected
flock: 9: Bad file descriptor
cat: /home/postgres/pgdata/pgroot/wal_fast/receivewal.pid: Transport endpoint is not connected
pg_basebackup: error: could not access directory "/home/postgres/pgdata/pgroot/data": Transport endpoint is not connected
2025-07-31 09:42:50.486 UTC [72] LOG {ticks: 0, maint: 0, retry: 0}
2025-07-31 09:42:58,104 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:42:58,399 INFO: bootstrap from leader 'postgres-fuhcpe-postgresql-0' in progress
rm: cannot remove '/home/postgres/pgdata/pgroot/data': Transport endpoint is not connected
pg_basebackup: error: could not access directory "/home/postgres/pgdata/pgroot/data": Transport endpoint is not connected
2025-07-31 09:43:08,104 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:43:08,104 INFO: bootstrap from leader 'postgres-fuhcpe-postgresql-0' in progress
2025-07-31 09:43:18,105 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:43:18,105 INFO: bootstrap from leader 'postgres-fuhcpe-postgresql-0' in progress
rm: cannot remove '/home/postgres/pgdata/pgroot/data': Transport endpoint is not connected
pg_basebackup: error: could not access directory "/home/postgres/pgdata/pgroot/data": Transport endpoint is not connected
2025-07-31 09:43:19,000 ERROR: Error creating replica using method basebackup_fast_xlog: /scripts/basebackup.sh exited with code=1
2025-07-31 09:43:19,001 ERROR: failed to bootstrap from leader 'postgres-fuhcpe-postgresql-0'
2025-07-31 09:43:19,001 INFO: Removing data directory: /home/postgres/pgdata/pgroot/data
2025-07-31 09:43:20.516 UTC [72] LOG {ticks: 0, maint: 0, retry: 0}
2025-07-31 09:43:23.173 UTC [72] ERROR connection error: PQconnectStart
2025-07-31 09:43:23.173 UTC [72] ERROR libpq: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
Is the server running locally and accepting connections on that socket?
2025-07-31 09:43:23.173 UTC [72] WARNING [executions_loop]: default timeout
2025-07-31 09:43:28,104 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:43:28,146 INFO: trying to bootstrap from leader 'postgres-fuhcpe-postgresql-0'
mkdir: cannot create directory ‘/home/postgres/pgdata’: Transport endpoint is not connected
rm: cannot remove '/home/postgres/pgdata/pgroot/data': Transport endpoint is not connected
rm: cannot remove '/home/postgres/pgdata/pgroot/wal_fast/*': Transport endpoint is not connected
/scripts/basebackup.sh: line 89: /home/postgres/pgdata/pgroot/wal_fast/receivewal.lock: Transport endpoint is not connected
flock: 9: Bad file descriptor
cat: /home/postgres/pgdata/pgroot/wal_fast/receivewal.pid: Transport endpoint is not connected
pg_basebackup: error: could not access directory "/home/postgres/pgdata/pgroot/data": Transport endpoint is not connected
2025-07-31 09:43:33.415 UTC [72] ERROR connection error: PQconnectStart
2025-07-31 09:43:33.415 UTC [72] ERROR libpq: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
Is the server running locally and accepting connections on that socket?
2025-07-31 09:43:33.415 UTC [72] WARNING [postgres]: default timeout
2025-07-31 09:43:33.415 UTC [72] ERROR connection error: PQconnectStart
2025-07-31 09:43:33.415 UTC [72] ERROR libpq: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
Is the server running locally and accepting connections on that socket?
2025-07-31 09:43:38,104 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:43:38,143 INFO: bootstrap from leader 'postgres-fuhcpe-postgresql-0' in progress
rm: cannot remove '/home/postgres/pgdata/pgroot/data': Transport endpoint is not connected
pg_basebackup: error: could not access directory "/home/postgres/pgdata/pgroot/data": Transport endpoint is not connected
2025-07-31 09:43:48,105 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:43:48,105 INFO: bootstrap from leader 'postgres-fuhcpe-postgresql-0' in progress
2025-07-31 09:43:50.533 UTC [72] LOG {ticks: 0, maint: 0, retry: 0}
2025-07-31 09:43:58,104 INFO: Lock owner: postgres-fuhcpe-postgresql-0; I am postgres-fuhcpe-postgresql-1
2025-07-31 09:43:58,104 INFO: bootstrap from leader 'postgres-fuhcpe-postgresql-0' in progress
Expected behavior A clear and concise description of what you expected to happen.
Screenshots If applicable, add screenshots to help explain your problem.
Desktop (please complete the following information):
- OS: [e.g. iOS]
- Browser [e.g. chrome, safari]
- Version [e.g. 22]
Additional context Add any other context about the problem here.
This issue has been marked as stale because it has been open for 30 days with no activity
happening with istio as well
This issue has been marked as stale because it has been open for 30 days with no activity