postgres-operator
postgres-operator copied to clipboard
Logical backup job fails if pooler is enabled for the cluster - pg_dump: error: connection to database "template1"
Please, answer some short questions which should help us to understand your problem / question better?
- Which image of the operator are you using? registry.opensource.zalan.do/acid/postgres-operator:v1.6.1
- **Where do you run it - cloud or metal? Kubernetes with kops on AWS
- Are you running Postgres Operator in production? we are going to
- Type of issue? Bug report
When the cluster manifest enables poolers cronjobs for logical backups fail with error pg_dump: error: connection to database "template1" failed: ERROR: unexpected response from login query
job full logs:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 12220 0 12220 0 0 917k 0 --:--:-- --:--:-- --:--:-- 994k
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 20702 0 20702 0 0 1555k 0 --:--:-- --:--:-- --:--:-- 1555k
+ dump
+ /usr/lib/postgresql/13/bin/pg_dumpall
+ compress
+ pigz
+ upload
+ case $LOGICAL_BACKUP_PROVIDER in
++ estimate_size
++ /usr/lib/postgresql/13/bin/psql -tqAc 'select sum(pg_database_size(datname)::numeric) from pg_database;'
+ aws_upload 6710488
+ declare -r EXPECTED_SIZE=6710488
++ date +%s
+ PATH_TO_BACKUP=s3://postgres-operator-backups-bucket/spilo/test-db/aae491ea-8a1b-4d1f-b5bd-755466eae34c/logical_backups/1616113817.sql.gz
+ args=()
+ [[ ! -z 6710488 ]]
+ args+=("--expected-size=$EXPECTED_SIZE")
+ [[ ! -z '' ]]
+ [[ ! -z us-east-1 ]]
+ args+=("--region=$LOGICAL_BACKUP_S3_REGION")
+ [[ ! -z AES256 ]]
+ args+=("--sse=$LOGICAL_BACKUP_S3_SSE")
+ aws s3 cp - s3://postgres-operator-backups-bucket/spilo/test-db/aae491ea-8a1b-4d1f-b5bd-755466eae34c/logical_backups/1616113817.sql.gz --expected-size=6710488 --region=us-east-1 --sse=AES256
pg_dump: error: connection to database "template1" failed: ERROR: unexpected response from login query
pg_dumpall: error: pg_dump failed on database "template1", exiting
cluster manifest:
apiVersion: "acid.zalan.do/v1"
kind: "postgresql"
metadata:
name: "test-db"
namespace: "postgres-test"
labels:
team: test
spec:
teamId: "test"
postgresql:
version: "13"
parameters:
max_connections: "100"
numberOfInstances: 2
enableConnectionPooler: true
enableReplicaConnectionPooler: true
enableLogicalBackup: true
volume:
size: "5Gi"
users:
admin:
- superuser
- createdb
test-user: []
databases:
test: test-user
resources:
requests:
cpu: 100m
memory: 500Mi
limits:
cpu: 1000m
memory: 2000Mi
sidecars:
- name: postgres-exporter
image: wrouesnel/postgres_exporter
env:
- name: DATA_SOURCE_USER
valueFrom:
secretKeyRef:
name: postgres.test-db.credentials.postgresql.acid.zalan.do
key: username
- name: DATA_SOURCE_PASS
valueFrom:
secretKeyRef:
key: password
name: postgres.test-db.credentials.postgresql.acid.zalan.do
- name: "DATA_SOURCE_NAME"
value: "postgresql://$(DATA_SOURCE_USER):$(DATA_SOURCE_PASS)@$(POD_NAME):5432/postgres"
ports:
- containerPort: 9187
protocol: TCP
resources:
limits:
cpu: 500m
memory: 100Mi
requests:
cpu: 100m
memory: 100Mi
juts for information, that until the root cause is found, we can apply manually on template1 database to workaround the issue:
CREATE SCHEMA pooler;
REVOKE ALL ON SCHEMA pooler FROM public, pooler;
GRANT USAGE ON SCHEMA pooler TO pooler;
CREATE OR REPLACE FUNCTION pooler.user_lookup(IN i_username text, OUT uname text, OUT phash text)
RETURNS record AS $$
BEGIN
SELECT usename, passwd FROM pg_catalog.pg_shadow
WHERE usename = i_username INTO uname, phash;
RETURN;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
REVOKE ALL ON FUNCTION pooler.user_lookup(text) FROM public, pooler;
GRANT EXECUTE ON FUNCTION pooler.user_lookup(text) TO pooler;
It seems that the pooler schema was only created on cluster's creation. The other db I added to the cluster's config does not have pooler schema In this example I added sso to "preparedDatabases" when creating the cluster, then added salebot and apply the new config
sso=# \dn
List of schemas
Name | Owner
-----------------+----------------
data | sso_data_owner
metric_helpers | postgres
pooler | postgres
public | postgres
user_management | postgres
(5 rows)
sso=# \connect salebot
psql (13.3 (Ubuntu 13.3-1.pgdg20.04+1), server 13.2 (Ubuntu 13.2-1.pgdg18.04+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
You are now connected to database "salebot" as user "postgres".
salebot=# \dn
List of schemas
Name | Owner
-----------------+--------------------
data | salebot_data_owner
metric_helpers | postgres
public | postgres
user_management | postgres
(4 rows)
salebot=#
Apparently installLookupFunction (https://github.com/zalando/postgres-operator/blob/eeb59c5bfd127020fa69bb29c58827cec905aa00/pkg/cluster/database.go#L482) does not get called on cluster update
Workaround has fixed it for me, however it's concerning that this issue has gone unaddressed for 2 years. This should be prioritized or at least documented as a caveat when enabling poolers.
Sometimes i get some errors:
pg_dump: error: connection to server at "192.168.100.110", port 5432 failed: FATAL: bouncer config error
pg_dumpall: error: pg_dump failed on database "template1", exiting
kube pods
NAME READY STATUS RESTARTS AGE
postgres-backup-postgresql-28301790-x29ns 0/1 Completed 0 3d7h
postgres-backup-postgresql-28303230-9gj2k 0/1 Completed 0 2d7h
postgres-backup-postgresql-28306110-9xjl2 0/1 Error 0 7h8m
postgres-backup-postgresql-28306110-gggmj 0/1 Error 0 7h7m
postgres-backup-postgresql-28306110-gv7cl 0/1 Error 0 7h3m
postgres-backup-postgresql-28306110-tb4j6 0/1 Error 0 7h8m
postgres-backup-postgresql-28306110-w2j2c 0/1 Error 0 7h5m
postgres-backup-postgresql-28306110-x6j6p 0/1 Error 0 7h7m
postgres-backup-postgresql-28306110-z7xmv 0/1 Error 0 7h8m
postgresql-0 1/1 Running 0 80d
postgresql-1 1/1 Running 0 80d
postgresql-pooler-5b9c9749f6-jkhdn 1/1 Running 0 80d
postgresql-pooler-5b9c9749f6-q42k8 1/1 Running 0 80d
postgresql-pooler-repl-784f699c76-z4bnr 1/1 Running 0 80d
postgresql-pooler-repl-784f699c76-zhmbj 1/1 Running 0 80d
Sometimes i get some errors:
pg_dump: error: connection to server at "192.168.100.110", port 5432 failed: FATAL: bouncer config error pg_dumpall: error: pg_dump failed on database "template1", exitingkube pods
NAME READY STATUS RESTARTS AGE postgres-backup-postgresql-28301790-x29ns 0/1 Completed 0 3d7h postgres-backup-postgresql-28303230-9gj2k 0/1 Completed 0 2d7h postgres-backup-postgresql-28306110-9xjl2 0/1 Error 0 7h8m postgres-backup-postgresql-28306110-gggmj 0/1 Error 0 7h7m postgres-backup-postgresql-28306110-gv7cl 0/1 Error 0 7h3m postgres-backup-postgresql-28306110-tb4j6 0/1 Error 0 7h8m postgres-backup-postgresql-28306110-w2j2c 0/1 Error 0 7h5m postgres-backup-postgresql-28306110-x6j6p 0/1 Error 0 7h7m postgres-backup-postgresql-28306110-z7xmv 0/1 Error 0 7h8m postgresql-0 1/1 Running 0 80d postgresql-1 1/1 Running 0 80d postgresql-pooler-5b9c9749f6-jkhdn 1/1 Running 0 80d postgresql-pooler-5b9c9749f6-q42k8 1/1 Running 0 80d postgresql-pooler-repl-784f699c76-z4bnr 1/1 Running 0 80d postgresql-pooler-repl-784f699c76-zhmbj 1/1 Running 0 80d
Hi @MikeVL I'm experiencing the same issue, did you resolve it?
Sometimes i get some errors:
pg_dump: error: connection to server at "192.168.100.110", port 5432 failed: FATAL: bouncer config error pg_dumpall: error: pg_dump failed on database "template1", exitingkube pods
NAME READY STATUS RESTARTS AGE postgres-backup-postgresql-28301790-x29ns 0/1 Completed 0 3d7h postgres-backup-postgresql-28303230-9gj2k 0/1 Completed 0 2d7h postgres-backup-postgresql-28306110-9xjl2 0/1 Error 0 7h8m postgres-backup-postgresql-28306110-gggmj 0/1 Error 0 7h7m postgres-backup-postgresql-28306110-gv7cl 0/1 Error 0 7h3m postgres-backup-postgresql-28306110-tb4j6 0/1 Error 0 7h8m postgres-backup-postgresql-28306110-w2j2c 0/1 Error 0 7h5m postgres-backup-postgresql-28306110-x6j6p 0/1 Error 0 7h7m postgres-backup-postgresql-28306110-z7xmv 0/1 Error 0 7h8m postgresql-0 1/1 Running 0 80d postgresql-1 1/1 Running 0 80d postgresql-pooler-5b9c9749f6-jkhdn 1/1 Running 0 80d postgresql-pooler-5b9c9749f6-q42k8 1/1 Running 0 80d postgresql-pooler-repl-784f699c76-z4bnr 1/1 Running 0 80d postgresql-pooler-repl-784f699c76-zhmbj 1/1 Running 0 80dHi @MikeVL I'm experiencing the same issue, did you resolve it?
No. After restart postgresql pod works fine some time
Sometimes i get some errors:
pg_dump: error: connection to server at "192.168.100.110", port 5432 failed: FATAL: bouncer config error pg_dumpall: error: pg_dump failed on database "template1", exitingkube pods
NAME READY STATUS RESTARTS AGE postgres-backup-postgresql-28301790-x29ns 0/1 Completed 0 3d7h postgres-backup-postgresql-28303230-9gj2k 0/1 Completed 0 2d7h postgres-backup-postgresql-28306110-9xjl2 0/1 Error 0 7h8m postgres-backup-postgresql-28306110-gggmj 0/1 Error 0 7h7m postgres-backup-postgresql-28306110-gv7cl 0/1 Error 0 7h3m postgres-backup-postgresql-28306110-tb4j6 0/1 Error 0 7h8m postgres-backup-postgresql-28306110-w2j2c 0/1 Error 0 7h5m postgres-backup-postgresql-28306110-x6j6p 0/1 Error 0 7h7m postgres-backup-postgresql-28306110-z7xmv 0/1 Error 0 7h8m postgresql-0 1/1 Running 0 80d postgresql-1 1/1 Running 0 80d postgresql-pooler-5b9c9749f6-jkhdn 1/1 Running 0 80d postgresql-pooler-5b9c9749f6-q42k8 1/1 Running 0 80d postgresql-pooler-repl-784f699c76-z4bnr 1/1 Running 0 80d postgresql-pooler-repl-784f699c76-zhmbj 1/1 Running 0 80dHi @MikeVL I'm experiencing the same issue, did you resolve it?
No. After restart postgresql pod works fine some time
I have disabled replication pooler for now to see if that works, because the logical backup tries to connect to the pooler-repl. I have executed a manual job and worked fine, but ill keep an eye on it to see how it works on the next days.
UPDATE: With the replica pooler disabled the logical backup works fine in our infrastructure.
I am also getting this issue. Any update on this?
CREATE SCHEMA pooler; REVOKE ALL ON SCHEMA pooler FROM public, pooler; GRANT USAGE ON SCHEMA pooler TO pooler; CREATE OR REPLACE FUNCTION pooler.user_lookup(IN i_username text, OUT uname text, OUT phash text) RETURNS record AS $$ BEGIN SELECT usename, passwd FROM pg_catalog.pg_shadow WHERE usename = i_username INTO uname, phash; RETURN; END; $$ LANGUAGE plpgsql SECURITY DEFINER; REVOKE ALL ON FUNCTION pooler.user_lookup(text) FROM public, pooler; GRANT EXECUTE ON FUNCTION pooler.user_lookup(text) TO pooler;
This solution does not works for me. I am using Postgressql 12.