Connection poolers point to read-only replica after a fail-over
Which image of the operator are you using?
registry.opensource.zalan.do/acid/postgres-operator:v1.7.1
Where do you run it - cloud or metal? Kubernetes or OpenShift?
Bare-metal Kubernetes
Are you running Postgres Operator in production?
Yes
Type of issue?
Bug report
We are encountering an issue where the connection poolers continue pointing to the old master (now a read-only replica) after a fail-over. We think this is the sequence of events:
1. postgres-2 was the leader
2. postgres-2 lost the leader lock and started to demote itself
3. postgres-pooler lost its connection to postgres-2 and tried to reconnect
4. postgres-0 got the leader lock and started to promote itself
5. postgres-2 finished restarting before postgres-0
6. postgres-pooler reconnected to postgres-2
7. postgres-0 finished promoting
8. Patroni updated the K8S service IP to point to the endpoint of postgres-0, but it was too late since the connection pooler had already finished reconnecting
9. postgres-pooler was now stuck with the read-only postgres-2 until server_lifetime elapsed
Does this make sense?
Should postgres-operator issue a RECONNECT command to all poolers after the service endpoint is updated?
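As a stopgap, one could run RECONNECT against each pooler's admin console by hand after a fail-over. A sketch, assuming the admin console listens on 5432 with admin user pooler (the pod name is a placeholder):
# force pgbouncer to re-establish its server connections on the next opportunity
kubectl exec <pooler-pod> -- psql -p 5432 -h 127.0.0.1 -U pooler pgbouncer -w -c "RECONNECT"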
@kien-truong You can mitigate this by building a custom pooler image with server_fast_close = 1; this ensures that when the underlying endpoint changes, the DB connections will be re-established. Building a custom image is necessary until #1925 gets fixed. Here is an example of how to customize the pooler image:
FROM registry.opensource.zalan.do/acid/pgbouncer:master-22
USER root
RUN apk add --upgrade apk-tools && \
    apk update && apk -U upgrade && \
    rm -rf /var/cache/apk/*
USER pgbouncer
# Tweak for fast switchover
RUN echo 'server_fast_close = 1' >> /etc/pgbouncer/pgbouncer.ini.tmpl
RUN echo 'dns_max_ttl = 5' >> /etc/pgbouncer/pgbouncer.ini.tmpl
RUN echo 'dns_nxdomain_ttl = 5' >> /etc/pgbouncer/pgbouncer.ini.tmpl
If you want to test it out, I have the rebuilt image publicly available here: https://hub.docker.com/r/studentcoin/registry-opensource-zalan-do-acid-pgbouncer-master-22-fast-failover
Specify it in the operator configuration and the poolers should automatically restart with the new image.
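For reference, a sketch of that configuration change, assuming the operator reads its options from the default postgres-operator ConfigMap and that connection_pooler_image is the relevant option (the tag is illustrative):
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-operator
data:
  # custom pooler image with server_fast_close and the lowered DNS TTLs baked in
  connection_pooler_image: "studentcoin/registry-opensource-zalan-do-acid-pgbouncer-master-22-fast-failover:latest"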
I don't think server_fast_close would help, since it has no effect in transaction pooling mode, which is what we're using.
Also, PgBouncer cannot detect that the underlying endpoint has changed, since it connects to Postgres through the service IP, which never changes.
@kien-truong I run PgBouncer in production in session mode and I've validated that server_fast_close helps, especially with long-lived DB sessions. PgBouncer connects to the master Postgres instance using the service DNS name; when the endpoint gets updated, subsequent DNS queries for the service return the updated IP address. From the PgBouncer documentation:
server_fast_close
Disconnect a server in session pooling mode immediately or after the end of the current transaction if it is in “close_needed” mode (set by RECONNECT, RELOAD that changes connection settings, or DNS change), rather than waiting for the session end. In statement or transaction pooling mode, this has no effect since that is the default behavior there.
"close_needed" mode is entered when a DNS change occurs. By setting dns_max_ttl and dns_nxdomain_ttl to a small value you ensure that pgbouncer discovers that the underlying IP address changed during the next connection attempt/the next time pgbouncer resolves the DNS name.
We're using transaction pooling.
server_fast_close
... In statement or transaction pooling mode, this has no effect since that is the default behavior there.
@kien-truong perhaps setting dns_zone_check_period will help?
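If you want to try it, it could be appended the same way as the other settings in the Dockerfile above (the value of 5 seconds is an illustrative guess, not a tested recommendation):
# re-check the DNS zone serial periodically so zone changes are noticed
RUN echo 'dns_zone_check_period = 5' >> /etc/pgbouncer/pgbouncer.ini.tmpl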
Turns out that pgbouncer is pointed to a ClusterIP service. PGBouncer is unable to detect a failover of the master instance as it's always pointed to the same IP.
Hello, is there any update on the issue?
Hi, we also stumbled over this problem. Has anybody a successful solution to this problem?
In our case the problem exists because the connection pooler points to the postgres-cluster service, which is not headless and therefore has a cluster IP. Without a cluster IP (i.e. with a headless service) the problem could be solved, but we see no configuration option to make this service headless.
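For illustration, a manually created headless master Service might look like the sketch below; it mirrors the labels and ports the operator puts on its master service, with the cluster name acid-minimal-cluster as a placeholder:
apiVersion: v1
kind: Service
metadata:
  name: acid-minimal-cluster-headless
  labels:
    application: spilo
    cluster-name: acid-minimal-cluster
    spilo-role: master
spec:
  # no cluster IP: the DNS name resolves directly to the master pod's IP
  clusterIP: None
  selector:
    application: spilo
    cluster-name: acid-minimal-cluster
    spilo-role: master
  ports:
    - name: postgresql
      protocol: TCP
      port: 5432
      targetPort: 5432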
We also often encounter this problem when using postgres-operator. Our current workaround is a watcher service that creates a new headless service, changes the host in the pooler's pgbouncer.ini, and reloads the configuration.
We use the fabric8 Kubernetes client; here is the code:
/**
 * Add a new headless service to the pooler container's /etc/pgbouncer/pgbouncer.ini
 * and then reload pgbouncer.
 */
private void injectHeadlessService(Pod pooler) {
    if (pooler.getStatus() == null || !"Running".equals(pooler.getStatus().getPhase())) return;
    if (pooler.getMetadata().getDeletionTimestamp() != null) return;
    // 1. derive the cluster name from the pooler pod's labels (or its name)
    var poolerName = pooler.getMetadata().getName();
    Map<String, String> labels = pooler.getMetadata().getLabels() == null
            ? Collections.emptyMap() : pooler.getMetadata().getLabels();
    var clusterName = labels.getOrDefault("cluster-name",
            poolerName.substring(0, poolerName.indexOf("-pooler")));
    // 2. create the headless service if it does not exist yet
    var headlessServiceName = clusterName + "-headless";
    var headlessService = client.services().inNamespace(client.getNamespace())
            .withName(headlessServiceName).get();
    if (headlessService == null) {
        LOGGER.infov("create headless service {0}", headlessServiceName);
        var builder = new ServiceBuilder()
                .editOrNewMetadata()
                    .withName(headlessServiceName)
                    .withNamespace(client.getNamespace())
                    .withLabels(Map.of("application", "spilo", "cluster-name", clusterName, "spilo-role", "master"))
                .endMetadata()
                .editOrNewSpec()
                    .withClusterIP("None")
                    .withPorts(new ServicePortBuilder()
                            .withName("postgresql")
                            .withProtocol("TCP").withPort(5432)
                            .withTargetPort(new IntOrString(5432))
                            .build())
                    .withType("ClusterIP")
                    .withSelector(Map.of("application", "spilo", "cluster-name", clusterName, "spilo-role", "master"))
                .endSpec();
        client.resource(builder.build()).createOrReplace();
    }
    // 3. check whether the host in pgbouncer.ini already points to the headless service
    var host = executor.exec(pooler, PostgresOperatorContainers.POOLER, "sh", "-c",
            "cat /etc/pgbouncer/pgbouncer.ini | grep host= | cut -d ' ' -f 3")
            .get(0);
    // e.g. host=xxxx.rds.svc.cluster.local
    var originHost = host.split("=")[1];
    if (!originHost.equals(headlessServiceName)) {
        // rewrite the host in /etc/pgbouncer/pgbouncer.ini
        LOGGER.infov("change host from {0} to {1} in /etc/pgbouncer/pgbouncer.ini in pod {2}",
                originHost, headlessServiceName, poolerName);
        executor.exec(pooler, PostgresOperatorContainers.POOLER, "sh", "-c",
                String.format("sed -i 's/%s/%s/' /etc/pgbouncer/pgbouncer.ini", originHost, headlessServiceName));
        // 4. reload pgbouncer so it picks up the new host
        LOGGER.infov("reload pgbouncer in pod {0}", poolerName);
        executor.exec(pooler, PostgresOperatorContainers.POOLER, "sh", "-c",
                "psql -p 5432 -h 127.0.0.1 -U pooler pgbouncer -w -c \"RELOAD\"");
    }
}
After the RELOAD, we found that DNS resolves to the master pod, which can be verified with SHOW DNS_HOSTS.
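For example, a check from outside the pod could look like this (pod name is a placeholder, and the psql invocation mirrors the one in the code above):
kubectl exec <pooler-pod> -- psql -p 5432 -h 127.0.0.1 -U pooler pgbouncer -w -c "SHOW DNS_HOSTS"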
Hope this can help you.
@kien-truong @gorbak25