
Connection poolers point to read-only replica after a fail-over

Open kien-truong opened this issue 3 years ago • 6 comments

  • Which image of the operator are you using? registry.opensource.zalan.do/acid/postgres-operator:v1.7.1

  • Where do you run it - cloud or metal? Kubernetes or OpenShift? Bare Metal K8S

  • Are you running Postgres Operator in production? Yes

  • Type of issue? Bug report

We are encountering an issue where the connection poolers keep pointing to the old master (now a read-only replica) after a failover. We think this is the sequence of events:

  1. postgres-2 was the leader
  2. postgres-2 lost leader lock and started to demote itself
  3. postgres-pooler lost connection to postgres-2 and tried to reconnect
  4. postgres-0 got leader lock and started to promote itself
  5. postgres-2 finished restarting before postgres-0
  6. postgres-pooler reconnected to postgres-2
  7. postgres-0 finished promoting
  8. patroni updated the K8S service endpoint to point to postgres-0, but it was too late, since the connection pooler had already finished reconnecting
  9. postgres-pooler was now stuck with the read-only postgres-2 until server_lifetime elapsed

Does this make sense? Should postgres-operator issue a RECONNECT command to all poolers after the service endpoint is updated?
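In the meantime, a stuck pooler can be detected and kicked manually. A sketch, assuming the pooler service is reachable as postgres-pooler and the pooler admin user is pooler (service name, user, and database are illustrative):

```shell
# Check whether the pooler is serving a read-only replica:
# pg_is_in_recovery() returns true on a replica, false on the primary.
psql -h postgres-pooler -U app_user -d app_db -c "SELECT pg_is_in_recovery();"

# If it returns true, ask PgBouncer to drop its server connections via
# the admin console; RECONNECT closes each server connection gracefully
# once it becomes idle, so new connections go to the current endpoint.
psql -h postgres-pooler -p 5432 -U pooler pgbouncer -c "RECONNECT;"
```

This is only a manual workaround for the race described above, not a fix.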

kien-truong avatar Jun 14 '22 10:06 kien-truong

@kien-truong You can mitigate this by building a custom pooler image with server_fast_close = 1; this ensures that when the underlying endpoint changes, the DB connections are re-established. Building a custom image is necessary until #1925 gets fixed. Here is an example of how to customize the pooler image:

FROM registry.opensource.zalan.do/acid/pgbouncer:master-22

USER root

RUN apk add --upgrade apk-tools &&  \
    apk update && apk -U upgrade && \
    rm -rf /var/cache/apk/*

USER pgbouncer
# Tweak for fast switchover
RUN echo 'server_fast_close = 1' >> /etc/pgbouncer/pgbouncer.ini.tmpl && \
    echo 'dns_max_ttl = 5' >> /etc/pgbouncer/pgbouncer.ini.tmpl && \
    echo 'dns_nxdomain_ttl = 5' >> /etc/pgbouncer/pgbouncer.ini.tmpl

If you want to test it out, I have the rebuilt image publicly available here: https://hub.docker.com/r/studentcoin/registry-opensource-zalan-do-acid-pgbouncer-master-22-fast-failover. Specify it in the operator configuration and the poolers should automatically restart with the new image.
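For reference, switching the operator to a custom pooler image is a one-line configuration change. A sketch, assuming the ConfigMap-based operator configuration with the default name postgres-operator (namespace and image name are illustrative; adjust to your setup):

```shell
# Point the operator at the custom pgbouncer image; the operator rolls
# out new pooler pods on its next sync cycle.
kubectl -n default patch configmap postgres-operator --type merge \
  -p '{"data":{"connection_pooler_image":"studentcoin/registry-opensource-zalan-do-acid-pgbouncer-master-22-fast-failover"}}'
```

If you run the operator with a CRD-based OperatorConfiguration instead of a ConfigMap, the equivalent key lives in its connection_pooler section.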

gorbak25 avatar Jun 16 '22 11:06 gorbak25

I don't think server_fast_close would help, since it has no effect in transaction pooling mode, which is what we're using.

Also, PgBouncer cannot detect that the underlying endpoint has changed, since it connects to Postgres through the service IP, which never changes.

kien-truong avatar Jun 16 '22 11:06 kien-truong

@kien-truong I run PgBouncer in production in session mode and I've validated that server_fast_close helps, especially with long-lived DB sessions. PgBouncer connects to the master Postgres instance using the service DNS name. When the endpoint gets updated, subsequent DNS queries for the service return the updated IP address. From the PgBouncer documentation:

server_fast_close
Disconnect a server in session pooling mode immediately or after the end of the current transaction if it is in “close_needed” mode (set by RECONNECT, RELOAD that changes connection settings, or DNS change), rather than waiting for the session end. In statement or transaction pooling mode, this has no effect since that is the default behavior there.

"close_needed" mode is entered when a DNS change occurs. By setting dns_max_ttl and dns_nxdomain_ttl to a small value, you ensure that PgBouncer discovers the changed IP address on the next connection attempt, i.e. the next time it resolves the DNS name.

gorbak25 avatar Jun 16 '22 12:06 gorbak25

We're using transaction pooling.

server_fast_close
... In statement or transaction pooling mode, this has no effect since that is the default behavior there.

kien-truong avatar Jun 16 '22 12:06 kien-truong

@kien-truong perhaps setting dns_zone_check_period will help?

gorbak25 avatar Jun 16 '22 12:06 gorbak25

Turns out that PgBouncer is pointed at a ClusterIP service, so it is unable to detect a failover of the master instance: it always connects to the same virtual IP.

gorbak25 avatar Jul 28 '22 07:07 gorbak25

Hello, is there any update on the issue?

dobrac avatar Jan 15 '23 23:01 dobrac

Hi, we also stumbled over this problem. Has anybody a successful solution to this problem?

nupis-DanielS avatar Apr 17 '23 08:04 nupis-DanielS

In our case the problem exists because the connection pooler points to the postgres-cluster service, which is not headless and therefore has a cluster IP. With a headless service (no cluster IP) the problem could be solved, but we see no configuration option to make this service headless.
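For anyone experimenting with this, the missing headless service can be created manually next to the operator-managed one. A sketch, assuming a cluster named acid-minimal-cluster with the usual Spilo labels (names are illustrative; adjust to your cluster):

```shell
# clusterIP: None makes the service headless: DNS for it returns the
# master pod IP directly instead of a stable virtual IP, so a DNS-aware
# client can notice when the master moves.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: acid-minimal-cluster-headless
spec:
  clusterIP: None
  selector:
    application: spilo
    cluster-name: acid-minimal-cluster
    spilo-role: master
  ports:
    - name: postgresql
      port: 5432
      targetPort: 5432
EOF
```

The pooler still has to be told to use this service name instead of the operator-managed one, which is what the watcher below automates.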

nupis-ChristophS avatar Apr 18 '23 07:04 nupis-ChristophS

We also often encounter this problem when using postgres-operator. Our current workaround is a watcher service that creates a new headless service, rewrites the host in the pooler's pgbouncer.ini, and reloads the configuration.

We use the fabric8 Kubernetes client, and here is the code:

    /**
     * add new headless service to pooler container /etc/pgbouncer/pgbouncer.ini
     * and then reload pgbouncer
     */
    private void injectHeadlessService(Pod pooler) {
        if (pooler.getStatus() == null || !"Running".equals(pooler.getStatus().getPhase())) return;
        if (pooler.getMetadata().getDeletionTimestamp() != null) return;

        // 1. get cluster name
        var poolerName = pooler.getMetadata().getName();
        Map<String, String> labels = pooler.getMetadata().getLabels() == null ? Collections.emptyMap() : pooler.getMetadata().getLabels();
        var clusterName = labels.getOrDefault("cluster-name", poolerName.substring(0, poolerName.indexOf("-pooler")));

        // 2. check if the headless service exists; create it if not
        var headlessServiceName = clusterName + "-headless";
        var headlessService = client.services().inNamespace(client.getNamespace()).withName(headlessServiceName).get();
        if (headlessService == null) {
            LOGGER.infov("create headless service {0}", headlessServiceName);
            var builder = new ServiceBuilder()
                    .editOrNewMetadata()
                    .withName(headlessServiceName)
                    .withNamespace(client.getNamespace())
                    .withLabels(Map.of("application", "spilo", "cluster-name", clusterName, "spilo-role", "master"))
                    .endMetadata()
                    .editOrNewSpec()
                    .withClusterIP("None")
                    .withPorts(new ServicePortBuilder()
                            .withName("postgresql")
                            .withProtocol("TCP").withPort(5432)
                            .withTargetPort(new IntOrString(5432))
                            .build())
                    .withType("ClusterIP")
                    .withSelector(Map.of("application", "spilo", "cluster-name", clusterName, "spilo-role", "master"))
                    .endSpec();
            client.resource(builder.build()).createOrReplace();
        }

        // 3. check if host has already been changed to the headless service
        var host = executor.exec(pooler, PostgresOperatorContainers.POOLER, "sh", "-c",
                        "cat /etc/pgbouncer/pgbouncer.ini|grep host= | cut -d ' ' -f 3")
                .get(0);
        // e.g. host=xxxx.rds.svc.cluster.local
        var originHost = host.split("=")[1];
        if (!originHost.equals(headlessServiceName)) {
            // replace /etc/pgbouncer/pgbouncer.ini
            LOGGER.infov("change host from {0} to {1} in /etc/pgbouncer/pgbouncer.ini in pod {2}",
                    originHost, headlessServiceName, poolerName);
            executor.exec(pooler, PostgresOperatorContainers.POOLER, "sh", "-c",
                    String.format("sed -i 's/%s/%s/' /etc/pgbouncer/pgbouncer.ini", originHost, headlessServiceName));

            // 4. reload pgbouncer
            LOGGER.infov("reload pgbouncer in pod {0}", poolerName);
            executor.exec(pooler, PostgresOperatorContainers.POOLER, "sh", "-c",
                    "psql -p 5432 -h 127.0.0.1 -U pooler pgbouncer -w -c \"RELOAD\"");
        }
    }

After RELOAD, we have verified with SHOW DNS_HOSTS that DNS now resolves to the master pod. Hope this helps. @kien-truong @gorbak25
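The verification step mentioned above can also be run from outside the pod. A sketch, assuming a pooler deployment named acid-minimal-cluster-pooler and the pooler admin user from the code above (both names are illustrative):

```shell
# Confirm that pgbouncer now resolves the headless service name to the
# current master pod IP rather than a fixed cluster IP.
kubectl exec deploy/acid-minimal-cluster-pooler -- \
  psql -p 5432 -h 127.0.0.1 -U pooler pgbouncer -w -c "SHOW DNS_HOSTS;"
```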

cdmikechen avatar Jun 13 '23 03:06 cdmikechen