percona-postgresql-operator

[2.7.0] Cluster domain detection fails, so pgBackRest gets wrong hostnames

Open pasztorl opened this issue 2 months ago • 4 comments

Report

I see that the operator tries to detect the cluster domain here. We use a custom cluster domain, and the result of this function is just "kubernetes". If I run a query from a postgres pod, I get this:

cat /etc/resolv.conf 
search example-db.svc.k8s.test1.example.com svc.k8s.test1.example.com k8s.test1.example.com
nameserver 10.15.85.18
options ndots:5

host kubernetes.default.svc
kubernetes.default.svc.k8s.test1.example.com has address 10.15.85.234
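
For illustration, the search list above already contains the cluster domain, so it could be read from there. The following is only a minimal sketch of that idea, not the operator's detection code; `clusterDomainFromSearch` is a hypothetical helper, and it assumes the search list contains an entry of the form `svc.<cluster-domain>`, as in the output above.

```go
package main

import (
	"fmt"
	"strings"
)

// clusterDomainFromSearch is a hypothetical helper. It scans resolv.conf
// content for a "search" line and returns what follows "svc." in the first
// entry that starts with "svc.". It returns "" if no such entry is found.
func clusterDomainFromSearch(resolvConf string) string {
	for _, line := range strings.Split(resolvConf, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 || fields[0] != "search" {
			continue
		}
		for _, entry := range fields[1:] {
			if strings.HasPrefix(entry, "svc.") {
				return strings.TrimSuffix(strings.TrimPrefix(entry, "svc."), ".")
			}
		}
	}
	return ""
}

func main() {
	// Content copied from the resolv.conf shown in this report.
	resolvConf := `search example-db.svc.k8s.test1.example.com svc.k8s.test1.example.com k8s.test1.example.com
nameserver 10.15.85.18
options ndots:5
`
	fmt.Println(clusterDomainFromSearch(resolvConf)) // prints "k8s.test1.example.com"
}
```

This sketch would misread a search list whose first entry belongs to a namespace that is itself named `svc`, so it should be treated as a starting point only.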

More about the problem

The result is that the pgbackrest config looks like this:

...
pg1-host = example-pg-example-pg-25pk-0.example-pg-pods.example-db.svc.kubernetes
...

This name (domain) does not exist in the cluster, so the backup fails:

time="2025-10-13T20:02:44Z" level=info msg="[pgbackrest:stdout] 2025-10-13 20:02:44.231 P00   WARN: unable to check pg1: [HostConnectError] unable to get address for 'example-pg-example-pg-25pk-0.example-pg-example-db.svc.kubernetes': [-2] Name or service not known"
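
The hostname in the config has the usual Kubernetes shape `<pod>.<service>.<namespace>.svc.<cluster-domain>`, so a wrong cluster domain makes the whole name unresolvable. A minimal sketch of that composition follows; `podFQDN` is a hypothetical helper for illustration, not the operator's own function.

```go
package main

import "fmt"

// podFQDN is a hypothetical helper that joins the parts of a pod DNS name
// behind a headless service: <pod>.<service>.<namespace>.svc.<clusterDomain>.
func podFQDN(pod, service, namespace, clusterDomain string) string {
	return fmt.Sprintf("%s.%s.%s.svc.%s", pod, service, namespace, clusterDomain)
}

func main() {
	// Names taken from the pgbackrest config line in this report.
	pod, svc, ns := "example-pg-example-pg-25pk-0", "example-pg-pods", "example-db"

	// With the mis-detected domain, the name matches the broken config entry.
	fmt.Println(podFQDN(pod, svc, ns, "kubernetes"))

	// With the domain from the resolv.conf search list, the name is resolvable in this cluster.
	fmt.Println(podFQDN(pod, svc, ns, "k8s.test1.example.com"))
}
```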

Steps to reproduce

  1. I created a small Go application to check this, using the same code:
package main

import (
        "context"
        "fmt"
        "net"
        "os"
        "strings"
        "time"
)

func main() {
        if len(os.Args) < 2 {
                fmt.Fprintf(os.Stderr, "usage: %s <nameserver-ip[:port]>\n", os.Args[0])
                os.Exit(2)
        }
        ns := os.Args[1]
        if !strings.Contains(ns, ":") {
                ns += ":53"
        }

        // Use stdlib resolver, pointed at the provided nameserver.
        resolver := &net.Resolver{
                PreferGo: true,
                Dial: func(ctx context.Context, _, _ string) (net.Conn, error) {
                        var d net.Dialer
                        return d.DialContext(ctx, "udp", ns)
                },
        }

        ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
        defer cancel()

        api := "kubernetes.default.svc"
        cname, err := resolver.LookupCNAME(ctx, api)
        if err == nil {
                fmt.Println(strings.TrimSuffix(strings.TrimPrefix(cname, api+"."), "."))
                fmt.Println(cname,api)
                return
        }
        fmt.Println("cluster.local")
}
  2. This code returns:
kdc 10.15.85.18
kubernetes
kubernetes kubernetes.default.svc
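
The output above shows the lookup returning the bare name `kubernetes`. Since that string does not start with `kubernetes.default.svc.`, `strings.TrimPrefix` leaves it unchanged and it is used as the domain. A stricter check could reject such a result instead. This is only a sketch; `domainFromCNAME` is a hypothetical function, and what the caller does on failure is left open.

```go
package main

import (
	"fmt"
	"strings"
)

// domainFromCNAME is a hypothetical, stricter variant of the trimming step.
// It reports ok=false unless cname really has the form "<api>.<domain>[.]".
func domainFromCNAME(cname, api string) (domain string, ok bool) {
	name := strings.TrimSuffix(cname, ".")
	if !strings.HasPrefix(name, api+".") {
		return "", false
	}
	domain = strings.TrimPrefix(name, api+".")
	if domain == "" {
		return "", false
	}
	return domain, true
}

func main() {
	api := "kubernetes.default.svc"

	// The value seen in this report: rejected instead of becoming the domain.
	fmt.Println(domainFromCNAME("kubernetes", api))

	// A fully qualified answer: the domain is extracted.
	fmt.Println(domainFromCNAME("kubernetes.default.svc.k8s.test1.example.com.", api))
}
```

Falling back to `cluster.local` after a rejected result would still be wrong for a cluster with a custom domain, so a check like this only makes the failure visible.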

Versions

  1. Kubernetes 1.32.2
  2. Operator 2.7.0

Anything else?

I would prefer having clusterDomain as a Helm value. I also like the auto-detection, but in this case it did not work.

pasztorl, Oct 13 '25 20:10

Update: the problem is that the operator is running inside vcluster, and vcluster adds this to the podSpec:

  hostAliases:
    - ip: 10.15.85.234
      hostnames:
        - kubernetes
        - kubernetes.default
        - kubernetes.default.svc

This is why the code returns kubernetes. Could you please make clusterDomain configurable, so that I can override the auto-detected value? Thanks!
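
One possible shape for such an override is sketched below, assuming an environment variable is checked before auto-detection. The variable name `CLUSTER_DOMAIN` and the function `clusterDomain` are hypothetical and are not existing operator settings.

```go
package main

import (
	"fmt"
	"os"
)

// clusterDomainEnv is a hypothetical variable name, not an operator setting.
const clusterDomainEnv = "CLUSTER_DOMAIN"

// clusterDomain returns the explicitly configured domain if one is set and
// non-empty; otherwise it falls back to the result of detect.
func clusterDomain(lookupEnv func(string) (string, bool), detect func() string) string {
	if v, ok := lookupEnv(clusterDomainEnv); ok && v != "" {
		return v
	}
	return detect()
}

func main() {
	// Stand-in for the auto-detection that returned "kubernetes" in this report.
	detect := func() string { return "kubernetes" }

	// The explicit value wins whenever the variable is set in the environment.
	fmt.Println(clusterDomain(os.LookupEnv, detect))
}
```

Passing the lookup and detection functions as parameters keeps the precedence rule easy to test without touching the real environment or DNS.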

pasztorl, Oct 13 '25 21:10

Hello @pasztorl, thank you for this issue. We already have this on our radar; see this issue: https://perconadev.atlassian.net/browse/K8SPG-694. Please have a look.

gkech, Oct 31 '25 10:10

Thanks for the link. I see that this ticket is unassigned and the last activity was almost a year ago, so how can I move this forward?

pasztorl, Oct 31 '25 10:10

Hi @pasztorl, I have updated the task, and we will implement it in PGO v2.9.0 or v2.10.0.

hors, Nov 29 '25 18:11