[2.7.0] Cluster domain detection fails, so pgBackRest gets wrong hostnames
Report
I see that the operator tries to detect the cluster domain here. We use custom cluster domains, and the result of this function is just "kubernetes". If I run a query from a postgres pod, I get this:
cat /etc/resolv.conf
search example-db.svc.k8s.test1.example.com svc.k8s.test1.example.com k8s.test1.example.com
nameserver 10.15.85.18
options ndots:5
host kubernetes.default.svc
kubernetes.default.svc.k8s.test1.example.com has address 10.15.85.234
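For comparison, the cluster domain is also visible in the search list above. A minimal sketch of reading it from there, assuming the kubelet-generated resolv.conf contains an entry of the form svc.<domain>; the function name clusterDomainFromResolvConf is hypothetical and this is not what the operator does today:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// clusterDomainFromResolvConf scans resolv.conf content for a search entry
// of the form "svc.<domain>" and returns <domain>. It reports false when no
// such entry is found.
func clusterDomainFromResolvConf(content string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 || fields[0] != "search" {
			continue
		}
		for _, entry := range fields[1:] {
			if !strings.HasPrefix(entry, "svc.") {
				continue
			}
			domain := strings.TrimSuffix(strings.TrimPrefix(entry, "svc."), ".")
			if domain != "" {
				return domain, true
			}
		}
	}
	return "", false
}

func main() {
	// Content copied from the resolv.conf shown in this report.
	conf := "search example-db.svc.k8s.test1.example.com svc.k8s.test1.example.com k8s.test1.example.com\n" +
		"nameserver 10.15.85.18\n" +
		"options ndots:5\n"
	domain, ok := clusterDomainFromResolvConf(conf)
	fmt.Println(domain, ok)
}
```

This relies on the pod using the default ClusterFirst DNS policy; with a custom dnsConfig the search list may not contain such an entry, so the boolean result should be checked.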
More about the problem
The result is that the pgbackrest config looks like this:
...
pg1-host = example-pg-example-pg-25pk-0.example-pg-pods.example-db.svc.kubernetes
...
This name (domain) does not exist in the cluster, so the backup fails:
time="2025-10-13T20:02:44Z" level=info msg="[pgbackrest:stdout] 2025-10-13 20:02:44.231 P00 WARN: unable to check pg1: [HostConnectError] unable to get address for 'example-pg-example-pg-25pk-0.example-pg-example-db.svc.kubernetes': [-2] Name or service not known"
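To make the effect of the wrong domain concrete, here is a small sketch of how a pod hostname behind a headless service is composed in Kubernetes DNS (pod.service.namespace.svc.domain). The helper podFQDN is hypothetical; how the operator actually assembles pg1-host is an assumption here:

```go
package main

import "fmt"

// podFQDN joins the parts of a pod's DNS name behind a headless service.
func podFQDN(pod, service, namespace, clusterDomain string) string {
	return fmt.Sprintf("%s.%s.%s.svc.%s", pod, service, namespace, clusterDomain)
}

func main() {
	pod, svc, ns := "example-pg-example-pg-25pk-0", "example-pg-pods", "example-db"

	// With the misdetected domain, this gives the unresolvable pg1-host value.
	fmt.Println(podFQDN(pod, svc, ns, "kubernetes"))

	// With the real cluster domain from resolv.conf, the name would resolve.
	fmt.Println(podFQDN(pod, svc, ns, "k8s.test1.example.com"))
}
```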
Steps to reproduce
- I created a small Go application to check this, using the same code:
package main

import (
	"context"
	"fmt"
	"net"
	"os"
	"strings"
	"time"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintf(os.Stderr, "usage: %s <nameserver-ip[:port]>\n", os.Args[0])
		os.Exit(2)
	}
	ns := os.Args[1]
	if !strings.Contains(ns, ":") {
		ns += ":53"
	}
	// Use stdlib resolver, pointed at the provided nameserver.
	resolver := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, _, _ string) (net.Conn, error) {
			var d net.Dialer
			return d.DialContext(ctx, "udp", ns)
		},
	}
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	api := "kubernetes.default.svc"
	cname, err := resolver.LookupCNAME(ctx, api)
	if err == nil {
		fmt.Println(strings.TrimSuffix(strings.TrimPrefix(cname, api+"."), "."))
		fmt.Println(cname, api)
		return
	}
	fmt.Println("cluster.local")
}
- This code returns:
kdc 10.15.85.18
kubernetes
kubernetes kubernetes.default.svc
Versions
- Kubernetes 1.32.2
- Operator 2.7.0
Anything else?
I would prefer adding clusterDomain as a Helm value. I also like the auto-detection, but in this case it did not work.
Update: the problem is that the operator is running inside vcluster, and vcluster adds this to the podSpec:
hostAliases:
- ip: 10.15.85.234
  hostnames:
  - kubernetes
  - kubernetes.default
  - kubernetes.default.svc
This is why the code returns "kubernetes". Could you please make clusterDomain configurable, so that I can override the auto-detected value? Thanks!
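The hostAliases entries end up in the pod's /etc/hosts, and the lookup presumably gets answered from there with the first alias on the line ("kubernetes"). Since that name does not start with "kubernetes.default.svc.", the TrimPrefix call is a no-op and the short alias is returned as the domain. A minimal sketch of a guarded detection with an explicit override follows; the function names and the CLUSTER_DOMAIN variable are hypothetical and not part of the operator:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

const apiHost = "kubernetes.default.svc"

// domainFromCNAME extracts the cluster domain from a canonical name such as
// "kubernetes.default.svc.cluster.local.". It reports false when the name
// does not extend apiHost, which is the case for a short alias such as
// "kubernetes" coming from /etc/hosts.
func domainFromCNAME(cname string) (string, bool) {
	prefix := apiHost + "."
	if !strings.HasPrefix(cname, prefix) {
		return "", false
	}
	domain := strings.TrimSuffix(strings.TrimPrefix(cname, prefix), ".")
	return domain, domain != ""
}

// resolveClusterDomain prefers an explicit override, then a validated
// detection result, then the conventional default.
func resolveClusterDomain(override, cname string) string {
	if override != "" {
		return override
	}
	if d, ok := domainFromCNAME(cname); ok {
		return d
	}
	return "cluster.local"
}

func main() {
	// CLUSTER_DOMAIN is a hypothetical variable name used for illustration.
	override := os.Getenv("CLUSTER_DOMAIN")

	// The value observed inside vcluster.
	fmt.Println(resolveClusterDomain(override, "kubernetes"))

	// A canonical name as a DNS server would return it.
	fmt.Println(resolveClusterDomain(override, "kubernetes.default.svc.k8s.test1.example.com."))
}
```

Note that the validation alone only prevents the bogus "kubernetes" value. In a setup like this one the fallback "cluster.local" would still be wrong, so the override is what actually fixes the hostnames.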
Hello @pasztorl, thank you for this issue. We already have this on our radar; see this issue: https://perconadev.atlassian.net/browse/K8SPG-694. Please have a look.
Thanks for the link. I see that this ticket is unassigned and the last activity was almost a year ago, so how can I move this forward?
Hi @pasztorl, I have updated the task, and we will implement it in PGO v2.9.0 or v2.10.0.