failed to call webhook: Post "https://cnpg-webhook-service.cnpg-system.svc:443/mutate-postgresql-cnpg-io-v1-cluster?timeout=10s":
(base) raphy@raohy:~/.talos/timescaledb$ helm upgrade --install cnpg \
> --namespace cnpg-system \
> --create-namespace \
> cnpg/cloudnative-pg
Release "cnpg" does not exist. Installing it now.
NAME: cnpg
LAST DEPLOYED: Fri Oct 3 19:47:37 2025
NAMESPACE: cnpg-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
CloudNativePG operator should be installed in namespace "cnpg-system".
You can now create a PostgreSQL cluster with 3 nodes as follows:
cat <<EOF | kubectl apply -f -
# Example of PostgreSQL cluster
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example
spec:
  instances: 3
  storage:
    size: 1Gi
EOF
kubectl get -A cluster
But I get this error message:
(base) raphy@raohy:~/.talos/timescaledb$ helm upgrade --install database \
> --namespace database \
> --create-namespace \
> cnpg/cluster
Release "database" does not exist. Installing it now.
Error: 1 error occurred:
* Internal error occurred: failed calling webhook "mcluster.cnpg.io": failed to call webhook: Post "https://cnpg-webhook-service.cnpg-system.svc:443/mutate-postgresql-cnpg-io-v1-cluster?timeout=10s": dial tcp 10.98.223.229:443: connect: connection refused
The same happens if I apply the following YAML configuration file:
(base) raphy@raohy:~/.talos/timescaledb$ nano postgres_cluster.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgresql-cluster
spec:
  instances: 3
  storage:
    size: 1Gi
(base) raphy@raohy:~/.talos/timescaledb$ kubectl apply -f postgres_cluster.yaml
Error from server (InternalError): error when creating "postgres_cluster.yaml": Internal error occurred: failed calling webhook "mcluster.cnpg.io": failed to call webhook: Post "https://cnpg-webhook-service.cnpg-system.svc:443/mutate-postgresql-cnpg-io-v1-cluster?timeout=10s": dial tcp 10.98.223.229:443: connect: connection refused
(base) raphy@raohy:~/.talos/timescaledb$ kubectl version
Client Version: v1.31.11
Kustomize Version: v5.4.2
Server Version: v1.31.11
OS: Talos Linux
How to make it work?
Hi @raphael10-collab! I'm Dosu and I'm helping the charts team.
The webhook connection refused error is often caused by Kubernetes nodes using public IPs for internal communication instead of private IPs. To resolve this, make sure your nodes are configured to use private IPs for internal traffic. You can check your node IPs with kubectl get nodes -o wide and, if needed, reconfigure your Kubernetes agents to use the correct private IP and network interface. Also, confirm that the CloudNativePG controller deployment is fully ready before applying any cluster manifests—use kubectl -n cnpg-system get pods and kubectl -n cnpg-system rollout status deployment/cnpg-controller-manager to check pod and deployment status. If the webhook service pod isn't running or ready, investigate its logs and readiness probes. For more details, see this troubleshooting thread.
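The checks suggested above can be run as one short script. Note that with this Helm release the operator deployment is named cnpg-cloudnative-pg (the pod names in this thread confirm it), not cnpg-controller-manager as in the manifest-based install; adjust the name if yours differs:

```shell
#!/bin/sh
# Show node addresses: INTERNAL-IP should be a private address reachable
# from the API server, or webhook calls will be refused.
kubectl get nodes -o wide

# Check operator pod status and wait for the deployment rollout to finish.
kubectl -n cnpg-system get pods
kubectl -n cnpg-system rollout status deployment/cnpg-cloudnative-pg --timeout=120s

# If the pod is not ready, inspect its events and recent logs.
kubectl -n cnpg-system describe pod -l app.kubernetes.io/name=cloudnative-pg
kubectl -n cnpg-system logs -l app.kubernetes.io/name=cloudnative-pg --tail=50
```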
To reply, just mention @dosu.
@dosu
CloudNativePG controller deployment is not fully ready
the cnpg-cloudnative-pg pod goes in CrashLoopBackOff state for Startup probe failed as you can see from here:
(base) raphy@raohy:~/.talos$ kubectl -n cnpg-system describe pod cnpg-cloudnative-pg-7648fbf79f-7x4t2
Name: cnpg-cloudnative-pg-7648fbf79f-7x4t2
Namespace: cnpg-system
Priority: 0
Service Account: cnpg-cloudnative-pg
Node: k8s-eu-1-worker-2/10.0.0.5
Start Time: Fri, 03 Oct 2025 22:17:52 +0200
Labels: app.kubernetes.io/instance=cnpg
app.kubernetes.io/name=cloudnative-pg
pod-template-hash=7648fbf79f
Annotations: checksum/config: c0361e36cbad50677066d4c096e50c3debed68e7a743ebd671c0a428b5565580
checksum/monitoring-config: 6cce6ad11601c246e0531eb45d4b8c6c327647be0a57e42375c600cd5d329739
checksum/rbac: 61a046ed01892794802487ddb709ba74073547b7ebbf55903efa7205703ba4af
Status: Running
SeccompProfile: RuntimeDefault
IP: 10.0.1.132
IPs:
IP: 10.0.1.132
Controlled By: ReplicaSet/cnpg-cloudnative-pg-7648fbf79f
Containers:
manager:
Container ID: containerd://4429404f0a74a6653fa20388dd79c5a4f4b98da0cdb135a65a31d6c08f392b4d
Image: ghcr.io/cloudnative-pg/cloudnative-pg:1.27.0
Image ID: ghcr.io/cloudnative-pg/cloudnative-pg@sha256:9e5633b36f1f3ff0bb28b434ce51c95fbb8428a4ab47bc738ea403eb09dbf945
Ports: 8080/TCP, 9443/TCP
Host Ports: 0/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Command:
/manager
Args:
controller
--leader-elect
--max-concurrent-reconciles=10
--config-map-name=cnpg-controller-manager-config
--webhook-port=9443
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Fri, 03 Oct 2025 22:21:12 +0200
Finished: Fri, 03 Oct 2025 22:21:38 +0200
Ready: False
Restart Count: 5
Liveness: http-get https://:9443/readyz delay=3s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:9443/readyz delay=3s timeout=1s period=10s #success=1 #failure=3
Startup: http-get https://:9443/readyz delay=0s timeout=1s period=5s #success=1 #failure=6
Environment:
OPERATOR_IMAGE_NAME: ghcr.io/cloudnative-pg/cloudnative-pg:1.27.0
OPERATOR_NAMESPACE: cnpg-system (v1:metadata.namespace)
MONITORING_QUERIES_CONFIGMAP: cnpg-default-monitoring
Mounts:
/controller from scratch-data (rw)
/run/secrets/cnpg.io/webhook from webhook-certificates (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-w59xw (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
scratch-data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
webhook-certificates:
Type: Secret (a volume populated by a Secret)
SecretName: cnpg-webhook-cert
Optional: true
kube-api-access-w59xw:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m5s default-scheduler Successfully assigned cnpg-system/cnpg-cloudnative-pg-7648fbf79f-7x4t2 to k8s-eu-1-worker-2
Normal Pulled 4m5s (x3 over 5m5s) kubelet Container image "ghcr.io/cloudnative-pg/cloudnative-pg:1.27.0" already present on machine
Normal Created 4m5s (x3 over 5m5s) kubelet Created container: manager
Normal Started 4m5s (x3 over 5m5s) kubelet Started container manager
Normal Killing 4m5s (x2 over 4m35s) kubelet Container manager failed startup probe, will be restarted
Warning Unhealthy 3m55s (x14 over 5m) kubelet Startup probe failed: Get "https://10.0.1.132:9443/readyz": dial tcp 10.0.1.132:9443: connect: connection refused
I removed (commented out) the startup probe from the cnpg-cloudnative-pg deployment, but the problem persists.
What do you suggest I do to keep the pod in the Running state?
(base) raphy@raohy:~/.talos$ kubectl -n cnpg-system logs cnpg-cloudnative-pg-7648fbf79f-7x4t2
{"level":"info","ts":"2025-10-05T08:31:33.596855241Z","logger":"setup","msg":"Starting CloudNativePG Operator","version":"1.27.0","build":{"Version":"1.27.0","Commit":"8b442dcc3","Date":"2025-08-12"}}
{"level":"info","ts":"2025-10-05T08:31:33.597283072Z","logger":"setup","msg":"Listening for changes on all namespaces"}
{"level":"info","ts":"2025-10-05T08:31:33.599751715Z","logger":"setup","msg":"Loading configuration from ConfigMap","namespace":"cnpg-system","name":"cnpg-controller-manager-config"}
(base) raphy@raohy:~/.talos$
(base) raphy@raohy:~/.talos$ kubectl -n cnpg-system describe pod cnpg-cloudnative-pg-7648fbf79f-7x4t2
Name: cnpg-cloudnative-pg-7648fbf79f-7x4t2
Namespace: cnpg-system
Priority: 0
Service Account: cnpg-cloudnative-pg
Node: k8s-eu-1-worker-2/10.0.0.5
Start Time: Fri, 03 Oct 2025 22:17:52 +0200
Labels: app.kubernetes.io/instance=cnpg
app.kubernetes.io/name=cloudnative-pg
pod-template-hash=7648fbf79f
Annotations: checksum/config: c0361e36cbad50677066d4c096e50c3debed68e7a743ebd671c0a428b5565580
checksum/monitoring-config: 6cce6ad11601c246e0531eb45d4b8c6c327647be0a57e42375c600cd5d329739
checksum/rbac: 61a046ed01892794802487ddb709ba74073547b7ebbf55903efa7205703ba4af
Status: Running
SeccompProfile: RuntimeDefault
IP: 10.0.1.132
IPs:
IP: 10.0.1.132
Controlled By: ReplicaSet/cnpg-cloudnative-pg-7648fbf79f
Containers:
manager:
Container ID: containerd://9fca55750473fa03567ec6bac6614667615700873ac2079aa6493924ff57b07a
Image: ghcr.io/cloudnative-pg/cloudnative-pg:1.27.0
Image ID: ghcr.io/cloudnative-pg/cloudnative-pg@sha256:9e5633b36f1f3ff0bb28b434ce51c95fbb8428a4ab47bc738ea403eb09dbf945
Ports: 8080/TCP, 9443/TCP
Host Ports: 0/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Command:
/manager
Args:
controller
--leader-elect
--max-concurrent-reconciles=10
--config-map-name=cnpg-controller-manager-config
--webhook-port=9443
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Sun, 05 Oct 2025 10:31:33 +0200
Finished: Sun, 05 Oct 2025 10:32:03 +0200
Ready: False
Restart Count: 719
Liveness: http-get https://:9443/readyz delay=3s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:9443/readyz delay=3s timeout=1s period=10s #success=1 #failure=3
Startup: http-get https://:9443/readyz delay=0s timeout=1s period=5s #success=1 #failure=6
Environment:
OPERATOR_IMAGE_NAME: ghcr.io/cloudnative-pg/cloudnative-pg:1.27.0
OPERATOR_NAMESPACE: cnpg-system (v1:metadata.namespace)
MONITORING_QUERIES_CONFIGMAP: cnpg-default-monitoring
Mounts:
/controller from scratch-data (rw)
/run/secrets/cnpg.io/webhook from webhook-certificates (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-w59xw (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
scratch-data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
webhook-certificates:
Type: Secret (a volume populated by a Secret)
SecretName: cnpg-webhook-cert
Optional: true
kube-api-access-w59xw:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 53m (x4214 over 36h) kubelet Startup probe failed: Get "https://10.0.1.132:9443/readyz": dial tcp 10.0.1.132:9443: connect: connection refused
Normal Pulled 23m (x713 over 36h) kubelet Container image "ghcr.io/cloudnative-pg/cloudnative-pg:1.27.0" already present on machine
Warning BackOff 3m43s (x8939 over 36h) kubelet Back-off restarting failed container manager in pod cnpg-cloudnative-pg-7648fbf79f-7x4t2_cnpg-system(7d3485b4-011d-474d-a8aa-c1bbbbc2bb82)
(base) raphy@raohy:~/.talos$
What does "restarting failed container manager in pod" mean? How can I make it work?
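The BackOff event means the kubelet keeps restarting the manager container because it exits (code 2) before the /readyz endpoint on port 9443 ever comes up, so the startup probe can never succeed. The logs of the previous (crashed) run usually contain the actual failure; a way to pull them for the pod in this thread (standard kubectl flags, pod name taken from the output above):

```shell
# Logs from the last terminated run of the manager container,
# which typically show why it exited with code 2.
kubectl -n cnpg-system logs cnpg-cloudnative-pg-7648fbf79f-7x4t2 --previous

# The webhook cert secret is mounted as Optional in this pod spec;
# verify it exists, since a missing cert can stall webhook startup.
kubectl -n cnpg-system get secret cnpg-webhook-cert
```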
This helped me:
kubectl patch mutatingwebhookconfiguration cnpg-mutating-webhook-configuration --type='json' -p='[{"op": "replace", "path": "/webhooks/0/timeoutSeconds", "value": 30}]'
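For completeness, the same timeout bump can be applied to the validating webhook as well. The configuration name below is an assumption patterned on the mutating one; list the real names first with kubectl get validatingwebhookconfigurations:

```shell
# Raise the timeout on the mutating webhook (same patch as above).
kubectl patch mutatingwebhookconfiguration cnpg-mutating-webhook-configuration \
  --type='json' -p='[{"op": "replace", "path": "/webhooks/0/timeoutSeconds", "value": 30}]'

# Do the same for the validating webhook; the name here is assumed,
# so verify it with: kubectl get validatingwebhookconfigurations
kubectl patch validatingwebhookconfiguration cnpg-validating-webhook-configuration \
  --type='json' -p='[{"op": "replace", "path": "/webhooks/0/timeoutSeconds", "value": 30}]'
```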
I also had this problem and solved it with higher timeouts:
// Bundle definition for CloudNative PostgreSQL Operator deployment
bundle: {
apiVersion: "v1alpha1"
name: "cnpg-deployment"
instances: {
// CloudNative PostgreSQL Operator Deployment using Helm Chart
"cnpg": {
module: {
url: "oci://ghcr.io/stefanprodan/modules/flux-helm-release"
version: "latest"
}
namespace: "flux-system"
values: {
repository: {
url: "https://cloudnative-pg.github.io/charts"
}
chart: {
name: "cloudnative-pg"
version: "*"
}
helmValues: {
webhook: {
enabled: true
mutating: {
create: true
failurePolicy: "Ignore"
timeoutSeconds: 30
}
validating: {
create: true
failurePolicy: "Ignore"
timeoutSeconds: 30
}
}
}
sync: targetNamespace: "cnpg-system"
}
}
}
}
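Without Timoni/Flux, the same webhook settings from the helmValues block above can be passed straight to Helm as a sketch; the webhook.* keys are taken from that bundle and assumed to match the chart's values schema:

```shell
# Reinstall the operator with relaxed webhook settings, mirroring the
# helmValues block from the bundle above.
helm upgrade --install cnpg cnpg/cloudnative-pg \
  --namespace cnpg-system --create-namespace \
  --set webhook.mutating.failurePolicy=Ignore \
  --set webhook.mutating.timeoutSeconds=30 \
  --set webhook.validating.failurePolicy=Ignore \
  --set webhook.validating.timeoutSeconds=30
```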
Hi, @raphael10-collab. I'm Dosu, and I'm helping the charts team manage their backlog and am marking this issue as stale.
Issue Summary:
- You reported a webhook connection failure due to the cnpg-cloudnative-pg pod crashing repeatedly with startup probe failures.
- The pod was stuck in CrashLoopBackOff with connection refused errors on port 9443.
- Suggestions included checking node IP configurations and ensuring the controller deployment readiness.
- Increasing the webhook timeoutSeconds to 30 resolved the startup probe failures and pod crashes.
- The issue appears to be fixed by adjusting the webhook timeout setting.
Next Steps:
- Please confirm if this issue is still relevant with the latest version of the charts.
- If it is, feel free to keep the discussion open by commenting; otherwise, I will automatically close this issue in 7 days.
Thanks for your understanding and contribution!