
failed to call webhook: Post "https://cnpg-webhook-service.cnpg-system.svc:443/mutate-postgresql-cnpg-io-v1-cluster?timeout=10s":

raphael10-collab opened this issue 7 months ago · 5 comments

(base) raphy@raohy:~/.talos/timescaledb$ helm upgrade --install cnpg \
>   --namespace cnpg-system \
>   --create-namespace \
>   cnpg/cloudnative-pg
Release "cnpg" does not exist. Installing it now.
NAME: cnpg
LAST DEPLOYED: Fri Oct  3 19:47:37 2025
NAMESPACE: cnpg-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
CloudNativePG operator should be installed in namespace "cnpg-system".
You can now create a PostgreSQL cluster with 3 nodes as follows:

cat <<EOF | kubectl apply -f -
# Example of PostgreSQL cluster
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example

spec:
  instances: 3
  storage:
    size: 1Gi
EOF

kubectl get -A cluster

But I get this error message:

(base) raphy@raohy:~/.talos/timescaledb$ helm upgrade --install database \
>   --namespace database \
>   --create-namespace \
>   cnpg/cluster
Release "database" does not exist. Installing it now.
Error: 1 error occurred:
	* Internal error occurred: failed calling webhook "mcluster.cnpg.io": failed to call webhook: Post "https://cnpg-webhook-service.cnpg-system.svc:443/mutate-postgresql-cnpg-io-v1-cluster?timeout=10s": dial tcp 10.98.223.229:443: connect: connection refused

The same happens if I apply the following YAML configuration file:

(base) raphy@raohy:~/.talos/timescaledb$ nano postgres_cluster.yaml

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgresql-cluster
spec:
  instances: 3
  storage:
    size: 1Gi

(base) raphy@raohy:~/.talos/timescaledb$ kubectl apply -f postgres_cluster.yaml 
Error from server (InternalError): error when creating "postgres_cluster.yaml": Internal error occurred: failed calling webhook "mcluster.cnpg.io": failed to call webhook: Post "https://cnpg-webhook-service.cnpg-system.svc:443/mutate-postgresql-cnpg-io-v1-cluster?timeout=10s": dial tcp 10.98.223.229:443: connect: connection refused

(base) raphy@raohy:~/.talos/timescaledb$ kubectl version
Client Version: v1.31.11
Kustomize Version: v5.4.2
Server Version: v1.31.11

OS: Talos Linux

How can I make it work?

raphael10-collab · Oct 03 '25 17:10

Hi @raphael10-collab! I'm Dosu, and I'm helping the charts team.

The webhook connection refused error is often caused by Kubernetes nodes using public IPs for internal communication instead of private IPs. To resolve this, make sure your nodes are configured to use private IPs for internal traffic. You can check your node IPs with kubectl get nodes -o wide and, if needed, reconfigure your Kubernetes agents to use the correct private IP and network interface. Also, confirm that the CloudNativePG controller deployment is fully ready before applying any cluster manifests—use kubectl -n cnpg-system get pods and kubectl -n cnpg-system rollout status deployment/cnpg-controller-manager to check pod and deployment status. If the webhook service pod isn't running or ready, investigate its logs and readiness probes. For more details, see this troubleshooting thread.
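
For reference, those checks map to a short command sequence (the operator deployment is named after the Helm release, so in this thread it appears as cnpg-cloudnative-pg):

# Nodes should list private addresses in the INTERNAL-IP column
kubectl get nodes -o wide
# The operator pod must be Running and Ready before any Cluster is applied
kubectl -n cnpg-system get pods
# Block until the controller deployment is fully rolled out
kubectl -n cnpg-system rollout status deployment/cnpg-cloudnative-pg
# If it never becomes ready, inspect the operator logs
kubectl -n cnpg-system logs deployment/cnpg-cloudnative-pg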

To reply, just mention @dosu.


dosubot[bot] · Oct 03 '25 18:10

@dosu

The CloudNativePG controller deployment is not fully ready.

The cnpg-cloudnative-pg pod goes into the CrashLoopBackOff state because its startup probe fails, as you can see here:

(base) raphy@raohy:~/.talos$ kubectl -n cnpg-system describe pod cnpg-cloudnative-pg-7648fbf79f-7x4t2
Name:             cnpg-cloudnative-pg-7648fbf79f-7x4t2
Namespace:        cnpg-system
Priority:         0
Service Account:  cnpg-cloudnative-pg
Node:             k8s-eu-1-worker-2/10.0.0.5
Start Time:       Fri, 03 Oct 2025 22:17:52 +0200
Labels:           app.kubernetes.io/instance=cnpg
                  app.kubernetes.io/name=cloudnative-pg
                  pod-template-hash=7648fbf79f
Annotations:      checksum/config: c0361e36cbad50677066d4c096e50c3debed68e7a743ebd671c0a428b5565580
                  checksum/monitoring-config: 6cce6ad11601c246e0531eb45d4b8c6c327647be0a57e42375c600cd5d329739
                  checksum/rbac: 61a046ed01892794802487ddb709ba74073547b7ebbf55903efa7205703ba4af
Status:           Running
SeccompProfile:   RuntimeDefault
IP:               10.0.1.132
IPs:
  IP:           10.0.1.132
Controlled By:  ReplicaSet/cnpg-cloudnative-pg-7648fbf79f
Containers:
  manager:
    Container ID:    containerd://4429404f0a74a6653fa20388dd79c5a4f4b98da0cdb135a65a31d6c08f392b4d
    Image:           ghcr.io/cloudnative-pg/cloudnative-pg:1.27.0
    Image ID:        ghcr.io/cloudnative-pg/cloudnative-pg@sha256:9e5633b36f1f3ff0bb28b434ce51c95fbb8428a4ab47bc738ea403eb09dbf945
    Ports:           8080/TCP, 9443/TCP
    Host Ports:      0/TCP, 0/TCP
    SeccompProfile:  RuntimeDefault
    Command:
      /manager
    Args:
      controller
      --leader-elect
      --max-concurrent-reconciles=10
      --config-map-name=cnpg-controller-manager-config
      --webhook-port=9443
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Fri, 03 Oct 2025 22:21:12 +0200
      Finished:     Fri, 03 Oct 2025 22:21:38 +0200
    Ready:          False
    Restart Count:  5
    Liveness:       http-get https://:9443/readyz delay=3s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get https://:9443/readyz delay=3s timeout=1s period=10s #success=1 #failure=3
    Startup:        http-get https://:9443/readyz delay=0s timeout=1s period=5s #success=1 #failure=6
    Environment:
      OPERATOR_IMAGE_NAME:           ghcr.io/cloudnative-pg/cloudnative-pg:1.27.0
      OPERATOR_NAMESPACE:            cnpg-system (v1:metadata.namespace)
      MONITORING_QUERIES_CONFIGMAP:  cnpg-default-monitoring
    Mounts:
      /controller from scratch-data (rw)
      /run/secrets/cnpg.io/webhook from webhook-certificates (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-w59xw (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  scratch-data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  webhook-certificates:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cnpg-webhook-cert
    Optional:    true
  kube-api-access-w59xw:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  5m5s                  default-scheduler  Successfully assigned cnpg-system/cnpg-cloudnative-pg-7648fbf79f-7x4t2 to k8s-eu-1-worker-2
  Normal   Pulled     4m5s (x3 over 5m5s)   kubelet            Container image "ghcr.io/cloudnative-pg/cloudnative-pg:1.27.0" already present on machine
  Normal   Created    4m5s (x3 over 5m5s)   kubelet            Created container: manager
  Normal   Started    4m5s (x3 over 5m5s)   kubelet            Started container manager
  Normal   Killing    4m5s (x2 over 4m35s)  kubelet            Container manager failed startup probe, will be restarted
  Warning  Unhealthy  3m55s (x14 over 5m)   kubelet            Startup probe failed: Get "https://10.0.1.132:9443/readyz": dial tcp 10.0.1.132:9443: connect: connection refused

I removed (commented out) the startup probe from the cnpg-cloudnative-pg deployment, but the problem persists.
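
Rather than deleting the probe, its failure threshold can be raised in place with a JSON patch (a sketch, assuming the manager container is the first in the pod spec, as in the describe output above; Helm re-renders the deployment on the next upgrade and will revert this, so it is only useful while debugging):

kubectl -n cnpg-system patch deployment cnpg-cloudnative-pg --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/startupProbe/failureThreshold", "value": 30}]'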

What do you suggest I do to keep the pod in the Running state?

raphael10-collab · Oct 03 '25 20:10

(base) raphy@raohy:~/.talos$ kubectl -n cnpg-system logs cnpg-cloudnative-pg-7648fbf79f-7x4t2
{"level":"info","ts":"2025-10-05T08:31:33.596855241Z","logger":"setup","msg":"Starting CloudNativePG Operator","version":"1.27.0","build":{"Version":"1.27.0","Commit":"8b442dcc3","Date":"2025-08-12"}}
{"level":"info","ts":"2025-10-05T08:31:33.597283072Z","logger":"setup","msg":"Listening for changes on all namespaces"}
{"level":"info","ts":"2025-10-05T08:31:33.599751715Z","logger":"setup","msg":"Loading configuration from ConfigMap","namespace":"cnpg-system","name":"cnpg-controller-manager-config"}
(base) raphy@raohy:~/.talos$ 
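
The logs above only cover the current attempt; since the container keeps restarting, the output of the previous (crashed) instance is usually more telling and can be retrieved with the --previous flag:

kubectl -n cnpg-system logs cnpg-cloudnative-pg-7648fbf79f-7x4t2 --previous
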
(base) raphy@raohy:~/.talos$ kubectl -n cnpg-system describe pod cnpg-cloudnative-pg-7648fbf79f-7x4t2
Name:             cnpg-cloudnative-pg-7648fbf79f-7x4t2
Namespace:        cnpg-system
Priority:         0
Service Account:  cnpg-cloudnative-pg
Node:             k8s-eu-1-worker-2/10.0.0.5
Start Time:       Fri, 03 Oct 2025 22:17:52 +0200
Labels:           app.kubernetes.io/instance=cnpg
                  app.kubernetes.io/name=cloudnative-pg
                  pod-template-hash=7648fbf79f
Annotations:      checksum/config: c0361e36cbad50677066d4c096e50c3debed68e7a743ebd671c0a428b5565580
                  checksum/monitoring-config: 6cce6ad11601c246e0531eb45d4b8c6c327647be0a57e42375c600cd5d329739
                  checksum/rbac: 61a046ed01892794802487ddb709ba74073547b7ebbf55903efa7205703ba4af
Status:           Running
SeccompProfile:   RuntimeDefault
IP:               10.0.1.132
IPs:
  IP:           10.0.1.132
Controlled By:  ReplicaSet/cnpg-cloudnative-pg-7648fbf79f
Containers:
  manager:
    Container ID:    containerd://9fca55750473fa03567ec6bac6614667615700873ac2079aa6493924ff57b07a
    Image:           ghcr.io/cloudnative-pg/cloudnative-pg:1.27.0
    Image ID:        ghcr.io/cloudnative-pg/cloudnative-pg@sha256:9e5633b36f1f3ff0bb28b434ce51c95fbb8428a4ab47bc738ea403eb09dbf945
    Ports:           8080/TCP, 9443/TCP
    Host Ports:      0/TCP, 0/TCP
    SeccompProfile:  RuntimeDefault
    Command:
      /manager
    Args:
      controller
      --leader-elect
      --max-concurrent-reconciles=10
      --config-map-name=cnpg-controller-manager-config
      --webhook-port=9443
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Sun, 05 Oct 2025 10:31:33 +0200
      Finished:     Sun, 05 Oct 2025 10:32:03 +0200
    Ready:          False
    Restart Count:  719
    Liveness:       http-get https://:9443/readyz delay=3s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get https://:9443/readyz delay=3s timeout=1s period=10s #success=1 #failure=3
    Startup:        http-get https://:9443/readyz delay=0s timeout=1s period=5s #success=1 #failure=6
    Environment:
      OPERATOR_IMAGE_NAME:           ghcr.io/cloudnative-pg/cloudnative-pg:1.27.0
      OPERATOR_NAMESPACE:            cnpg-system (v1:metadata.namespace)
      MONITORING_QUERIES_CONFIGMAP:  cnpg-default-monitoring
    Mounts:
      /controller from scratch-data (rw)
      /run/secrets/cnpg.io/webhook from webhook-certificates (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-w59xw (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  scratch-data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  webhook-certificates:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cnpg-webhook-cert
    Optional:    true
  kube-api-access-w59xw:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                     From     Message
  ----     ------     ----                    ----     -------
  Warning  Unhealthy  53m (x4214 over 36h)    kubelet  Startup probe failed: Get "https://10.0.1.132:9443/readyz": dial tcp 10.0.1.132:9443: connect: connection refused
  Normal   Pulled     23m (x713 over 36h)     kubelet  Container image "ghcr.io/cloudnative-pg/cloudnative-pg:1.27.0" already present on machine
  Warning  BackOff    3m43s (x8939 over 36h)  kubelet  Back-off restarting failed container manager in pod cnpg-cloudnative-pg-7648fbf79f-7x4t2_cnpg-system(7d3485b4-011d-474d-a8aa-c1bbbbc2bb82)
(base) raphy@raohy:~/.talos$ 

What does "Back-off restarting failed container manager in pod" mean? How can I make it work?

raphael10-collab · Oct 05 '25 08:10

This helped me:

kubectl patch mutatingwebhookconfiguration cnpg-mutating-webhook-configuration --type='json' -p='[{"op": "replace", "path": "/webhooks/0/timeoutSeconds", "value": 30}]'
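
If the validating webhook times out as well, the same patch applies to it (assuming the default configuration name, cnpg-validating-webhook-configuration):

kubectl patch validatingwebhookconfiguration cnpg-validating-webhook-configuration --type='json' -p='[{"op": "replace", "path": "/webhooks/0/timeoutSeconds", "value": 30}]'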

kovanond · Oct 09 '25 13:10

I also have this problem


I solved it by setting higher timeouts:

// Bundle definition for CloudNative PostgreSQL Operator deployment
bundle: {
    apiVersion: "v1alpha1"
    name:       "cnpg-deployment"
    instances: {
        // CloudNative PostgreSQL Operator deployment using the Helm chart
        "cnpg": {
            module: {
                url:     "oci://ghcr.io/stefanprodan/modules/flux-helm-release"
                version: "latest"
            }
            namespace: "flux-system"
            values: {
                repository: {
                    url: "https://cloudnative-pg.github.io/charts"
                }
                chart: {
                    name:    "cloudnative-pg"
                    version: "*"
                }

                helmValues: {
                    webhook: {
                        enabled: true
                        mutating: {
                            create: true
                            failurePolicy: "Ignore"
                            timeoutSeconds: 30
                        }
                        validating: {
                            create: true
                            failurePolicy: "Ignore"
                            timeoutSeconds: 30
                        }
                    }
                }
                sync: targetNamespace: "cnpg-system"
            }
        }
    }
}
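
For a plain Helm install, the same webhook settings can be passed as chart values (a sketch mirroring the helmValues above, assuming the chart accepts the same keys shown there):

helm upgrade --install cnpg cnpg/cloudnative-pg \
  --namespace cnpg-system \
  --create-namespace \
  --set webhook.mutating.failurePolicy=Ignore \
  --set webhook.mutating.timeoutSeconds=30 \
  --set webhook.validating.failurePolicy=Ignore \
  --set webhook.validating.timeoutSeconds=30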

suse-coder · Oct 30 '25 18:10

Hi, @raphael10-collab. I'm Dosu, and I'm helping the charts team manage their backlog and am marking this issue as stale.

Issue Summary:

  • You reported a webhook connection failure due to the cnpg-cloudnative-pg pod crashing repeatedly with startup probe failures.
  • The pod was stuck in CrashLoopBackOff with connection refused errors on port 9443.
  • Suggestions included checking node IP configurations and ensuring the controller deployment readiness.
  • Increasing the webhook timeoutSeconds to 30 resolved the webhook errors for other affected users.
  • The issue appears to be fixed by adjusting the webhook timeout setting.

Next Steps:

  • Please confirm if this issue is still relevant with the latest version of the charts.
  • If it is, feel free to keep the discussion open by commenting; otherwise, I will automatically close this issue in 7 days.

Thanks for your understanding and contribution!

dosubot[bot] · Jan 29 '26 16:01