postgres-operator

Clusters being restarted at every sync interval

Open stanyzra opened this issue 6 months ago • 2 comments

Please answer some short questions which should help us understand your problem / question better:

  • Which image of the operator are you using? ghcr.io/zalando/postgres-operator:v1.14.0
  • Where do you run it - cloud or metal? Kubernetes or OpenShift? OCI K8S
  • Are you running Postgres Operator in production? yes
  • Type of issue? question

It looks like the operator runs a cluster sync every 30 minutes, but each time this happens my clusters begin to restart, causing downtime in my production environment. I don't know whether this is a bug or a misconfiguration in my manifests.
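For reference, the 30-minute cadence appears to match the operator's default `resync_period` (the period between consecutive sync runs; `repair_period`, default 5m, is a separate loop). The excerpt below is a minimal sketch, assuming the operator is configured through an `OperatorConfiguration` resource named `postgresql-operator-default-configuration`. That name is an assumption: use whichever configuration your operator deployment points to. With a ConfigMap-based setup the key is also `resync_period`. Temporarily raising the period should show whether the restarts follow the sync loop. It only changes how often the sync runs; it does not explain why a sync triggers a restart. The operator may also need to be restarted to pick up the change.

```yaml
# Excerpt only: merge into the existing OperatorConfiguration,
# do not apply it as a replacement for the whole resource.
apiVersion: "acid.zalan.do/v1"
kind: OperatorConfiguration
metadata:
  # Assumed name; may differ in your installation
  name: postgresql-operator-default-configuration
configuration:
  # Default is 30m; a longer period is a diagnostic, not a fix
  resync_period: 2h
  # Repair loop for clusters in a failed state (default 5m), left unchanged
  repair_period: 5m
  # ... all other existing configuration fields stay as they are
```

If the restarts then occur every two hours instead of every 30 minutes, that would point at something the sync itself changes (for example, a statefulset difference it detects on each run), rather than at an external cause.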

Here's my PostgreSQL production environment configuration:

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: collection-pg-cluster-primary
  namespace: pg-primary
spec:
  spiloFSGroup: 103
  teamId: "collection-pg-cluster-id"
  volume:
    size: 50Gi
  numberOfInstances: 3
  env:
    - name: USE_WALG_BACKUP
      value: "true"
    - name: WAL_S3_BUCKET
      value: collection-pg-backups
    - name: AWS_REGION
      value: nyc3
    - name: AWS_ENDPOINT
      value: https://nyc3.digitaloceanspaces.com
    - name: AWS_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: pod-secret
          key: AWS_ACCESS_KEY_ID
    - name: AWS_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: pod-secret
          key: AWS_SECRET_ACCESS_KEY
  users:
    collection:
      - superuser
      - createdb
    grafana:
      - login
      - replication
    replicator:
      - replication
  databases:
    collection: collection
  enableLogicalBackup: false
  enableMasterLoadBalancer: true
  postgresql:
    version: "14"
    parameters:
      password_encryption: scram-sha-256
      log_statement: "all"
      wal_level: logical
      max_replication_slots: "80"
      max_wal_senders: "80"
  patroni:
    slots:
      collection_sub:
        database: collection
        plugin: pgoutput
        type: logical
  tls:
    secretName: pg-primary-tls
    caFile: "ca.crt"
  resources:
    requests:
      cpu: "2"
      memory: 4Gi
    limits:
      cpu: "4"
      memory: 8Gi
  sidecars:
    - name: "exporter"
      image: "quay.io/prometheuscommunity/postgres-exporter:latest"
      ports:
        - name: metrics
          containerPort: 9187
          protocol: TCP
      resources:
        limits:
          cpu: 500m
          memory: 256M
        requests:
          cpu: 100m
          memory: 200M
      env:
        - name: "DATA_SOURCE_URI"
          value: "127.0.0.1:5432?sslmode=disable"
        - name: "DATA_SOURCE_USER"
          value: "$(POSTGRES_USER)"
        - name: "DATA_SOURCE_PASS"
          value: "$(POSTGRES_PASSWORD)"
    - name: "logs-collector"
      image: "sa-saopaulo-1.ocir.io/grqi4cogacpe/collection-images/postgres-log-exporter:v0.3.8"
      resources:
        limits:
          memory: 1Gi
        requests:
          cpu: 500m
          memory: 512Mi

And the operator's logs (most recent entries first):

time="2025-05-15T11:48:40Z" level=debug msg="syncing connection pooler (master, replica) from (false, nil) to (false, nil)" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:40Z" level=debug msg="closing database connection" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:40Z" level=debug msg="closing database connection" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:40Z" level=debug msg="syncing roles" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:40Z" level=debug msg="syncing pod disruption budgets" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:40Z" level=debug msg="making GET http request: http://10.0.11.252:8008/patroni" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:40Z" level=debug msg="making GET http request: http://10.0.11.93:8008/patroni" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:40Z" level=debug msg="making GET http request: http://10.0.10.71:8008/patroni" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:40Z" level=debug msg="making GET http request: http://10.0.11.93:8008/config" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:40Z" level=debug msg="making GET http request: http://10.0.10.71:8008/config" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:39Z" level=debug msg="syncing Patroni config" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:39Z" level=debug msg="closing database connection" cluster-name=pg-primary-read/read-replica pkg=cluster
time="2025-05-15T11:48:39Z" level=debug msg="syncing statefulsets" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:39Z" level=info msg="Mount additional volumes: [{Name:pg-primary-tls MountPath:/tls SubPath: IsSubPathExpr:<nil> TargetContainers:[postgres] VolumeSource:{HostPath:nil EmptyDir:nil GCEPersistentDisk:nil AWSElasticBlockStore:nil GitRepo:nil Secret:&SecretVolumeSource{SecretName:pg-primary-tls,Items:[]KeyToPath{},DefaultMode:*416,Optional:nil,} NFS:nil ISCSI:nil Glusterfs:nil PersistentVolumeClaim:nil RBD:nil FlexVolume:nil Cinder:nil CephFS:nil Flocker:nil DownwardAPI:nil FC:nil AzureFile:nil ConfigMap:nil VsphereVolume:nil Quobyte:nil AzureDisk:nil PhotonPersistentDisk:nil Projected:nil PortworxVolume:nil ScaleIO:nil StorageOS:nil CSI:nil Ephemeral:nil}}]" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:39Z" level=debug msg="closing database connection" cluster-name=pg-primary-read/read-replica pkg=cluster
time="2025-05-15T11:48:39Z" level=debug msg="syncing roles" cluster-name=pg-primary-read/read-replica pkg=cluster
time="2025-05-15T11:48:39Z" level=debug msg="syncing pod disruption budgets" cluster-name=pg-primary-read/read-replica pkg=cluster
time="2025-05-15T11:48:39Z" level=debug msg="making GET http request: http://10.0.10.83:8008/patroni" cluster-name=pg-primary-read/read-replica pkg=cluster
time="2025-05-15T11:48:39Z" level=debug msg="syncing Patroni config" cluster-name=pg-primary-read/read-replica pkg=cluster
time="2025-05-15T11:48:39Z" level=info msg="cluster version up to date. current: 140015, min desired: 140000" cluster-name=pg-secondary/collection-pg-cluster-secondary pkg=cluster
time="2025-05-15T11:48:39Z" level=debug msg="closing database connection" cluster-name=pg-secondary/collection-pg-cluster-secondary pkg=cluster
time="2025-05-15T11:48:39Z" level=debug msg="closing database connection" cluster-name=pg-secondary/collection-pg-cluster-secondary pkg=cluster
time="2025-05-15T11:48:38Z" level=debug msg="syncing roles" cluster-name=pg-secondary/collection-pg-cluster-secondary pkg=cluster
time="2025-05-15T11:48:38Z" level=debug msg="volume claims have been synced successfully" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:38Z" level=info msg="cluster version up to date. current: 140015, min desired: 140000" cluster-name=metabase/metabase-pg pkg=cluster
time="2025-05-15T11:48:38Z" level=debug msg="closing database connection" cluster-name=metabase/metabase-pg pkg=cluster
time="2025-05-15T11:48:38Z" level=debug msg="closing database connection" cluster-name=metabase/metabase-pg pkg=cluster
time="2025-05-15T11:48:38Z" level=debug msg="syncing statefulsets" cluster-name=pg-primary-read/read-replica pkg=cluster
time="2025-05-15T11:48:38Z" level=info msg="Mount additional volumes: [{Name:pg-primary-read-replica-tls MountPath:/tls SubPath: IsSubPathExpr:<nil> TargetContainers:[postgres] VolumeSource:{HostPath:nil EmptyDir:nil GCEPersistentDisk:nil AWSElasticBlockStore:nil GitRepo:nil Secret:&SecretVolumeSource{SecretName:pg-primary-read-replica-tls,Items:[]KeyToPath{},DefaultMode:*416,Optional:nil,} NFS:nil ISCSI:nil Glusterfs:nil PersistentVolumeClaim:nil RBD:nil FlexVolume:nil Cinder:nil CephFS:nil Flocker:nil DownwardAPI:nil FC:nil AzureFile:nil ConfigMap:nil VsphereVolume:nil Quobyte:nil AzureDisk:nil PhotonPersistentDisk:nil Projected:nil PortworxVolume:nil ScaleIO:nil StorageOS:nil CSI:nil Ephemeral:nil}}]" cluster-name=pg-primary-read/read-replica pkg=cluster
time="2025-05-15T11:48:38Z" level=debug msg="syncing roles" cluster-name=metabase/metabase-pg pkg=cluster
time="2025-05-15T11:48:38Z" level=debug msg="syncing pod disruption budgets" cluster-name=pg-secondary/collection-pg-cluster-secondary pkg=cluster
time="2025-05-15T11:48:38Z" level=debug msg="making GET http request: http://10.0.11.147:8008/patroni" cluster-name=pg-secondary/collection-pg-cluster-secondary pkg=cluster
time="2025-05-15T11:48:38Z" level=debug msg="syncing Patroni config" cluster-name=pg-secondary/collection-pg-cluster-secondary pkg=cluster
time="2025-05-15T11:48:38Z" level=debug msg="volume claim for volume \"pgdata-collection-pg-cluster-primary-2\" do not require updates" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:38Z" level=debug msg="syncing pod disruption budgets" cluster-name=metabase/metabase-pg pkg=cluster
time="2025-05-15T11:48:38Z" level=debug msg="making GET http request: http://10.0.11.162:8008/patroni" cluster-name=metabase/metabase-pg pkg=cluster
time="2025-05-15T11:48:38Z" level=debug msg="syncing Patroni config" cluster-name=metabase/metabase-pg pkg=cluster
time="2025-05-15T11:48:38Z" level=debug msg="volume claim for volume \"pgdata-collection-pg-cluster-primary-1\" do not require updates" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:37Z" level=debug msg="syncing statefulsets" cluster-name=pg-secondary/collection-pg-cluster-secondary pkg=cluster
time="2025-05-15T11:48:37Z" level=info msg="Mount additional volumes: [{Name:pg-secondary-tls MountPath:/tls SubPath: IsSubPathExpr:<nil> TargetContainers:[postgres] VolumeSource:{HostPath:nil EmptyDir:nil GCEPersistentDisk:nil AWSElasticBlockStore:nil GitRepo:nil Secret:&SecretVolumeSource{SecretName:pg-secondary-tls,Items:[]KeyToPath{},DefaultMode:*416,Optional:nil,} NFS:nil ISCSI:nil Glusterfs:nil PersistentVolumeClaim:nil RBD:nil FlexVolume:nil Cinder:nil CephFS:nil Flocker:nil DownwardAPI:nil FC:nil AzureFile:nil ConfigMap:nil VsphereVolume:nil Quobyte:nil AzureDisk:nil PhotonPersistentDisk:nil Projected:nil PortworxVolume:nil ScaleIO:nil StorageOS:nil CSI:nil Ephemeral:nil}}]" cluster-name=pg-secondary/collection-pg-cluster-secondary pkg=cluster
time="2025-05-15T11:48:37Z" level=debug msg="volume claim for volume \"pgdata-collection-pg-cluster-primary-0\" do not require updates" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:37Z" level=debug msg="syncing statefulsets" cluster-name=metabase/metabase-pg pkg=cluster
time="2025-05-15T11:48:37Z" level=debug msg="syncing volumes using \"pvc\" storage resize mode" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:37Z" level=debug msg="volume claims have been synced successfully" cluster-name=pg-primary-read/read-replica pkg=cluster
time="2025-05-15T11:48:36Z" level=debug msg="syncing collection-pg-cluster-primary-failover endpoint" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:36Z" level=debug msg="volume claim for volume \"pgdata-read-replica-0\" do not require updates" cluster-name=pg-primary-read/read-replica pkg=cluster
time="2025-05-15T11:48:36Z" level=debug msg="syncing collection-pg-cluster-primary-sync endpoint" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:36Z" level=debug msg="syncing volumes using \"pvc\" storage resize mode" cluster-name=pg-primary-read/read-replica pkg=cluster
time="2025-05-15T11:48:36Z" level=debug msg="volume claims have been synced successfully" cluster-name=pg-secondary/collection-pg-cluster-secondary pkg=cluster
time="2025-05-15T11:48:35Z" level=debug msg="syncing collection-pg-cluster-primary-config endpoint" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:35Z" level=debug msg="syncing read-replica-failover endpoint" cluster-name=pg-primary-read/read-replica pkg=cluster
time="2025-05-15T11:48:35Z" level=debug msg="volume claim for volume \"pgdata-collection-pg-cluster-secondary-0\" do not require updates" cluster-name=pg-secondary/collection-pg-cluster-secondary pkg=cluster
time="2025-05-15T11:48:35Z" level=debug msg="volume claims have been synced successfully" cluster-name=metabase/metabase-pg pkg=cluster
time="2025-05-15T11:48:35Z" level=debug msg="syncing collection-pg-cluster-primary-leader endpoint" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:35Z" level=debug msg="syncing read-replica-sync endpoint" cluster-name=pg-primary-read/read-replica pkg=cluster
time="2025-05-15T11:48:35Z" level=debug msg="syncing volumes using \"pvc\" storage resize mode" cluster-name=pg-secondary/collection-pg-cluster-secondary pkg=cluster
time="2025-05-15T11:48:35Z" level=debug msg="volume claim for volume \"pgdata-metabase-pg-0\" do not require updates" cluster-name=metabase/metabase-pg pkg=cluster
time="2025-05-15T11:48:35Z" level=debug msg="syncing collection-pg-cluster-primary-config service" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:35Z" level=debug msg="syncing read-replica-config endpoint" cluster-name=pg-primary-read/read-replica pkg=cluster
time="2025-05-15T11:48:34Z" level=debug msg="syncing collection-pg-cluster-secondary-failover endpoint" cluster-name=pg-secondary/collection-pg-cluster-secondary pkg=cluster
time="2025-05-15T11:48:34Z" level=debug msg="syncing volumes using \"pvc\" storage resize mode" cluster-name=metabase/metabase-pg pkg=cluster
time="2025-05-15T11:48:34Z" level=debug msg="syncing read-replica-leader endpoint" cluster-name=pg-primary-read/read-replica pkg=cluster
time="2025-05-15T11:48:34Z" level=debug msg="syncing collection-pg-cluster-secondary-sync endpoint" cluster-name=pg-secondary/collection-pg-cluster-secondary pkg=cluster
time="2025-05-15T11:48:34Z" level=debug msg="syncing metabase-pg-failover endpoint" cluster-name=metabase/metabase-pg pkg=cluster
time="2025-05-15T11:48:34Z" level=debug msg="syncing read-replica-config service" cluster-name=pg-primary-read/read-replica pkg=cluster
time="2025-05-15T11:48:34Z" level=debug msg="syncing collection-pg-cluster-secondary-config endpoint" cluster-name=pg-secondary/collection-pg-cluster-secondary pkg=cluster
time="2025-05-15T11:48:34Z" level=debug msg="syncing metabase-pg-sync endpoint" cluster-name=metabase/metabase-pg pkg=cluster
time="2025-05-15T11:48:33Z" level=debug msg="syncing replica service" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:33Z" level=debug msg="final load balancer source ranges as seen in a service spec (not necessarily applied): [\"127.0.0.1/32\"]" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:33Z" level=debug msg="syncing collection-pg-cluster-secondary-leader endpoint" cluster-name=pg-secondary/collection-pg-cluster-secondary pkg=cluster
time="2025-05-15T11:48:33Z" level=debug msg="syncing metabase-pg-config endpoint" cluster-name=metabase/metabase-pg pkg=cluster
time="2025-05-15T11:48:33Z" level=debug msg="syncing collection-pg-cluster-secondary-config service" cluster-name=pg-secondary/collection-pg-cluster-secondary pkg=cluster
time="2025-05-15T11:48:33Z" level=debug msg="syncing metabase-pg-leader endpoint" cluster-name=metabase/metabase-pg pkg=cluster
time="2025-05-15T11:48:33Z" level=debug msg="syncing replica service" cluster-name=pg-primary-read/read-replica pkg=cluster
time="2025-05-15T11:48:32Z" level=debug msg="syncing metabase-pg-config service" cluster-name=metabase/metabase-pg pkg=cluster
time="2025-05-15T11:48:32Z" level=debug msg="syncing master service" cluster-name=pg-primary/collection-pg-cluster-primary pkg=cluster
time="2025-05-15T11:48:32Z" level=debug msg="syncing replica service" cluster-name=pg-secondary/collection-pg-cluster-secondary pkg=cluster
time="2025-05-15T11:48:31Z" level=debug msg="syncing master service" cluster-name=pg-primary-read/read-replica pkg=cluster
time="2025-05-15T11:48:31Z" level=debug msg="syncing replica service" cluster-name=metabase/metabase-pg pkg=cluster
time="2025-05-15T11:48:30Z" level=debug msg="syncing master service" cluster-name=pg-secondary/collection-pg-cluster-secondary pkg=cluster
time="2025-05-15T11:48:30Z" level=debug msg="syncing master service" cluster-name=metabase/metabase-pg pkg=cluster
time="2025-05-15T11:48:29Z" level=info msg="syncing of the cluster started" cluster-name=pg-secondary/collection-pg-cluster-secondary pkg=controller worker=3
time="2025-05-15T11:48:29Z" level=debug msg="team API is disabled" cluster-name=metabase/metabase-pg pkg=cluster
time="2025-05-15T11:48:29Z" level=info msg="syncing of the cluster started" cluster-name=metabase/metabase-pg pkg=controller worker=0
time="2025-05-15T11:48:29Z" level=info msg="SYNC event has been queued" cluster-name=pg-primary-read/read-replica pkg=controller worker=1
time="2025-05-15T11:48:29Z" level=info msg="SYNC event has been queued" cluster-name=metabase/metabase-pg pkg=controller worker=0

stanyzra · May 15 '25 12:05