
Unable to create local volume repo after cluster running on S3 repo

Open · nawarajshahi opened this issue 3 years ago

Overview

We're trying to add a local volume repo in addition to the S3 repo, but we keep getting an error. We originally deployed the Postgres cluster with just the S3 repo and added the local volume afterwards.

Environment

  • Platform: EKS
  • Platform Version: 1.21
  • PGO Image Tag: crunchy-postgres:centos8-14.1-0
  • Postgres Version : 14
  • Storage: gp3

Steps to Reproduce

REPRO

Steps to reproduce the error condition: apply the following cluster spec.

---
# Source: pg/templates/postgrescluster.yml
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: pg
spec:      
  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          checkpoint_completion_target: 0.7
          default_statistics_target: 100
          effective_cache_size: 12GB
          effective_io_concurrency: 200
          maintenance_work_mem: 1GB
          max_connections: 3000
          max_parallel_maintenance_workers: 4
          max_parallel_workers: 8
          max_parallel_workers_per_gather: 4
          max_wal_size: 4GB
          max_worker_processes: 8
          min_wal_size: 1GB
          random_page_cost: 1.1
          shared_buffers: 4GB
          wal_buffers: 16MB
          work_mem: 349kB
  users:
  - name: postgres
  monitoring:
    pgmonitor:
      exporter:
        image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.0.4-0        
  proxy:
    pgBouncer:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbouncer:centos8-1.16-0
      config:
        global:
          default_pool_size: "100"
          max_client_conn: "10000"
          pool_mode: transaction
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-use-pg
                operator: In
                values:
                - "true"
            - matchExpressions:                
              - key: node-use-postgres
                operator: In
                values:
                - "true"
      tolerations:
        - effect: NoSchedule
          key: node-use-pg
          operator: Equal
          value: "true"      
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-14.1-0 
  postgresVersion: 14
  instances:
    - name: pg
      replicas: 2
      resources:
        requests:
          cpu: 2
          memory: 2Gi
        limits:
          cpu: 4
          memory: 4Gi
      dataVolumeClaimSpec:
        storageClassName: gp3
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 50Gi
      tolerations:
        - effect: NoSchedule
          key: node-use-pg
          operator: Equal
          value: "true"      
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-use-pg
                operator: In
                values:
                - "true"
            - matchExpressions:                
              - key: node-use-postgres
                operator: In
                values:
                - "true"                
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/cluster: pg
                  postgres-operator.crunchydata.com/instance-set: pg
  backups:
    pgbackrest: 
      configuration:
      - secret:
          name: pgo-s3-creds
      manual:
        repoName: repo1
        options:
          - --type=full          
      global:
        repo1-path: /stg-kcmh-a-1/repo1
        repo1-retention-full: "14"
        repo1-retention-full-type: time
      repos:
      - name: repo1
        schedules:
          full: 0 1 * * *
        s3:
          bucket: pg-bucket-name
          endpoint: s3.amazonaws.com
          region: us-east-2
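
Note: the spec above declares only repo1 (the S3 repo); the local volume repo that was added is not shown. For reference, a minimal sketch of how a volume-backed repo entry looks under spec.backups.pgbackrest.repos in the v1beta1 API; the repo name, storage class, and size here are illustrative placeholders, not the reporter's actual values:

- name: repo2
  volume:
    volumeClaimSpec:
      storageClassName: gp3        # placeholder storage class
      accessModes:
      - "ReadWriteOnce"
      resources:
        requests:
          storage: 50Gi            # placeholder size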

EXPECTED

Expected the local volume repo to work correctly.

ACTUAL

It creates the pod named "pg-repo-host-0", and upon checking the logs on the pg primary pod, we get the following error:

tail -f /pgdata/pg14/log/postgresql-Thu.log 
       repo2: [FileMissingError] unable to load info file '/stg-kcmh-a-1/repo1/archive/db/archive.info' or '/stg-kcmh-a-1/repo1/archive/db/archive.info.copy':
       FileMissingError: raised from remote-0 ssh protocol on 'pg-repo-host-0.pg-pods.pg.svc.cluster.local.': unable to open missing file '/stg-kcmh-a-1/repo1/archive/db/archive.info' for read
       FileMissingError: raised from remote-0 ssh protocol on 'pg-repo-host-0.pg-pods.pg.svc.cluster.local.': unable to open missing file '/stg-kcmh-a-1/repo1/archive/db/archive.info.copy' for read
       HINT: archive.info cannot be opened but is required to push/get WAL segments.
       HINT: is archive_command configured correctly in postgresql.conf?
       HINT: has a stanza-create been performed?
       HINT: use --no-archive-check to disable archive checks during backup if you have an alternate archiving scheme.
2022-03-31 22:32:53.485 UTC [210] LOG:  archive command failed with exit code 104
2022-03-31 22:32:53.485 UTC [210] DETAIL:  The failed archive command was: pgbackrest --stanza=db archive-push "pg_wal/00000002.history"
2022-03-31 22:32:53.485 UTC [210] WARNING:  archiving write-ahead log file "00000002.history" failed too many times, will try again later
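
Note that the failing repo in the log is repo2, yet the files it cannot read are under the repo1 path (/stg-kcmh-a-1/repo1). pgBackRest options under global are keyed per repo (repo1-*, repo2-*), so an added second repo normally gets its own path key; a sketch, where the repo2 path is an assumed placeholder rather than a known default:

global:
  repo1-path: /stg-kcmh-a-1/repo1
  repo2-path: /pgbackrest/repo2   # assumed placeholder; set per your volume repo layout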

nawarajshahi · Mar 31 '22

Hello @nawarajshahi, were you able to resolve the issue? I have the same problem and cannot make progress.

Thanks & Regards

tirelibirefe · Sep 08 '22

Hello @nawarajshahi and @tirelibirefe,

We were able to reproduce the output you are seeing; however, it does not seem to be affecting the functionality of that specific repo or pgBackRest in general.

Are you having trouble creating backups on your repos or with any other pgBackRest functionality? If so, can you provide more detailed steps for how you are reproducing this issue? What does your postgrescluster status look like (kubectl -n <your namespace> get postgrescluster <cluster name> -o yaml)?
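
For reference, a sketch of the pgBackRest portion of the status on a healthy cluster, assuming the v1beta1 status schema; the values shown are illustrative:

status:
  pgbackrest:
    repos:
    - name: repo1
      stanzaCreated: true
    - name: repo2
      bound: true               # for volume-backed repos, whether the PVC is bound
      stanzaCreated: true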

dsessler7 · Oct 06 '22

We are not seeing any functional impact or issue, per @dsessler7's response above. More specifically, PGO appears to be reconciling repo changes as designed.

Therefore, we are proceeding with closing this issue.

andrewlecuyer · Oct 25 '22