Unable to create a local volume repo on a cluster already running with an S3 repo
Overview
We are trying to add a local volume repo in addition to the existing S3 repo, but we keep getting an error. The Postgres cluster was originally deployed with only the S3 repo; the local volume repo was added afterwards.
Environment
- Platform: EKS
- Platform Version: 1.21
- PGO Image Tag: crunchy-postgres:centos8-14.1-0
- Postgres Version: 14
- Storage: gp3
Steps to Reproduce
REPRO
Cluster spec used to reproduce the error condition:
```yaml
---
# Source: pg/templates/postgrescluster.yml
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: pg
spec:
  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          checkpoint_completion_target: 0.7
          default_statistics_target: 100
          effective_cache_size: 12GB
          effective_io_concurrency: 200
          maintenance_work_mem: 1GB
          max_connections: 3000
          max_parallel_maintenance_workers: 4
          max_parallel_workers: 8
          max_parallel_workers_per_gather: 4
          max_wal_size: 4GB
          max_worker_processes: 8
          min_wal_size: 1GB
          random_page_cost: 1.1
          shared_buffers: 4GB
          wal_buffers: 16MB
          work_mem: 349kB
  users:
    - name: postgres
  monitoring:
    pgmonitor:
      exporter:
        image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.0.4-0
  proxy:
    pgBouncer:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbouncer:centos8-1.16-0
      config:
        global:
          default_pool_size: "100"
          max_client_conn: "10000"
          pool_mode: transaction
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-use-pg
                    operator: In
                    values:
                      - "true"
              - matchExpressions:
                  - key: node-use-postgres
                    operator: In
                    values:
                      - "true"
      tolerations:
        - effect: NoSchedule
          key: node-use-pg
          operator: Equal
          value: "true"
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-14.1-0
  postgresVersion: 14
  instances:
    - name: pg
      replicas: 2
      resources:
        requests:
          cpu: 2
          memory: 2Gi
        limits:
          cpu: 4
          memory: 4Gi
      dataVolumeClaimSpec:
        storageClassName: gp3
        accessModes:
          - "ReadWriteOnce"
        resources:
          requests:
            storage: 50Gi
      tolerations:
        - effect: NoSchedule
          key: node-use-pg
          operator: Equal
          value: "true"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-use-pg
                    operator: In
                    values:
                      - "true"
              - matchExpressions:
                  - key: node-use-postgres
                    operator: In
                    values:
                      - "true"
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    postgres-operator.crunchydata.com/cluster: pg
                    postgres-operator.crunchydata.com/instance-set: pg
  backups:
    pgbackrest:
      configuration:
        - secret:
            name: pgo-s3-creds
      manual:
        repoName: repo1
        options:
          - --type=full
      global:
        repo1-path: /stg-kcmh-a-1/repo1
        repo1-retention-full: "14"
        repo1-retention-full-type: time
      repos:
        - name: repo1
          schedules:
            full: 0 1 * * *
          s3:
            bucket: pg-bucket-name
            endpoint: s3.amazonaws.com
            region: us-east-2
```
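For context, a volume-backed repo in PGO v5 is declared as an additional entry under `spec.backups.pgbackrest.repos` with a `volumeClaimSpec`. The spec above only shows repo1, so the following is a minimal sketch of what the added local volume repo might have looked like; the repo name `repo2`, the storage class, and the size are assumptions, not taken from the posted spec:

```yaml
# Hypothetical repos list after adding a local volume repo.
# Only repo1 appears in the posted spec; everything about repo2 is assumed.
repos:
  - name: repo1                       # existing S3 repo, unchanged
    schedules:
      full: 0 1 * * *
    s3:
      bucket: pg-bucket-name
      endpoint: s3.amazonaws.com
      region: us-east-2
  - name: repo2                       # new local volume repo (assumed name)
    volume:
      volumeClaimSpec:
        storageClassName: gp3         # assumed; any RWO-capable class works
        accessModes:
          - "ReadWriteOnce"
        resources:
          requests:
            storage: 50Gi             # assumed size
```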
EXPECTED
Expected the local volume repo to be created and work correctly alongside the S3 repo.
ACTUAL
The pod named "pg-repo-host-0" is created, but checking the logs on the primary Postgres pod shows the following error:
```
tail -f /pgdata/pg14/log/postgresql-Thu.log
repo2: [FileMissingError] unable to load info file '/stg-kcmh-a-1/repo1/archive/db/archive.info' or '/stg-kcmh-a-1/repo1/archive/db/archive.info.copy':
FileMissingError: raised from remote-0 ssh protocol on 'pg-repo-host-0.pg-pods.pg.svc.cluster.local.': unable to open missing file '/stg-kcmh-a-1/repo1/archive/db/archive.info' for read
FileMissingError: raised from remote-0 ssh protocol on 'pg-repo-host-0.pg-pods.pg.svc.cluster.local.': unable to open missing file '/stg-kcmh-a-1/repo1/archive/db/archive.info.copy' for read
HINT: archive.info cannot be opened but is required to push/get WAL segments.
HINT: is archive_command configured correctly in postgresql.conf?
HINT: has a stanza-create been performed?
HINT: use --no-archive-check to disable archive checks during backup if you have an alternate archiving scheme.
2022-03-31 22:32:53.485 UTC [210] LOG: archive command failed with exit code 104
2022-03-31 22:32:53.485 UTC [210] DETAIL: The failed archive command was: pgbackrest --stanza=db archive-push "pg_wal/00000002.history"
2022-03-31 22:32:53.485 UTC [210] WARNING: archiving write-ahead log file "00000002.history" failed too many times, will try again later
```
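The hints above point at the stanza on the repo host. One way to inspect it is to exec into the repo host pod and run pgBackRest directly; a sketch, assuming the repo host container is named `pgbackrest` (namespace and container name may differ in your deployment):

```shell
# Check stanza and repo state from the repo host pod:
kubectl -n <your namespace> exec -it pg-repo-host-0 -c pgbackrest -- \
  pgbackrest info --stanza=db

# If the stanza has never been created on the new repo, a manual
# stanza-create can be attempted (normally PGO reconciles this itself):
kubectl -n <your namespace> exec -it pg-repo-host-0 -c pgbackrest -- \
  pgbackrest stanza-create --stanza=db
```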
Hello @nawarajshahi, were you able to resolve the issue? I have the same problem and cannot make progress.
Thanks & Regards
Hello @nawarajshahi and @tirelibirefe,
We were able to reproduce the output you are seeing; however, it does not seem to be affecting the functionality of that specific repo or pgBackRest in general.
Are you having trouble creating backups on your repos or with any other pgBackRest functionality? If so, can you provide more detailed steps on how you are producing this issue? What does your postgrescluster status look like (`kubectl -n <your namespace> get postgrescluster <cluster name> -o yaml`)?
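For anyone gathering that status, the output can also be narrowed to just the pgBackRest repo state rather than dumping the whole object; a sketch, assuming PGO v5's `status.pgbackrest` field:

```shell
# Full object, as suggested above:
kubectl -n <your namespace> get postgrescluster <cluster name> -o yaml

# Just the pgBackRest repo status (field path assumed from the PGO v5 CRD):
kubectl -n <your namespace> get postgrescluster <cluster name> \
  -o jsonpath='{.status.pgbackrest.repos}'
```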
We are not seeing any functional impact or issue per @dsessler7's response above. More specifically, PGO appears to be reconciling repo changes as designed.
Therefore, proceeding with closing.