
Clarification on operator behaviour of backup to and cloning from S3

Open · krauthex opened this issue 10 months ago · 1 comment

Please answer some short questions which should help us to understand your problem / question better:

  • Which image of the operator are you using? ghcr.io/zalando/postgres-operator:v1.14.0
  • Where do you run it - cloud or metal? Kubernetes or OpenShift? Rancher RKE2 k8S cluster (on OpenNebula VMs)
  • Are you running Postgres Operator in production? yes
  • Type of issue? question

My question is about the behaviour of the postgres operator when backup/restore to/from S3-compatible object storage is configured. We have S3-compatible storage on premise (Pure Storage), and backup and restore from there generally work. The configuration we needed to make it work is this:

```yaml
# values.yaml for postgres-operator install
...
configKubernetes:
  pod_environment_configmap: "postgres-operator/postgres-pod-config"

configAwsOrGcp:
  wal_s3_bucket: <my backup bucket>
```

and

```yaml
# ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-pod-config
  namespace: postgres-operator
data:
  # Any env variable used by spilo can be added
  AWS_ENDPOINT: "http://11.22.33.44"
  AWS_SECRET_ACCESS_KEY: "<secret access key>"
  AWS_ACCESS_KEY_ID: "<access key id>"
  AWS_REGION: "eu-north-1"
  AWS_S3_FORCE_PATH_STYLE: "true"
  BACKUP_SCHEDULE: '0 5 * * *'
  BACKUP_NUM_TO_RETAIN: "10"
  WAL_S3_BUCKET: <my backup bucket>
  WAL_BUCKET_SCOPE_PREFIX: ""
  USE_WALG_BACKUP: "true"
  USE_WALG_RESTORE: "true"
  WALG_DISABLE_S3_SSE: "true"
  WALE_DISABLE_S3_SSE: "true"
  ## below is the config needed for cloning, which is also necessary for restoring backups
  CLONE_USE_WALG_RESTORE: "true"
  CLONE_AWS_SECRET_ACCESS_KEY: "<secret access key>"
  CLONE_AWS_ACCESS_KEY_ID: "<access key id>"
  CLONE_AWS_REGION: "eu-north-1"
  CLONE_AWS_S3_FORCE_PATH_STYLE: "true"
  CLONE_AWS_ENDPOINT: "http://11.22.33.44"
  CLONE_METHOD: CLONE_WITH_WALE
  CLONE_WAL_BUCKET_SCOPE_PREFIX: ""
  CLONE_WAL_S3_BUCKET: <my backup bucket>
```
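
For reference, the clone section in the target cluster's manifest looks roughly like this for us (a sketch with placeholder name, UID, and timestamp; the fields follow the operator's documented `spec.clone` section):

```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: my-postgres-clone
spec:
  clone:
    # name of the source cluster, as in its manifest
    cluster: "my postgres"
    # UID of the source cluster, so the operator builds the right S3 path
    uid: "<my postgres UID>"
    # point-in-time recovery target
    timestamp: "2025-01-28T12:00:00+00:00"
```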

I have two questions:

  1. With this, we get the expected/desired paths in S3 for the backups, e.g. s3://<my backup bucket>/spilo/<my postgres>/<my postgres UID>/wal/15/.... However, we needed to add the configAwsOrGcp.wal_s3_bucket: <my backup bucket> lines to the operator values.yaml, because otherwise the UID of the postgres cluster was omitted from the backup path, so multiple postgres clusters with the same name would write into the same directory. This fix came from this comment, where a user read the code to figure out the behaviour. Is this behaviour intentional? If so, could you please document it? It is nowhere to be found in the documentation.
  2. When cloning from a backup in S3 with the above configuration, we get the warning WARNING - Cloning with WAL-E is only possible when CLONE_WALE_*_PREFIX or CLONE_WALG_*_PREFIX or CLONE_WAL_*_BUCKET and CLONE_SCOPE are set. The CLONE_WAL_S3_BUCKET variable is set, but what is CLONE_SCOPE? It is nowhere in the documentation (at least I could not find it). I thought CLONE_SCOPE might be the cluster: "my postgres" entry in the clone section of the target cluster's manifest, but that entry is always present. What configuration is missing, or what is wrong in ours?
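
To make question 1 concrete, here is a small illustrative sketch (ours, not operator code; all names are placeholders) of the two prefix layouts we observed with and without wal_s3_bucket set on the operator side:

```python
from typing import Optional

# Illustrative only: models the backup prefix layouts we observed,
# not the operator's actual implementation.
def backup_prefix(bucket: str, cluster: str, uid: Optional[str]) -> str:
    """Build a spilo-style WAL prefix; in our setup the UID segment only
    appeared once configAwsOrGcp.wal_s3_bucket was set."""
    if uid:
        return f"s3://{bucket}/spilo/{cluster}/{uid}/wal/15/"
    return f"s3://{bucket}/spilo/{cluster}/wal/15/"

# Without UIDs, two clusters sharing a name collide on the same prefix:
assert backup_prefix("backups", "pg", None) == backup_prefix("backups", "pg", None)

# With UIDs, the prefixes stay distinct:
assert backup_prefix("backups", "pg", "1111") != backup_prefix("backups", "pg", "2222")
```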

I'd appreciate an answer, thanks!

krauthex · Jan 29 '25 08:01