wal-g when setting WALG_AZ_PREFIX only substitutes $(SCOPE) and $(PGVERSION)

Open juwalter opened this issue 2 years ago • 7 comments

Please answer some short questions which should help us to understand your problem / question better:

  • Which image of the operator are you using? --> registry.opensource.zalan.do/acid/postgres-operator:v1.8.2
  • Where do you run it - cloud or metal? --> Bare Metal K8s
  • Are you running Postgres Operator in production? --> yes
  • Type of issue? --> question

Hello,

when following the setup according to https://github.com/zalando/postgres-operator/blob/master/docs/administrator.md#azure-setup

apiVersion: v1
kind: ConfigMap
metadata:
  name: pod-env-overrides
  namespace: postgres-operator-system
data:
  # Any env variable used by spilo can be added
  USE_WALG_BACKUP: "true"
  USE_WALG_RESTORE: "true"
  CLONE_USE_WALG_RESTORE: "true"
  WALG_AZ_PREFIX: "azure://container-name/$(SCOPE)/$(PGVERSION)" # Enables Azure Backups (SCOPE = Cluster name) (PGVERSION = Postgres version)
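(For example, with a cluster named mycluster running Postgres 13, this prefix should expand to azure://container-name/mycluster/13.)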

this all works fine so far. However :), when we were using S3 previously, we got a better "path" for the storage location of the WAL segments, like so:

  • template WALE_S3_PREFIX=$WAL_S3_BUCKET/spilo/{WAL_BUCKET_SCOPE_PREFIX}{SCOPE}{WAL_BUCKET_SCOPE_SUFFIX}/wal/{PGVERSION}
  • example: s3://postgresql/spilo/mycluster-postgres/6e6599d0-81e1-4a63-8b28-f96f59160096/wal/13

IMHO, this is a much better storage location, as it makes it easy to identify which particular cluster the WAL segments came from. I think it would also allow clusters with the same name to exist in different k8s namespaces. Therefore, I set out to replicate this pattern using Azure blob storage, but no dice.

I found that this WALE_S3_PREFIX actually seems to be created by spilo, at https://github.com/zalando/spilo/blob/a86778bd601c4f6de98db9d207a8c1e6af31c984/postgres-appliance/scripts/configure_spilo.py#L892

    prefix_env_name = write_envdir_names[0]
    store_type = prefix_env_name[5:].split('_')[0]
    if not wale.get(prefix_env_name):  # WALE_*_PREFIX is not defined in the environment
        bucket_path = '/spilo/{WAL_BUCKET_SCOPE_PREFIX}{SCOPE}{WAL_BUCKET_SCOPE_SUFFIX}/wal/{PGVERSION}'.format(**wale)
        prefix_template = '{0}://{{WAL_{1}_BUCKET}}{2}'.format(store_type.lower(), store_type, bucket_path)
        wale[prefix_env_name] = prefix_template.format(**wale)
    # Set WALG_*_PREFIX for future compatibility
    if store_type in ('S3', 'GS') and not wale.get(write_envdir_names[1]):
        wale[write_envdir_names[1]] = wale[prefix_env_name]
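For illustration, here is a minimal standalone sketch of the two formatting steps quoted above; the values in the wale dict are hypothetical stand-ins for what spilo derives at runtime, chosen to reproduce the example path from earlier:

    # Hypothetical stand-ins for the values spilo derives at runtime
    wale = {
        'WAL_S3_BUCKET': 'postgresql',
        'WAL_BUCKET_SCOPE_PREFIX': '',
        'SCOPE': 'mycluster-postgres',
        'WAL_BUCKET_SCOPE_SUFFIX': '/6e6599d0-81e1-4a63-8b28-f96f59160096',
        'PGVERSION': '13',
    }
    store_type = 'S3'

    # Same two formatting steps as in configure_spilo.py
    bucket_path = '/spilo/{WAL_BUCKET_SCOPE_PREFIX}{SCOPE}{WAL_BUCKET_SCOPE_SUFFIX}/wal/{PGVERSION}'.format(**wale)
    prefix_template = '{0}://{{WAL_{1}_BUCKET}}{2}'.format(store_type.lower(), store_type, bucket_path)

    print(prefix_template.format(**wale))
    # -> s3://postgresql/spilo/mycluster-postgres/6e6599d0-81e1-4a63-8b28-f96f59160096/wal/13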

What I did not find was how / where WALG_AZ_PREFIX: "azure://container-name/$(SCOPE)/$(PGVERSION)" is actually interpolated. It does not appear to happen in spilo, nor in wal-g; that pretty much leaves the operator?

  • q1: can someone point me to where WALG_AZ_PREFIX is interpolated, so I can see if there is a chance to include e.g. $(WAL_BUCKET_SCOPE_SUFFIX)? (see the note below)
  • q2: is it even a good idea to include the cluster uid (WAL_BUCKET_SCOPE_SUFFIX) in this path in the Azure case? Both S3 and GS talk about buckets, but I don't think buckets are a thing in Azure blob storage.
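(A note on q1, which may partly answer it: the $(VAR) syntax in a container's env values is expanded by Kubernetes itself, not by the operator, spilo, or wal-g. The kubelet resolves a reference only when the referenced variable is defined earlier in the same container's env list, and leaves unresolved references as literal text. That would explain why $(SCOPE) and $(PGVERSION) work while spilo-internal values like WAL_BUCKET_SCOPE_SUFFIX do not, presumably because the latter are computed inside configure_spilo.py rather than set as container env vars. A minimal sketch of this Kubernetes behavior, with hypothetical values:)

    # Hypothetical container spec fragment illustrating Kubernetes
    # "dependent environment variables": $(SCOPE) and $(PGVERSION) are
    # expanded by the kubelet because they are defined earlier in this
    # env list; a reference to a variable not defined here, e.g.
    # $(WAL_BUCKET_SCOPE_SUFFIX), would remain a literal string.
    env:
    - name: SCOPE
      value: mycluster
    - name: PGVERSION
      value: "13"
    - name: WALG_AZ_PREFIX
      value: azure://container-name/$(SCOPE)/$(PGVERSION)  # -> azure://container-name/mycluster/13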

Many thanks in advance!

juwalter avatar Feb 17 '23 14:02 juwalter

Hello, if I understand what you are after, I also had the need to include the cluster id in our backup paths (under Azure). I wanted to make sure that if I needed to restore, the backups would go to a new location and remain "pristine", so I could attempt multiple restorations if need be.

I was able to accomplish this by including an initContainers section on the cluster that queries k8s for the postgresql cluster custom resource and stores the uid in a ConfigMap. Then, in the env section of the postgresql cluster, I added a variable referencing that ConfigMap / cluster id. Finally, I overrode WALG_AZ_PREFIX and referenced the newly added env var to get the id into the path. I hope all this makes sense (see below for more detail).

I was able to get this working by:

  • Adding some additional resources/verbs to the "postgres-pod" cluster role:
        - apiGroups:
          - acid.zalan.do
          resources:
          - postgresqls
          verbs:
          - get
        - apiGroups: [""]
          resources: ["configmaps"]
          verbs:
          - get
          - create
          - update
          - patch
  • Add an initContainers section to my postgresql cluster (this initContainer queries k8s for your postgresql CR and stores the cluster's uid in a ConfigMap):
    initContainers:
    - name: set-cluster-id-env
      image: lachlanevenson/k8s-kubectl
      command:
      - /bin/sh
      - -c
      args:
      - |
        pgc_id="$(kubectl get postgresql your-cluster-name -o jsonpath='{.metadata.uid}')";
        echo "{\"apiVersion\": \"v1\",\"kind\": \"ConfigMap\",\"metadata\": {\"name\": \"your-config-map-name\"}, \"data\": {\"pgc-id\": \"$pgc_id\"}}" > configmap.json;
        kubectl apply -f configmap.json;

  • Finally, I added an environment variable that is referenced as a dependent env var in WALG_AZ_PREFIX (which I also set in the postgresql cluster manifest to override the value from the operator config):

    env:
      - name: PGC_ID
        valueFrom:
          configMapKeyRef:
            name: your-config-map-name
            key: pgc-id
      - name: WALG_AZ_PREFIX
        value: azure://your-blob-name/$(PGC_ID)/wal/$(PGVERSION)
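
With this in place, the prefix should expand to something like azure://your-blob-name/6e6599d0-81e1-4a63-8b28-f96f59160096/wal/13 (hypothetical uid and version): $(PGC_ID) is defined earlier in the same env list, so Kubernetes resolves it as a dependent env var.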

mightymiracleman avatar Mar 11 '24 15:03 mightymiracleman

I was able to overcome this using a solution similar to @mightymiracleman

In the Postgresql manifest

  env:
  - name: PG_UID
    valueFrom:
      fieldRef:
        fieldPath: metadata.uid

In the operator Pod config

WALG_AZ_PREFIX: azure://backups/spilo/$(SCOPE)/$(PG_UID)/wal

youvegotmoxie avatar May 22 '24 18:05 youvegotmoxie

> I was able to overcome this using a solution similar to @mightymiracleman […] (quoting the previous comment)

I did give this a try, as it is a lot cleaner than the way I was doing it, but I think this pulls the uid from the Pod, not from the postgresql cluster. I added it to one of my clusters and was surprised to see different values for the two env vars I had: the first one was my postgresql cluster uid, while the second one I tracked down to the pod that the spilo container was running in. I also bounced the pod and noticed that the value changed (and matched up with the new pod). Did I miss something there?

mightymiracleman avatar May 22 '24 19:05 mightymiracleman

You are correct. I observed the same after a bit more testing. The Downward API doesn't support getting the UID from the StatefulSet or the Postgresql objects, so I think your solution is the only way this will work.

youvegotmoxie avatar May 22 '24 20:05 youvegotmoxie

No worries @youvegotmoxie ! I would have loved to do it the way you posted; it was a heck of a lot cleaner.

mightymiracleman avatar May 22 '24 20:05 mightymiracleman