postgres-operator

CronJobs not created with cluster name & repo name combination greater than 41 characters

Open szelenka opened this issue 3 years ago • 1 comments

Please ensure you do the following when reporting a bug:

  • [x] Provide a concise description of what the bug is.
  • [x] Provide information about your environment.
  • [x] Provide clear steps to reproduce the bug.
  • [x] Attach applicable logs. Please do not attach screenshots showing logs unless you are unable to copy and paste the log data.
  • [x] Ensure any code / output examples are properly formatted for legibility.

Overview

Long cluster names result in silent downstream failures in PGO. The documentation suggests a limit of 46 characters for the instance name, but mentions no limit for the pgBackRest repo name:

  • https://access.crunchydata.com/documentation/postgres-operator/latest/references/crd/#postgresclusterspecinstancesindex
  • https://access.crunchydata.com/documentation/postgres-operator/latest/references/crd/#postgresclusterspecbackupspgbackrestreposindex

If you opt to have CronJobs running, the cluster name must be 41 characters or fewer to meet the 52-character limit on CronJob names. But it's actually more complicated than that: since you can name your repo whatever you want, you'd need to subtract the length of the repo name, along with the backup type, to learn what the actual cluster name limit is.
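The arithmetic behind that limit can be sketched as follows. This is a minimal illustration, not PGO's code: it assumes the CronJob name has the form `<cluster>-<repo>-<type>`, as the error message in the Logs section suggests, and `max_cluster_name_length` is a hypothetical helper name.

```python
# Sketch only: assumes CronJob names follow "<cluster>-<repo>-<type>",
# as seen in the error log of this report. Not PGO's actual code.

CRONJOB_NAME_LIMIT = 52  # limit quoted in the Kubernetes error message

# Backup-type suffixes mentioned in this report.
BACKUP_TYPE_SUFFIXES = ("full", "diff", "incr")


def max_cluster_name_length(repo_name: str) -> int:
    """Longest cluster name that keeps every CronJob name within the limit."""
    longest_suffix = max(len(s) for s in BACKUP_TYPE_SUFFIXES)
    # Two hyphens join the three parts: <cluster>-<repo>-<type>
    return CRONJOB_NAME_LIMIT - len(repo_name) - longest_suffix - 2


print(max_cluster_name_length("repo1"))  # 52 - 5 - 4 - 2 = 41
```

A longer repo name would shrink the budget by the same number of characters, which is why a single documented value would only hold if the repo name length is fixed.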

Environment

Please provide the following details:

  • Platform: Kubernetes
  • Platform Version: 5.2.0
  • PGO Image Tag: ubi8-5.2.0-0
  • Postgres Version: 14
  • Storage: pvc

Steps to Reproduce

REPRO

  1. Create a PostgresCluster with a name containing 42 characters:
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: xxxxxx-xxxxxxxxxx-xxxxxxxxxx-xxxx-xxxxxxxx
  namespace: xxxxxxxxxx-xxxxxxxxxx
spec:
  backups:
    pgbackrest:
      global:
        compress-level: "3"
        repo1-path: /pgbackrest/xxxxxxxxxx-xxxxxxxxxx/xxxxxxxxxx-xxxxxxxxxx-xxxx/repo1
        repo1-retention-full: "2"
        repo1-s3-uri-style: path
      repos:
      - name: repo1
        s3:
          bucket: xxxxxxxxxx-xxxxxxxxxx
          endpoint: s3.us-west-2.amazonaws.com
          region: us-west-2
        schedules:
          differential: 15 8 * * 1-6
          full: 15 8 * * 0
          incremental: 5,35 * * * *
  instances:
  - dataVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 50Gi
      storageClassName: standard
    name: i1
    replicas: 2
  postgresVersion: 14
  2. Observe that none of the CronJobs are created for this cluster, although the cluster is otherwise operational.

EXPECTED

  1. I would expect PGO to truncate the cluster name (when possible) to satisfy the length requirement when creating a CronJob or STS based on the values populated in the CRD.
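One way such truncation could look is sketched below. This is only an illustration of the idea, not how PGO builds names: `cronjob_name` is a hypothetical helper, the `<cluster>-<repo>-<type>` pattern is taken from the error log in this report, and the short hash suffix is an assumption added so that two long cluster names sharing a prefix do not collide after truncation.

```python
import hashlib

CRONJOB_NAME_LIMIT = 52  # limit quoted in the Kubernetes error message


def cronjob_name(cluster: str, repo: str, backup_type: str) -> str:
    """Hypothetical helper: build "<cluster>-<repo>-<type>", shortening the cluster part if needed."""
    suffix = f"-{repo}-{backup_type}"
    budget = CRONJOB_NAME_LIMIT - len(suffix)
    if len(cluster) <= budget:
        return cluster + suffix

    # Short hash of the full cluster name keeps truncated names distinct.
    digest = hashlib.sha256(cluster.encode()).hexdigest()[:5]
    head_length = budget - len(digest) - 1  # 1 for the joining hyphen
    if head_length < 1:
        raise ValueError("repo name and backup type leave no room for the cluster name")
    # Avoid a double hyphen if the cut lands right after a "-".
    head = cluster[:head_length].rstrip("-")
    return f"{head}-{digest}{suffix}"


long_name = "xxxxxx-xxxxxxxxxx-xxxxxxxxxx-xxxx-xxxxxxxx"  # 42 characters
print(len(cronjob_name(long_name, "repo1", "incr")))  # 52
print(cronjob_name("hippo", "repo1", "full"))         # hippo-repo1-full
```

The trade-off is that the CronJob name is no longer a plain prefix match on the cluster name, so anything that looks CronJobs up by name would need to use the same function.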

ACTUAL

  1. No CronJob is created when the combined length violates the k8s spec for name length.

Logs

time="2022-11-07T17:31:53Z" level=error msg="error when attempting to create pgBackRest CronJob" error="CronJob.batch \"xxxxxx-xxxxxxxxxx-xxxxxxxxxx-xxxx-xxxxxxxx-repo1-incr\" is invalid: metadata.name: Invalid value: \"xxxxxx-xxxxxxxxxx-xxxxxxxxxx-xxxx-xxxxxxxx-repo1-incr\": must be no more than 52 characters" file="internal/controller/postgrescluster/pgbackrest.go:2916" func="postgrescluster.(*Reconciler).reconcilePGBackRestCronJob" name=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx namespace=xxxxx reconcileResource=repoCronJob reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.2.0 

Additional Information

The addition of -repo1-incr, -repo1-diff, or -repo1-full would suggest the cluster name length must be 41 characters or less, not 46 as documented. However, it seems that the user can specify the repo1 portion of this, so it's impossible to document a static value: https://github.com/CrunchyData/postgres-operator/blob/master/internal/naming/names.go#L427-L433

Short of adding logic to truncate the CronJob name, perhaps the documentation just needs to be updated, similar to how it was for STS in #3170?

szelenka, Nov 07 '22 17:11

Hello @szelenka,

First off, the repo name actually must be repoN, where N is 1, 2, 3, or 4, so that length is fixed.

Secondly, the character limit of 46 that you have pointed to in the documentation is the limit for actually running a postgrescluster. Meaning, if the length of the cluster name plus the length of the instance name is greater than 46 then the Pods won't start.

That all being said, I will grant you that the documented length limit of 46 characters is a little misleading, especially in the context of using CronJobs.

I will add a story to our backlog to update the documentation to provide more clarity on this.
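Until the documentation is updated, the two limits described in this reply can be checked together before applying a manifest. The sketch below is an assumption-laden illustration, not part of PGO: `check_names` is a hypothetical helper, the 46 and 52 values come from this thread, and the mapping from schedule keys to the `full`/`diff`/`incr` suffixes is inferred from the CronJob names quoted in the report.

```python
import re

POD_NAME_BUDGET = 46     # cluster name + instance name, per this reply
CRONJOB_NAME_LIMIT = 52  # from the Kubernetes error in the logs

# Assumed mapping from schedule keys in the manifest to CronJob name suffixes.
BACKUP_TYPE_SUFFIXES = {"full": "full", "differential": "diff", "incremental": "incr"}


def check_names(cluster, instances, schedules_by_repo):
    """Return a list of human-readable problems; an empty list means no limit is exceeded."""
    problems = []
    for instance in instances:
        if len(cluster) + len(instance) > POD_NAME_BUDGET:
            problems.append(f"Pods: '{cluster}' + '{instance}' exceeds {POD_NAME_BUDGET} characters")
    for repo, schedule_types in schedules_by_repo.items():
        if not re.fullmatch(r"repo[1-4]", repo):
            problems.append(f"Repo: '{repo}' is not repo1..repo4")
        for schedule_type in schedule_types:
            name = f"{cluster}-{repo}-{BACKUP_TYPE_SUFFIXES[schedule_type]}"
            if len(name) > CRONJOB_NAME_LIMIT:
                problems.append(f"CronJob: '{name}' is {len(name)} characters (limit {CRONJOB_NAME_LIMIT})")
    return problems


cluster = "xxxxxx-xxxxxxxxxx-xxxxxxxxxx-xxxx-xxxxxxxx"  # 42 characters, as in the report
for problem in check_names(cluster, ["i1"], {"repo1": ["full", "differential", "incremental"]}):
    print(problem)
```

For the manifest in this issue, the Pod check passes (42 + 2 = 44) while all three CronJob names come out at 53 characters, which matches the reported behavior: the cluster runs but no CronJobs are created.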

dsessler7, Jul 26 '23 23:07