postgres-operator-examples

configmap hippo-ssh-config is missing

Open ZuSe opened this issue 3 years ago • 5 comments

I am not sure if this is a bug, but I just upgraded from 5.0.2 to 5.0.3 using the helm upgrade command. The first pod is being rescheduled, but it fails because of some missing config maps. I haven't changed anything in my config so far.

  Type     Reason       Age               From               Message
  ----     ------       ----              ----               -------
  Normal   Scheduled    40s               default-scheduler  Successfully assigned postgresql/hippo-instance1-xqp6-0 to k8s-production-fr-standard-node-b4452d
  Warning  FailedMount  8s (x7 over 40s)  kubelet            MountVolume.SetUp failed for volume "ssh" : [configmap "hippo-ssh-config" not found, secret "hippo-ssh" not found]
  Warning  FailedMount  8s (x7 over 40s)  kubelet            MountVolume.SetUp failed for volume "pgbackrest-config" : configmap references non-existent config key: pgbackrest_instance.conf

Not sure if this is an undetected issue in the upgrade path.
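For anyone triaging similar events, here is a minimal sketch of pulling the missing object names out of the FailedMount messages. The function name `missing_mount_objects` is hypothetical and not part of PGO or kubectl. It only matches the `... "name" not found` form shown above, so the "non-existent config key" event is not reported.

```shell
# Hypothetical helper (not part of kubectl or PGO): read `kubectl describe pod`
# output on stdin and print one "<kind> <name>" pair for every configmap or
# secret that a FailedMount event reports as not found.
missing_mount_objects() {
  grep -oE '(configmap|secret) "[^"]+" not found' \
    | sed -E 's/^(configmap|secret) "([^"]+)" not found$/\1 \2/' \
    | sort -u
}
```

Assuming the namespace and pod name from the events above, it could be fed with `kubectl -n postgresql describe pod hippo-instance1-xqp6-0 | missing_mount_objects`.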

ZuSe • Nov 03 '21 16:11

@ZuSe can you provide your PostgresCluster spec?

I am specifically curious about your pgBackRest repo configuration.

Thanks!

andrewlecuyer • Nov 03 '21 18:11

Hi @andrewlecuyer, sure. See below pls. As I said, I didn't touch anything except for pgbouncer service type.

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  annotations:
    meta.helm.sh/release-name: hippo
    meta.helm.sh/release-namespace: postgresql
  creationTimestamp: "2021-09-07T14:26:35Z"
  finalizers:
  - postgres-operator.crunchydata.com/finalizer
  generation: 14
  labels:
    app.kubernetes.io/managed-by: Helm
  name: hippo
  namespace: postgresql
  resourceVersion: "24742400499"
  uid: d69be417-aecf-42e8-aafb-04dc85f89bb8
spec:
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.35-0
      repos:
      - name: repo1
        volume:
          volumeClaimSpec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
            storageClassName: csi-cinder-classic
      - name: repo2
        volume:
          volumeClaimSpec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
            storageClassName: csi-cinder-classic
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-13.4-1
  instances:
  - dataVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 50Gi
      storageClassName: csi-cinder-high-speed
    name: instance1
    replicas: 2
    resources:
      limits:
        cpu: 2
        memory: 4Gi
  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          max_parallel_workers: 2
          max_worker_processes: 2
          shared_buffers: 1GB
          work_mem: 32MB
    leaderLeaseDurationSeconds: 30
    port: 8008
    syncPeriodSeconds: 10
  port: 5432
  postgresVersion: 13
  proxy:
    pgBouncer:
      config:
        global:
          ignore_startup_parameters: extra_float_digits,ssl_renegotiation_limit
          pool_mode: transaction
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbouncer:centos8-1.15-3
      port: 5432
      replicas: 1
      resources:
        limits:
          cpu: 200m
          memory: 256Mi
      service:
        type: LoadBalancer
  users:
  - name: postgres
  - databases:
    - account_service_production
    - content_service_production
    - fhir_r5_production
    - fileserver_production
    - matomo_production
    - poll_service_production
    - reff_service_production
    - utility_service_production
    name: iatros
    options: CREATEDB CREATEROLE
status:
  conditions:
  - lastTransitionTime: "2021-10-05T20:28:30Z"
    message: pgBackRest dedicated repository host is ready
    observedGeneration: 14
    reason: RepoHostReady
    status: "True"
    type: PGBackRestRepoHostReady
  - lastTransitionTime: "2021-09-07T14:28:03Z"
    message: pgBackRest replica create repo is ready for backups
    observedGeneration: 12
    reason: StanzaCreated
    status: "True"
    type: PGBackRestReplicaRepoReady
  - lastTransitionTime: "2021-09-07T14:28:52Z"
    message: pgBackRest replica creation is now possible
    observedGeneration: 12
    reason: RepoBackupComplete
    status: "True"
    type: PGBackRestReplicaCreate
  - lastTransitionTime: "2021-10-05T20:28:04Z"
    message: Deployment has minimum availability.
    observedGeneration: 14
    reason: MinimumReplicasAvailable
    status: "True"
    type: ProxyAvailable
  databaseRevision: 685ff8ffb8
  instances:
  - name: instance1
    readyReplicas: 1
    replicas: 2
    updatedReplicas: 1
  monitoring:
    exporterConfiguration: 559c4c97d6
  observedGeneration: 14
  patroni:
    systemIdentifier: "7005198412650696782"
  pgbackrest:
    repoHost:
      apiVersion: apps/v1
      kind: StatefulSet
      ready: true
    repos:
    - bound: true
      name: repo1
      replicaCreateBackupComplete: true
      stanzaCreated: true
    - bound: true
      name: repo2
      stanzaCreated: true
  proxy:
    pgBouncer:
      postgresRevision: 694b7b5f67
      readyReplicas: 1
      replicas: 1
  usersRevision: 67886fd468

ZuSe • Nov 03 '21 18:11

Are you using Helm to install PGO as well as to create the PostgresCluster?

In this case, which was specifically upgraded to v5.0.3? In other words, did you run helm upgrade for both the PGO install itself, as well as for the PostgresCluster?

andrewlecuyer • Nov 03 '21 19:11

@andrewlecuyer

I did it for both. First PGO, then the cluster.

ZuSe • Nov 03 '21 19:11

I think it is enough to just delete the replica and pgbackrest-host StatefulSets, and the operator will recreate them correctly. We had the same issue going from 5.0.2 to 5.0.4, and just deleting the pgbackrest-host StatefulSets triggered the operator to update the postgres instance StatefulSets.
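A rough sketch of that workaround, written so that the commands are printed for review rather than run. The function name `print_sts_deletes` is hypothetical, and the StatefulSet name `hippo-repo-host` is an assumption (cluster name plus `-repo-host`); check `kubectl -n postgresql get statefulsets` for the real names first.

```shell
# Hypothetical helper: print (do not run) a delete command for each
# StatefulSet name, so the commands can be reviewed before executing them.
# Usage: print_sts_deletes <namespace> <statefulset>...
print_sts_deletes() {
  _ns="$1"
  shift
  for _sts in "$@"; do
    printf 'kubectl -n %s delete statefulset %s\n' "$_ns" "$_sts"
  done
}

# Namespace taken from this thread; the StatefulSet name is an assumption.
print_sts_deletes postgresql hippo-repo-host
```

Note that a plain `kubectl delete statefulset` also removes the pods it manages, unless `--cascade=orphan` is passed. Whether orphaning is appropriate here is not something this thread settles.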

cr1cr1 • Feb 18 '22 15:02

Hello, I'm looking into this and I think I may have an answer to at least part of this.

I installed pgo 5.0.2 through Helm, and I created a cluster -- but I noticed that the hippo-ssh-config configMap was not present. I looked through some old docs and examples and added a field:

spec:
  backups:
    pgbackrest:
      repoHost:
        dedicated: {}

And that kicked off the creation and mounting of that cm & secret. If I delete that field and upgrade pgo to 5.0.3, it's fine, so the problem is with that repoHost area. Or at least, that's what I first thought.

But then I checked the operator logs and it was complaining:

time="2022-10-17T21:02:12Z" level=error msg="reconciling repository host" 
error="StatefulSet.apps \"cluster-repo-host\" is invalid: spec: Forbidden: 
updates to statefulset spec for fields other than 'replicas', 'template', 
'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' 
are forbidden" ...

OK, so: no repo-host StatefulSet is created in 5.0.2 unless the backups.pgbackrest.repoHost object is filled in. And once that StatefulSet exists, upgrades from 5.0.2 are going to run into an error, because in versions after 5.0.2 the StatefulSet has a topology spread constraint, which can't be applied as an update to the existing StatefulSet.
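To spot this condition in other clusters, here is a small sketch that scans operator log lines for the "Forbidden" StatefulSet update error quoted above and prints the affected StatefulSet names. The function name `blocked_statefulsets` is hypothetical, and the pattern is based only on the one log line shown in this thread.

```shell
# Hypothetical helper: read operator log lines on stdin and print the name of
# every StatefulSet whose update was rejected with "spec: Forbidden".
# The quotes around the name may be backslash-escaped in the log, so the
# backslash is treated as optional.
blocked_statefulsets() {
  grep -oE 'StatefulSet\.apps \\?"[^"\\]+\\?" is invalid: spec: Forbidden' \
    | sed -E 's/^StatefulSet\.apps \\?"([^"\\]+).*$/\1/' \
    | sort -u
}
```

It could be fed with something like `kubectl -n postgres-operator logs deployment/pgo | blocked_statefulsets`, where both the namespace and the deployment name are assumptions about how PGO was installed.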

As @cr1cr1 pointed out, a solution here is to delete the sts that can't be updated, which will unblock the operator, which will then create the missing cm and secret (and also update the <clustername>-pgbackrest-config configmap into the right form).

How do I feel about that solution? Well, that's actually our recommended solution in the docs: https://access.crunchydata.com/documentation/postgres-operator/v5/upgrade/kustomize/#upgrading-from-pgo-v5-0-2-and-below

benjaminjb • Oct 17 '22 21:10