[Bug]: No Target Backup Found with backup available

Open PrivatePuffin opened this issue 1 year ago • 5 comments

Is there an existing issue already for this bug?

  • [X] I have searched for an existing issue, and could not find anything. I believe this is a new bug.

I have read the troubleshooting guide

  • [X] I have read the troubleshooting guide and I think this is a new bug.

I am running a supported version of CloudNativePG

  • [X] I have read the troubleshooting guide and I think this is a new bug.

Contact Details

[email protected]

Version

1.22.1

What version of Kubernetes are you using?

1.28

What is your Kubernetes environment?

Self-managed: kind (evaluation)

How did you install the operator?

Helm

What happened?

I made a base backup to Backblaze B2, and it is visible in the designated bucket from the B2 web interface. The restore fails with "no target backup found", yet the backup is there.

I also tried setting serverName, to no avail. The error is non-descriptive as fuck and doesn't give any useful output.

Cluster resource

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  annotations:
    cnpg.io/hibernation: 'off'
    meta.helm.sh/release-name: authentik
    meta.helm.sh/release-namespace: authentik
    rollme: PkRth
  creationTimestamp: '2024-03-06T19:52:31Z'
  generation: 1
  labels:
    app: authentik-24.2.5
    app.kubernetes.io/instance: authentik
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: authentik
    app.kubernetes.io/version: 2024.2.1
    cnpg.io/reload: 'on'
    helm-revision: '1'
    helm.sh/chart: authentik-24.2.5
    helm.toolkit.fluxcd.io/name: authentik
    helm.toolkit.fluxcd.io/namespace: authentik
    release: authentik
  managedFields:
    - apiVersion: postgresql.cnpg.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:cnpg.io/hibernation: {}
            f:meta.helm.sh/release-name: {}
            f:meta.helm.sh/release-namespace: {}
            f:rollme: {}
          f:labels:
            .: {}
            f:app: {}
            f:app.kubernetes.io/instance: {}
            f:app.kubernetes.io/managed-by: {}
            f:app.kubernetes.io/name: {}
            f:app.kubernetes.io/version: {}
            f:cnpg.io/reload: {}
            f:helm-revision: {}
            f:helm.sh/chart: {}
            f:helm.toolkit.fluxcd.io/name: {}
            f:helm.toolkit.fluxcd.io/namespace: {}
            f:release: {}
        f:spec:
          .: {}
          f:bootstrap:
            .: {}
            f:recovery:
              .: {}
              f:database: {}
              f:owner: {}
              f:secret:
                .: {}
                f:name: {}
              f:source: {}
          f:enableSuperuserAccess: {}
          f:externalClusters: {}
          f:failoverDelay: {}
          f:instances: {}
          f:logLevel: {}
          f:maxSyncReplicas: {}
          f:minSyncReplicas: {}
          f:monitoring:
            .: {}
            f:disableDefaultQueries: {}
            f:enablePodMonitor: {}
          f:nodeMaintenanceWindow:
            .: {}
            f:inProgress: {}
            f:reusePVC: {}
          f:postgresGID: {}
          f:postgresUID: {}
          f:primaryUpdateMethod: {}
          f:primaryUpdateStrategy: {}
          f:replicationSlots:
            .: {}
            f:highAvailability:
              .: {}
              f:enabled: {}
              f:slotPrefix: {}
            f:updateInterval: {}
          f:resources:
            .: {}
            f:limits:
              .: {}
              f:cpu: {}
              f:memory: {}
            f:requests:
              .: {}
              f:cpu: {}
              f:memory: {}
          f:smartShutdownTimeout: {}
          f:startDelay: {}
          f:stopDelay: {}
          f:storage:
            .: {}
            f:pvcTemplate:
              .: {}
              f:accessModes: {}
              f:resources:
                .: {}
                f:requests:
                  .: {}
                  f:storage: {}
              f:storageClassName: {}
            f:resizeInUseVolumes: {}
          f:switchoverDelay: {}
          f:walStorage:
            .: {}
            f:pvcTemplate:
              .: {}
              f:accessModes: {}
              f:resources:
                .: {}
                f:requests:
                  .: {}
                  f:storage: {}
              f:storageClassName: {}
            f:resizeInUseVolumes: {}
      manager: helm-controller
      operation: Update
      time: '2024-03-06T19:52:31Z'
    - apiVersion: postgresql.cnpg.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:certificates:
            .: {}
            f:clientCASecret: {}
            f:expirations:
              .: {}
              f:authentik-cnpg-main-ca: {}
              f:authentik-cnpg-main-replication: {}
              f:authentik-cnpg-main-server: {}
            f:replicationTLSSecret: {}
            f:serverAltDNSNames: {}
            f:serverCASecret: {}
            f:serverTLSSecret: {}
          f:cloudNativePGCommitHash: {}
          f:cloudNativePGOperatorHash: {}
          f:conditions: {}
          f:configMapResourceVersion:
            .: {}
            f:metrics:
              .: {}
              f:cnpg-default-monitoring: {}
          f:initializingPVC: {}
          f:instanceNames: {}
          f:instances: {}
          f:jobCount: {}
          f:latestGeneratedNode: {}
          f:managedRolesStatus: {}
          f:phase: {}
          f:phaseReason: {}
          f:poolerIntegrations:
            .: {}
            f:pgBouncerIntegration:
              .: {}
              f:secrets: {}
          f:pvcCount: {}
          f:readService: {}
          f:secretsResourceVersion:
            .: {}
            f:applicationSecretVersion: {}
            f:clientCaSecretVersion: {}
            f:replicationSecretVersion: {}
            f:serverCaSecretVersion: {}
            f:serverSecretVersion: {}
            f:superuserSecretVersion: {}
          f:targetPrimary: {}
          f:targetPrimaryTimestamp: {}
          f:topology:
            .: {}
            f:successfullyExtracted: {}
          f:writeService: {}
      manager: manager
      operation: Update
      subresource: status
      time: '2024-03-06T19:52:37Z'
  name: authentik-cnpg-main
  namespace: authentik
  resourceVersion: '57200282'
  uid: 983c6b00-9052-457d-83c4-7e78b6400dfe
  selfLink: >-
    /apis/postgresql.cnpg.io/v1/namespaces/authentik/clusters/authentik-cnpg-main
status:
  certificates:
    clientCASecret: authentik-cnpg-main-ca
    expirations:
      authentik-cnpg-main-ca: 2024-06-04 19:47:33 +0000 UTC
      authentik-cnpg-main-replication: 2024-06-04 19:47:33 +0000 UTC
      authentik-cnpg-main-server: 2024-06-04 19:47:33 +0000 UTC
    replicationTLSSecret: authentik-cnpg-main-replication
    serverAltDNSNames:
      - authentik-cnpg-main-rw
      - authentik-cnpg-main-rw.authentik
      - authentik-cnpg-main-rw.authentik.svc
      - authentik-cnpg-main-r
      - authentik-cnpg-main-r.authentik
      - authentik-cnpg-main-r.authentik.svc
      - authentik-cnpg-main-ro
      - authentik-cnpg-main-ro.authentik
      - authentik-cnpg-main-ro.authentik.svc
    serverCASecret: authentik-cnpg-main-ca
    serverTLSSecret: authentik-cnpg-main-server
  cloudNativePGCommitHash: c7be872e
  cloudNativePGOperatorHash: 262d86af058f59462fdccaec51231a1afd888153413f63224f60e458fcd335be
  conditions:
    - lastTransitionTime: '2024-03-06T19:52:34Z'
      message: Cluster Is Not Ready
      reason: ClusterIsNotReady
      status: 'False'
      type: Ready
  configMapResourceVersion:
    metrics:
      cnpg-default-monitoring: '57200127'
  initializingPVC:
    - authentik-cnpg-main-1
    - authentik-cnpg-main-1-wal
  instanceNames:
    - authentik-cnpg-main-1
  instances: 1
  jobCount: 1
  latestGeneratedNode: 1
  managedRolesStatus: {}
  phase: Setting up primary
  phaseReason: Creating primary instance authentik-cnpg-main-1
  poolerIntegrations:
    pgBouncerIntegration:
      secrets:
        - authentik-cnpg-main-pooler
  pvcCount: 2
  readService: authentik-cnpg-main-r
  secretsResourceVersion:
    applicationSecretVersion: '57199801'
    clientCaSecretVersion: '57200026'
    replicationSecretVersion: '57200031'
    serverCaSecretVersion: '57200026'
    serverSecretVersion: '57200028'
    superuserSecretVersion: '57200038'
  targetPrimary: authentik-cnpg-main-1
  targetPrimaryTimestamp: '2024-03-06T19:52:34.615107Z'
  topology:
    successfullyExtracted: true
  writeService: authentik-cnpg-main-rw
spec:
  affinity:
    podAntiAffinityType: preferred
  bootstrap:
    recovery:
      database: authentik
      owner: authentik
      secret:
        name: authentik-cnpg-main-user
      source: objectStoreRecoveryCluster
  enableSuperuserAccess: true
  externalClusters:
    - barmanObjectStore:
        destinationPath: s3://kjeldschouten-cnpg-authentik/
        endpointURL: https://s3.eu-central-003.backblazeb2.com
        s3Credentials:
          accessKeyId:
            key: ACCESS_KEY_ID
            name: authentik-cnpg-main-provider-recovery-s3-creds
          secretAccessKey:
            key: ACCESS_SECRET_KEY
            name: authentik-cnpg-main-provider-recovery-s3-creds
      name: objectStoreRecoveryCluster
  failoverDelay: 0
  imageName: ghcr.io/cloudnative-pg/postgresql:16.1
  instances: 2
  logLevel: info
  maxSyncReplicas: 0
  minSyncReplicas: 0
  monitoring:
    customQueriesConfigMap:
      - key: queries
        name: cnpg-default-monitoring
    disableDefaultQueries: false
    enablePodMonitor: true
  nodeMaintenanceWindow:
    inProgress: false
    reusePVC: true
  postgresGID: 26
  postgresUID: 26
  postgresql:
    parameters:
      archive_mode: 'on'
      archive_timeout: 5min
      dynamic_shared_memory_type: posix
      log_destination: csvlog
      log_directory: /controller/log
      log_filename: postgres
      log_rotation_age: '0'
      log_rotation_size: '0'
      log_truncate_on_rotation: 'false'
      logging_collector: 'on'
      max_parallel_workers: '32'
      max_replication_slots: '32'
      max_worker_processes: '32'
      shared_memory_type: mmap
      shared_preload_libraries: ''
      ssl_max_protocol_version: TLSv1.3
      ssl_min_protocol_version: TLSv1.3
      wal_keep_size: 512MB
      wal_receiver_timeout: 5s
      wal_sender_timeout: 5s
    syncReplicaElectionConstraint:
      enabled: false
  primaryUpdateMethod: switchover
  primaryUpdateStrategy: unsupervised
  replicationSlots:
    highAvailability:
      enabled: true
      slotPrefix: _cnpg_
    updateInterval: 30
  resources:
    limits:
      cpu: '4'
      memory: 8Gi
    requests:
      cpu: 10m
      memory: 50Mi
  smartShutdownTimeout: 180
  startDelay: 3600
  stopDelay: 1800
  storage:
    pvcTemplate:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
      storageClassName: openebs-hostpath
    resizeInUseVolumes: true
  switchoverDelay: 3600
  walStorage:
    pvcTemplate:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
      storageClassName: openebs-hostpath
    resizeInUseVolumes: true

Relevant log output

{"level":"info","ts":"2024-03-06T19:52:44Z","msg":"Recovering from external cluster","logging_pod":"authentik-cnpg-main-1-full-recovery","sourceName":"objectStoreRecoveryCluster"}
{"level":"error","ts":"2024-03-06T19:52:44Z","msg":"Error while restoring a backup","logging_pod":"authentik-cnpg-main-1-full-recovery","error":"no target backup found","stacktrace":"github.com/cloudnative-pg/cloudnative-pg/pkg/management/log.(*logger).Error\n\tpkg/management/log/log.go:128\ngithub.com/cloudnative-pg/cloudnative-pg/pkg/management/log.Error\n\tpkg/management/log/log.go:166\ngithub.com/cloudnative-pg/cloudnative-pg/internal/cmd/manager/instance/restore.restoreSubCommand\n\tinternal/cmd/manager/instance/restore/cmd.go:89\ngithub.com/cloudnative-pg/cloudnative-pg/internal/cmd/manager/instance/restore.NewCmd.func2\n\tinternal/cmd/manager/instance/restore/cmd.go:60\ngithub.com/spf13/cobra.(*Command).execute\n\tpkg/mod/github.com/spf13/[email protected]/command.go:983\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tpkg/mod/github.com/spf13/[email protected]/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\tpkg/mod/github.com/spf13/[email protected]/command.go:1039\nmain.main\n\tcmd/manager/main.go:64\nruntime.main\n\t/opt/hostedtoolcache/go/1.21.6/x64/src/runtime/proc.go:267"}

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

PrivatePuffin avatar Mar 06 '24 19:03 PrivatePuffin

In case people wonder, the backup is there: image

PrivatePuffin avatar Mar 06 '24 19:03 PrivatePuffin

And yes, the credentials have access (the same read-write credentials that were used for backup creation).

PrivatePuffin avatar Mar 06 '24 20:03 PrivatePuffin

Found the issue: the name of the externalClusters entry has to match the name of the folder the backup is under (in my case, the old cluster name).

I don't think this is documented very well.
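
For illustration only (this is not from the original report): a minimal sketch of the relevant part of the Cluster spec with the fix applied. It reuses the bucket, endpoint, and secret names from the manifest above; `old-cluster-name` is a hypothetical placeholder for whatever top-level folder the backup actually sits under in the bucket. The commented-out `serverName` line reflects the CloudNativePG documentation, which describes that property as a way to point at a folder whose name differs from the externalClusters entry; note that the report above says setting serverName did not help in this case, so treat it as an assumption to verify.

```yaml
# Sketch of spec.bootstrap and spec.externalClusters only; all other fields omitted.
spec:
  bootstrap:
    recovery:
      database: authentik
      owner: authentik
      secret:
        name: authentik-cnpg-main-user
      # Must reference the externalClusters entry by its name.
      source: old-cluster-name        # hypothetical placeholder
  externalClusters:
    # The entry name is used as the folder name looked up under destinationPath,
    # so it has to match the folder the backup was written to.
    - name: old-cluster-name          # hypothetical placeholder
      barmanObjectStore:
        destinationPath: s3://kjeldschouten-cnpg-authentik/
        endpointURL: https://s3.eu-central-003.backblazeb2.com
        # Documented alternative (assumption, not confirmed in this thread):
        # keep any entry name and set the folder explicitly instead.
        # serverName: old-cluster-name
        s3Credentials:
          accessKeyId:
            name: authentik-cnpg-main-provider-recovery-s3-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: authentik-cnpg-main-provider-recovery-s3-creds
            key: ACCESS_SECRET_KEY
```

In the failing manifest above, both `source` and the entry name were `objectStoreRecoveryCluster`, which did not correspond to the folder the backup was stored under.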

PrivatePuffin avatar Mar 06 '24 20:03 PrivatePuffin

Besides this, though, the verbosity of backup and restore errors is completely abysmal. The number of "exit 1" errors I got during research with no explanation is shocking.

PrivatePuffin avatar Mar 06 '24 20:03 PrivatePuffin

> Found the issue: the name of the externalClusters entry has to match the name of the folder the backup is under (in my case, the old cluster name).
>
> I don't think this is documented very well.

You saved my day, thanks!

winston0410 avatar Jun 21 '24 19:06 winston0410

@PrivatePuffin I'm hitting the same kind of issue. Would you mind sharing the config that worked for you? Thanks!

brouberol avatar Sep 12 '24 11:09 brouberol

> @PrivatePuffin I'm hitting the same kind of issue. Would you mind sharing the config that worked for you? Thanks!

The explanation is above. We, as TrueCharts, have Helm-templated all of this away by now, so I don't have "vanilla" examples for this.

PrivatePuffin avatar Sep 18 '24 12:09 PrivatePuffin

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] avatar Apr 02 '25 02:04 github-actions[bot]

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Apr 16 '25 02:04 github-actions[bot]