epiphany icon indicating copy to clipboard operation
epiphany copied to clipboard

[BUG] Unable to perform HA cluster backup

Open atsikham opened this issue 4 years ago • 1 comments

Describe the bug HA cluster backup fails with following error

16:43:33 INFO cli.engine.ansible.AnsibleCommand - TASK [backup : Save etcd snapshot] ******************************************************************************************************
16:43:35 ERROR cli.engine.ansible.AnsibleCommand - fatal: [ec2-3-95-156-60.compute-1.amazonaws.com]: FAILED! => {"changed": true, "cmd": "docker run -v \"/epibackup/ansible.zKwY0W.tmp/:/backup/\" --network host --env ETCDCTL_API=3 --rm \"ec2-3-95-156-60.compute-1.amazonaws.com:5000/k8s.gcr.io/etcd:3.4.3-0 ec2-3-95-156-60.compute-1.amazonaws.com:5000/k8s.gcr.io/etcd:3.4.3-0 ec2-3-95-156-60.compute-1.amazonaws.com:5000/k8s.gcr.io/etcd:3.4.3-0\" etcdctl --endpoints https://127.0.0.1:2379 --cacert /backup/pki/etcd/ca.crt --cert /backup/pki/etcd/healthcheck-client.crt --key /backup/pki/etcd/healthcheck-client.key snapshot save /backup/etcd-snapshot.db\n", "delta": "0:00:00.062874", "end": "2020-09-16 16:43:37.864740", "msg": "non-zero return code", "rc": 125, "start": "2020-09-16 16:43:37.801866", "stderr": "docker: invalid reference format.\nSee 'docker run --help'.", "stderr_lines": ["docker: invalid reference format.", "See 'docker run --help'."], "stdout": "", "stdout_lines": []}

To Reproduce Steps to reproduce the behavior:

  1. Deploy HA k8s cluster
  2. Try to execute epicli backup -f backup.yml -b <build_dir>

Expected behavior Backup is completed successfully.

Config files backup.yml:

kind: configuration/backup
title: Backup Config
name: default
specification:
  components:
    load_balancer:
      enabled: false
    logging:
      enabled: false
    monitoring:
      enabled: true
    postgresql:
      enabled: false
    rabbitmq:
      enabled: true
    kubernetes:
      enabled: true

OS (please complete the following information):

  • OS: Ubuntu

Cloud Environment (please complete the following information):

  • Cloud Provider: AWS

Additional context No.


DoD checklist

  • Changelog
    • [ ] updated
    • [ ] not needed
  • COMPONENTS.md
    • [ ] updated
    • [ ] not needed
  • Schema
    • [ ] updated
    • [ ] not needed
  • Backport tasks
    • [ ] created
    • [ ] not needed
  • Documentation
    • [ ] added
    • [ ] updated
    • [ ] not needed
  • [ ] Feature has automated tests
  • [ ] Automated tests passed (QA pipelines)
    • [ ] apply
    • [ ] upgrade
    • [ ] backup/restore
  • [ ] Idempotency tested
  • [ ] All conversations in PR resolved

atsikham avatar Sep 16 '20 18:09 atsikham

This is low priority task. We don't support recovery, users have to do it themselves. :)

sk4zuzu avatar Sep 25 '20 08:09 sk4zuzu