etcd-cluster-operator

Backup/Restore proxy and agents

Open JamesLaverack opened this issue 5 years ago • 0 comments

Implement a 'proxy' service to control the upload and download of backups during the backup and restore process.

See the design document for details (TODO commit design document into git repo 😅)

  • [ ] Implement a proxy service.
    • [x] Add the entry point under cmd/proxy, the Dockerfile at build/package/proxy.Dockerfile, and basic infrastructure changes to enable the proxy to be built and published in tests.
    • [x] Add the gRPC APIs for backup and restore, at first as “stub” implementations.
    • [ ] Implement the backup upload code.
    • [x] Implement the restore download code.
    • [x] Implement credential handling.
    • [x] Add recommended deployment YAML, and instructions to the installation documentation.
    • [ ] Implement a metrics endpoint for the proxy service, and expose something useful.
  • [x] Rebuild restore branch on top of proxy work.
    • [x] Change restore agent implementation to use proxy, and republish pull request.
    • [x] Build restore agent Docker image.
    • [x] Change EtcdRestore to only specify an object URL and not credentials.
    • [ ] Make the operator create a ServiceAccount in the client’s Namespace to run the restore Pod with.
    • [x] Update restore documentation and examples.
  • [ ] Build backup agent
    • [ ] Add entry point under cmd/backup-agent, add it to the Dockerfile, and other build infra changes to build it in tests.
    • [ ] Implement backup call in agent, calling out to proxy’s API for upload.
    • [ ] Change EtcdBackup to not specify a destination or credentials.
    • [ ] Remove old backup code, and change the controller for EtcdBackup to launch the agent instead.
    • [ ] Make the operator create a ServiceAccount in the client’s namespace to run the backup Pod with.
    • [ ] Update backup documentation and examples.
  • [ ] End-to-end testing
    • [x] Deploy MinIO in kind as part of the testing context.
    • [ ] Write a full end-to-end test that:
      1. Deploys an EtcdCluster
      2. Sets a key in that etcd cluster to value 1
      3. Takes a backup to MinIO using the S3 API
      4. Changes the key to value 2
      5. Deletes the EtcdCluster and PersistentVolumeClaims
      6. Creates an EtcdRestore
      7. Waits for the etcd cluster to come back
      8. Verifies that the value of the key is 1
  • [ ] Miscellaneous Cleanup
    • [ ] Commit design document into the repository as Markdown
    • [ ] Update documentation to match approach.
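
As a rough illustration of what the end-to-end test in the checklist above is meant to prove, here is a minimal in-memory sketch of the eight steps. Everything in it is hypothetical: `fakeCluster` and `fakeObjectStore` stand in for a real EtcdCluster and the MinIO bucket, the `s3://backups/snapshot.db` URL is made up, and none of these names come from the operator's code. The real test would drive the EtcdCluster, EtcdBackup and EtcdRestore resources in kind instead.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeCluster is a hypothetical in-memory stand-in for an etcd cluster.
type fakeCluster struct {
	data map[string]string
}

func newFakeCluster() *fakeCluster {
	return &fakeCluster{data: map[string]string{}}
}

func (c *fakeCluster) Put(key, value string) { c.data[key] = value }

func (c *fakeCluster) Get(key string) (string, bool) {
	v, ok := c.data[key]
	return v, ok
}

// Snapshot returns a copy of the data, so later writes do not alter the backup.
func (c *fakeCluster) Snapshot() map[string]string {
	snap := make(map[string]string, len(c.data))
	for k, v := range c.data {
		snap[k] = v
	}
	return snap
}

// fakeObjectStore is a hypothetical stand-in for the MinIO bucket.
type fakeObjectStore struct {
	objects map[string]map[string]string
}

func newFakeObjectStore() *fakeObjectStore {
	return &fakeObjectStore{objects: map[string]map[string]string{}}
}

func (s *fakeObjectStore) Upload(url string, snap map[string]string) {
	s.objects[url] = snap
}

func (s *fakeObjectStore) Download(url string) (map[string]string, error) {
	snap, ok := s.objects[url]
	if !ok {
		return nil, errors.New("no backup at " + url)
	}
	return snap, nil
}

// restoreFrom builds a fresh cluster from the backup stored at url.
func restoreFrom(s *fakeObjectStore, url string) (*fakeCluster, error) {
	snap, err := s.Download(url)
	if err != nil {
		return nil, err
	}
	c := newFakeCluster()
	for k, v := range snap {
		c.Put(k, v)
	}
	return c, nil
}

// runScenario walks the eight steps and returns the value read in step 8.
func runScenario() (string, error) {
	const url = "s3://backups/snapshot.db" // hypothetical object URL
	store := newFakeObjectStore()

	cluster := newFakeCluster()           // 1. deploy a cluster
	cluster.Put("key", "1")               // 2. set the key to value 1
	store.Upload(url, cluster.Snapshot()) // 3. take a backup
	cluster.Put("key", "2")               // 4. change the key to value 2
	cluster = nil                         // 5. delete the cluster and its storage

	restored, err := restoreFrom(store, url) // 6-7. restore and wait for it
	if err != nil {
		return "", err
	}
	v, ok := restored.Get("key") // 8. read the key back
	if !ok {
		return "", errors.New("key missing after restore")
	}
	return v, nil
}

func main() {
	v, err := runScenario()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("value after restore:", v)
}
```

The point of the sketch is the invariant in step 8: the write made after the backup (value 2) must not survive the restore.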

JamesLaverack, Feb 03 '20 14:02