Linus Arver

57 comments by Linus Arver

Short of reaching consensus on the initial backup approach, let's try to identify some invariants. (1) job duration < 24 hrs: I think we want the backups to happen at...

Looks like there is already a _GCS_ disaster recovery script underway here: https://github.com/kubernetes/k8s.io/pull/334. We should probably follow the same infrastructural patterns established there.

> The pattern that I'm proposing in #334 is a different script for copying everything, with a no-overwrite / no-delete policy (I implemented that in code, @thockin pointed out that...

An additional thought: I think it makes sense for the backup GCR to additionally mirror the latest snapshot of the prod GCR. This way, we could just redirect the vanity...

Are there any thoughts about using the promoter directly for performing backups? We should be able to do this once #118 is merged. The backup process would be: 1. Construct...

I am working on a doc to sum everything up + an initial implementation. Will share with this thread soon... stay tuned!

Here is a writeup of an initial approach/design: https://docs.google.com/document/d/1od5y-Z2xP9mVmg2Yztnv-GQ7D-orj9HsTmeVvNHkzzA/edit?usp=sharing
Mailing list link: https://groups.google.com/d/msg/kubernetes-wg-k8s-infra/cseCwgALwdk/iOYkaEYFCAAJ
You must be a member of the kubernetes-wg-k8s-infra Google group in order to access the document.

Yup, this should stay open. There need to be more docs around how to handle the scenario of (omg the prod GCR is messed up, how do I recover?). /assign...

> How do you envision # 2 working? A full-fledged curses selector? A search thingie with completion or something like that?

A curses-based UI would be a good start. Maybe...