Epic: Ensure durability for user workspace files
Summary
Better protect user data
Context
Sometimes a workspace, node, or workspace cluster fails and the user data cannot be backed up to cloud storage, resulting in data loss. There is a related incident for a global outage, and a related RFC where we are discussing solutions.
Value
By better handling user data, users will trust that even if the Gitpod service is unavailable, once it is online, they will not lose data.
Acceptance criteria
User data is persisted in such a way that even if there is a workspace, node, or cluster failure, the data is accessible to be backed up at a later time.
Tasks
Ops:
- Workspace Preview (should be done first)
- [x] automate GCP service account creation for CSI driver to function
- [x] automate deployment of GCP CSI driver into cluster as part of cluster creation operation
- automate deployment of GCP storageClasses as part of cluster creation operation ~~(specify `discard` mount option)~~
- [x] automate deployment of snapshotter CRD and controller as part of cluster creation operation; validate that the snapshotter takes snapshots and that we can create a PVC from a snapshot
- [x] gitpod-io/workspace-preview#58
- Preview Environment
- [x] gitpod-io/gitpod#10017
- [x] gitpod-io/gitpod#10197
- [x] gitpod-io/gitpod#10201
- Jobs for `workspace-clusters`
- [x] gitpod-io/ops#4436
- [x] gitpod-io/gitpod#12327
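The "create PVC from snapshot" validation in the Ops tasks above comes down to two Kubernetes objects: a `VolumeSnapshot` that points at the source PVC, and a new `PersistentVolumeClaim` whose `dataSource` points back at that snapshot. The sketch below builds both manifests as plain Python dictionaries so the relationship is easy to see. All object names, the storage class name, the snapshot class name, and the size are hypothetical placeholders, not the values used in Gitpod's clusters.

```python
import json


def snapshot_manifest(name, source_pvc, snapshot_class):
    """Build a VolumeSnapshot manifest that captures an existing PVC."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": source_pvc},
        },
    }


def pvc_from_snapshot_manifest(name, snapshot_name, storage_class, size):
    """Build a PVC manifest that is pre-populated from a VolumeSnapshot."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteOnce"],
            # dataSource is what ties the new claim to the snapshot.
            "dataSource": {
                "apiGroup": "snapshot.storage.k8s.io",
                "kind": "VolumeSnapshot",
                "name": snapshot_name,
            },
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Hypothetical names, chosen only for illustration.
    snap = snapshot_manifest("ws-snap", "ws-pvc", "example-snapshot-class")
    restored = pvc_from_snapshot_manifest(
        "ws-restored", "ws-snap", "example-storage-class", "30Gi"
    )
    print(json.dumps([snap, restored], indent=2))
```

Applying the first manifest and then the second (for example with `kubectl apply`) and checking that the restored claim becomes bound is one way such a validation could be run by hand.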
Design:
- [x] List impacted components and visualize flows in the RFC
- [x] Double check the estimate for cost impacts
- [x] Compare with other DD, to be consistent / fill in gaps
- [x] gitpod-io/gitpod#9054
Product changes:
- Functionality:
- [x] gitpod-io/gitpod#9117
- [x] gitpod-io/gitpod#9142
- [x] gitpod-io/gitpod#9442
- [x] gitpod-io/gitpod#9469
- [x] gitpod-io/gitpod#9984
- [x] gitpod-io/gitpod#10259
- [x] gitpod-io/gitpod#10531
- [x] gitpod-io/gitpod#11336
- [x] gitpod-io/gitpod#10886
- [x] gitpod-io/gitpod#10210
- [x] gitpod-io/gitpod#10186
- [x] gitpod-io/gitpod#10612
- [x] gitpod-io/gitpod#10260
- [x] gitpod-io/gitpod#12745
- [ ] Switch select customer teams to using PVC that would benefit from this and are willing to help test
- [x] #13930
- [x] #14364
- [ ] Switch everyone to using PVC
- [x] gitpod-io/gitpod#10334
- [x] gitpod-io/gitpod#10887
- [x] gitpod-io/gitpod#11635
- [x] gitpod-io/gitpod#11786
- [x] gitpod-io/gitpod#11769
- [x] gitpod-io/gitpod#11770
- [x] gitpod-io/gitpod#12420
- [x] gitpod-io/gitpod#12463
- [x] gitpod-io/gitpod#12718
- [x] gitpod-io/gitpod#12666
- [x] gitpod-io/gitpod#12494
- [x] gitpod-io/gitpod#12464
- [x] gitpod-io/gitpod#12507
- [x] gitpod-io/gitpod#13007
- [x] gitpod-io/gitpod#13280
- [x] gitpod-io/gitpod#13282
- [x] gitpod-io/gitpod#13353
- [x] gitpod-io/gitpod#13531
- [x] gitpod-io/gitpod#13856
- [x] gitpod-io/gitpod#13980
- [x] gitpod-io/gitpod#14003
- [x] gitpod-io/gitpod#14159
- Observability:
- [x] gitpod-io/gitpod#9353
- [x] gitpod-io/gitpod#10195
- [x] gitpod-io/gitpod#11722
- Installer/KOTS
- [x] gitpod-io/gitpod#10613 (Moves to gitpod-io/gitpod#11476)
- [x] ~#10614~ (Moves to gitpod-io/gitpod#11476)
Tests:
- [x] gitpod-io/gitpod#10162
- [x] gitpod-io/gitpod#9990
- [x] gitpod-io/gitpod#12497
- [x] gitpod-io/gitpod#12638
- [x] gitpod-io/gitpod#12560
- [x] gitpod-io/gitpod#12744
- [x] gitpod-io/gitpod#12747
- [x] gitpod-io/gitpod#13146
- [x] ~#10211~ (Moves to gitpod-io/gitpod#11476)
- [x] ~#10212~ (Moves to gitpod-io/gitpod#11476)
- [x] https://github.com/gitpod-io/gitpod/issues/13591
- [x] https://github.com/gitpod-io/ops/issues/6270
Bug
- [x] https://github.com/gitpod-io/gitpod/issues/14266
Should solve:
- [x] https://github.com/gitpod-io/gitpod/issues/7311
- [x] https://github.com/gitpod-io/gitpod/issues/8198
Day 2:
- [x] https://github.com/gitpod-io/gitpod/issues/12892
- [x] https://github.com/gitpod-io/gitpod/issues/14451
- [x] https://github.com/gitpod-io/gitpod/issues/13856
- [x] https://github.com/gitpod-io/gitpod/issues/9496
~~@kylos101 Few questions related to "users must be able to access their most recent backup for a workspace regardless of workspace status":~~
~~1. During the `stopping` state, would the system be able to distinguish a backup that was done as a result of it from a previous one?~~
~~2. From what I understand/recall we store the last 4 backups. Would we be able to provide the WebApp with the links and corresponding timestamps of all of them?~~
> automate deployment of GCP storageClasses as part of cluster creation operation (specify discard mount option)

this is not required for XFS.

> installer: allow to specify storageClass in gitpod.yaml

this can be optional for the first iteration
@sagor999 as a heads up, I added a few observability tasks. One of the first ones we'll need (if it doesn't already exist) is the ability to inspect backups and restores now being done with TAR. For example, this way we can measure the duration of both.
@sagor999 @jenting are there any more integration tests that need to be added for new code we've written? In other words, I see you've fixed existing tests, but wanted to double check for new test needs. For example, one test I can think of would be a test that kills a pod, relies on a process to back up the orphaned PVC, and then asserts that the PVC is gone (because it was snapshotted).
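To make the shape of that proposed test concrete, here is a minimal sketch that runs against an in-memory stand-in for a cluster. It is not written against Gitpod's integration test framework or the Kubernetes API; `FakeCluster`, `reconcile_orphans`, and every name in it are hypothetical and exist only to show the kill, back up, assert sequence.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """In-memory stand-in for a cluster (hypothetical, for illustration)."""
    pods: dict = field(default_factory=dict)       # pod name -> PVC name
    pvcs: set = field(default_factory=set)         # existing PVC names
    snapshots: dict = field(default_factory=dict)  # snapshot name -> source PVC

    def start_workspace(self, pod, pvc):
        self.pvcs.add(pvc)
        self.pods[pod] = pvc

    def kill_pod(self, pod):
        # The pod disappears without a clean shutdown; its PVC is left behind.
        del self.pods[pod]

    def orphaned_pvcs(self):
        # A PVC is orphaned when no running pod references it.
        return self.pvcs - set(self.pods.values())


def reconcile_orphans(cluster):
    """Snapshot every orphaned PVC, then delete it. Returns the handled PVCs."""
    handled = []
    for pvc in sorted(cluster.orphaned_pvcs()):
        cluster.snapshots[f"snap-{pvc}"] = pvc
        cluster.pvcs.remove(pvc)
        handled.append(pvc)
    return handled


def test_orphaned_pvc_is_snapshotted_and_removed():
    cluster = FakeCluster()
    cluster.start_workspace("ws-pod", "ws-pvc")

    cluster.kill_pod("ws-pod")
    assert cluster.orphaned_pvcs() == {"ws-pvc"}

    assert reconcile_orphans(cluster) == ["ws-pvc"]

    # The PVC is gone because its content now lives in a snapshot.
    assert "ws-pvc" not in cluster.pvcs
    assert cluster.snapshots == {"snap-ws-pvc": "ws-pvc"}


if __name__ == "__main__":
    test_orphaned_pvc_is_snapshotted_and_removed()
    print("ok")
```

A real integration test would replace the fake with calls against a test cluster and would need to wait for the snapshot to become ready before asserting that the PVC was deleted.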
There is currently an issue affecting the PVC epic: Sometimes workspace attempts to start with PVC feature enabled
Question: How would someone who ran out of hours get their data back? (re: https://github.com/gitpod-io/gitpod/pull/14393) Contact support? It'd be better if they could self-serve.
Question: Will this change prevent us from downloading a single file in the workspace? (Will the "Download..." button in the right-click menu of a file still be available?) Sometimes, I need to update my artifact on another server by downloading the artifact from the Gitpod server and uploading it to my server manually.

> Question: Will this change prevent us from downloading a single file in the workspace? (Will the "Download..." button in the right-click menu of a file still be available?)

No, this is about downloading the workspace content backup. You can still download individual files from your running workspace depending on how you connect to it. E.g. with VS Code, just drag and drop.
Maybe this issue should be part of this epic to not lose my workspace's content on a regular basis: https://github.com/gitpod-io/gitpod/issues/11183
Update: Blocker functional issues and significantly increased workspace startup times were found in the current technical design. 😞
After internal discussions, given that the backup success ratio is high and stable following adjacent improvements, and that the implementation of the new design will be considerably faster to do after https://github.com/gitpod-io/gitpod/issues/11416, we have decided to pause this effort until then.
PS: @6uliver I believe the root cause of that issue is different from the context of this one. I will follow up on that one there. 🙏