Epic: Ensure durability for user workspace files

Open kylos101 opened this issue 3 years ago • 13 comments

Summary

Better protect user data

Context

Sometimes a workspace, node, or workspace cluster fails and the user data cannot be backed up to cloud storage, resulting in data loss. See the related incident for a global outage and the related RFC where we are discussing solutions.

Value

If we handle user data better, users can trust that even if the Gitpod service is unavailable, they will not lose data once it is back online.

Acceptance criteria

User data is persisted in such a way that even if there is a workspace, node, or cluster failure, the data is accessible to be backed up at a later time.

Tasks

Ops:

  • Workspace Preview (should be done first)
    • [x] automate GCP service account creation for CSI driver to function
    • [x] automate deployment of GCP CSI driver into cluster as part of cluster creation operation; automate deployment of GCP storageClasses as part of cluster creation operation ~~(specify discard mount option)~~
    • [x] automate deployment of snapshotter CRD and controller as part of cluster creation operation; validate that the snapshotter takes snapshots and that we can create a PVC from a snapshot
    • [x] gitpod-io/workspace-preview#58
  • Preview Environment
    • [x] gitpod-io/gitpod#10017
    • [x] gitpod-io/gitpod#10197
    • [x] gitpod-io/gitpod#10201
  • Jobs for workspace-clusters
    • [x] gitpod-io/ops#4436
    • [x] gitpod-io/gitpod#12327
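
For reference, the "create PVC from snapshot" validation in the Workspace Preview tasks above relies on the standard Kubernetes `dataSource` field of a PersistentVolumeClaim. Below is a minimal sketch of such a manifest, built in Python. The names `ws-restore`, `ws-snapshot`, `csi-gce-pd` and the `30Gi` size are placeholders chosen for illustration, not values taken from this epic.

```python
import json


def pvc_from_snapshot(pvc_name, snapshot_name, storage_class, size="30Gi"):
    """Build a PersistentVolumeClaim manifest that restores from a VolumeSnapshot.

    All argument values are supplied by the caller; nothing here is Gitpod-specific.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": pvc_name},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteOnce"],
            # dataSource tells the CSI driver to pre-populate the new volume
            # from an existing VolumeSnapshot object in the same namespace.
            "dataSource": {
                "apiGroup": "snapshot.storage.k8s.io",
                "kind": "VolumeSnapshot",
                "name": snapshot_name,
            },
            "resources": {"requests": {"storage": size}},
        },
    }


# Placeholder names, for illustration only.
manifest = pvc_from_snapshot("ws-restore", "ws-snapshot", "csi-gce-pd")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be applied with `kubectl apply -f -`, assuming the snapshot CRDs and controller from the task above are installed in the cluster.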

Design:

  • [x] List impacted components and visualize flows in the RFC
  • [x] Double check the estimate for cost impacts
  • [x] Compare with other DD, to be consistent / fill in gaps
  • [x] gitpod-io/gitpod#9054

Product changes:

  • Functionality:
    • [x] gitpod-io/gitpod#9117
    • [x] gitpod-io/gitpod#9142
    • [x] gitpod-io/gitpod#9442
    • [x] gitpod-io/gitpod#9469
    • [x] gitpod-io/gitpod#9984
    • [x] gitpod-io/gitpod#10259
    • [x] gitpod-io/gitpod#10531
    • [x] gitpod-io/gitpod#11336
    • [x] gitpod-io/gitpod#10886
    • [x] gitpod-io/gitpod#10210
    • [x] gitpod-io/gitpod#10186
    • [x] gitpod-io/gitpod#10612
    • [x] gitpod-io/gitpod#10260
    • [x] gitpod-io/gitpod#12745
    • [ ] Switch select customer teams to using PVC that would benefit from this and are willing to help test
    • [x] #13930
    • [x] #14364
    • [ ] Switch everyone to using PVC
    • [x] gitpod-io/gitpod#10334
    • [x] gitpod-io/gitpod#10887
    • [x] gitpod-io/gitpod#11635
    • [x] gitpod-io/gitpod#11786
    • [x] gitpod-io/gitpod#11769
    • [x] gitpod-io/gitpod#11770
    • [x] gitpod-io/gitpod#12420
    • [x] gitpod-io/gitpod#12463
    • [x] gitpod-io/gitpod#12718
    • [x] gitpod-io/gitpod#12666
    • [x] gitpod-io/gitpod#12494
    • [x] gitpod-io/gitpod#12464
    • [x] gitpod-io/gitpod#12507
    • [x] gitpod-io/gitpod#13007
    • [x] gitpod-io/gitpod#13280
    • [x] gitpod-io/gitpod#13282
    • [x] gitpod-io/gitpod#13353
    • [x] gitpod-io/gitpod#13531
    • [x] gitpod-io/gitpod#13856
    • [x] gitpod-io/gitpod#13980
    • [x] gitpod-io/gitpod#14003
    • [x] gitpod-io/gitpod#14159
  • Observability:
    • [x] gitpod-io/gitpod#9353
    • [x] gitpod-io/gitpod#10195
    • [x] gitpod-io/gitpod#11722
  • Installer/KOTS
    • [x] gitpod-io/gitpod#10613 (Moves to gitpod-io/gitpod#11476)
    • [x] ~#10614~ (Moves to gitpod-io/gitpod#11476)

Tests:

  • [x] gitpod-io/gitpod#10162
  • [x] gitpod-io/gitpod#9990
  • [x] gitpod-io/gitpod#12497
  • [x] gitpod-io/gitpod#12638
  • [x] gitpod-io/gitpod#12560
  • [x] gitpod-io/gitpod#12744
  • [x] gitpod-io/gitpod#12747
  • [x] gitpod-io/gitpod#13146
  • [x] ~#10211~ (Moves to gitpod-io/gitpod#11476)
  • [x] ~#10212~ (Moves to gitpod-io/gitpod#11476)
  • [x] https://github.com/gitpod-io/gitpod/issues/13591
  • [x] https://github.com/gitpod-io/ops/issues/6270

Bug

  • [x] https://github.com/gitpod-io/gitpod/issues/14266

Should solve:

  • [x] https://github.com/gitpod-io/gitpod/issues/7311
  • [x] https://github.com/gitpod-io/gitpod/issues/8198

Day 2:

  • [x] https://github.com/gitpod-io/gitpod/issues/12892
  • [x] https://github.com/gitpod-io/gitpod/issues/14451
  • [x] https://github.com/gitpod-io/gitpod/issues/13856
  • [x] https://github.com/gitpod-io/gitpod/issues/9496

kylos101 avatar Jan 28 '22 18:01 kylos101

~~@kylos101 Few questions related to "users must be able to access their most recent backup for a workspace regardless of workspace status":~~

~~1. During the stopping state, would the system be able to distinguish a backup that was done as a result of it from a previous one?~~
~~2. From what I understand/recall we store the last 4 backups. Would we be able to provide the WebApp with the links and corresponding timestamps of all of them?~~

atduarte avatar Jan 31 '22 18:01 atduarte

> automate deployment of GCP storageClasses as part of cluster creation operation (specify discard mount option)

this is not required for XFS.

aledbf avatar Mar 28 '22 13:03 aledbf

> installer: allow to specify storageClass in gitpod.yaml

this can be optional for the first iteration

aledbf avatar Mar 28 '22 13:03 aledbf

@sagor999 as a heads up, I added a few observability tasks. One of the first ones we'll need (if it doesn't already exist) is the ability to inspect backups and restores now being done with TAR. For example, this way we can measure duration for both.
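
As a rough illustration of the measurement mentioned above, here is a minimal sketch that times a TAR backup and restore of a directory using only the Python standard library. The helper names (`timed`, `backup`, `restore`) and the file layout are assumptions made for this example; it does not reflect how the Gitpod components actually perform or instrument backups.

```python
import os
import tarfile
import tempfile
import time


def timed(fn, *args):
    """Run fn(*args) and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def backup(src_dir, archive_path):
    """Write the contents of src_dir into an uncompressed tar archive."""
    with tarfile.open(archive_path, "w") as tar:
        tar.add(src_dir, arcname=".")


def restore(archive_path, dest_dir):
    """Extract the archive into dest_dir."""
    with tarfile.open(archive_path, "r") as tar:
        # Use the safer "data" extraction filter where this Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest_dir, **kwargs)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "workspace")
    dest = os.path.join(tmp, "restored")
    os.makedirs(src)
    os.makedirs(dest)
    with open(os.path.join(src, "hello.txt"), "w") as f:
        f.write("user data")

    archive = os.path.join(tmp, "backup.tar")
    backup_s = timed(backup, src, archive)
    restore_s = timed(restore, archive, dest)

    # Read the file back to confirm the round trip preserved the content.
    with open(os.path.join(dest, "hello.txt")) as f:
        restored = f.read()

print(f"backup took {backup_s:.4f}s, restore took {restore_s:.4f}s")
```

In a real service the two durations would more likely be exported as metrics (for example, histograms) than printed.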

kylos101 avatar Apr 06 '22 15:04 kylos101

@sagor999 @jenting are there any more integration tests that need to be added for new code we've written? In other words, I see you've fixed existing tests, but wanted to double check for new test needs. For example, one test I can think of would be a test that kills a pod, relies on a process to back up the orphaned PVC, and then asserts that the PVC is gone (because it was snapshotted).
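
To make the shape of that test concrete, here is a minimal sketch against an in-memory stand-in for a cluster. `FakeCluster` and `backup_orphans` are hypothetical names invented for this example. A real integration test would talk to an actual cluster and to the component that snapshots orphaned PVCs.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """In-memory stand-in for a workspace cluster (hypothetical, not a Gitpod API)."""

    pods: dict = field(default_factory=dict)      # pod name -> PVC name
    pvcs: set = field(default_factory=set)        # existing PVC names
    snapshots: set = field(default_factory=set)   # existing snapshot names

    def start_workspace(self, name):
        pvc = f"{name}-pvc"
        self.pvcs.add(pvc)
        self.pods[name] = pvc

    def kill_pod(self, name):
        # The pod disappears, but its PVC is left behind (orphaned).
        del self.pods[name]

    def orphaned_pvcs(self):
        return self.pvcs - set(self.pods.values())


def backup_orphans(cluster):
    """Snapshot every orphaned PVC, then delete the PVC."""
    for pvc in sorted(cluster.orphaned_pvcs()):
        cluster.snapshots.add(f"{pvc}-snapshot")
        cluster.pvcs.remove(pvc)


def test_orphaned_pvc_is_snapshotted_and_removed():
    cluster = FakeCluster()
    cluster.start_workspace("ws-1")
    cluster.kill_pod("ws-1")
    assert cluster.orphaned_pvcs() == {"ws-1-pvc"}

    backup_orphans(cluster)

    # The PVC is gone because it was snapshotted.
    assert "ws-1-pvc" not in cluster.pvcs
    assert "ws-1-pvc-snapshot" in cluster.snapshots


test_orphaned_pvc_is_snapshotted_and_removed()
```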

kylos101 avatar Jun 01 '22 18:06 kylos101

This issue currently affects the PVC epic: Sometimes workspace attempts to start with PVC feature enabled

sagor999 avatar Aug 08 '22 18:08 sagor999

Question: How would someone who ran out of hours get their data back? (re: https://github.com/gitpod-io/gitpod/pull/14393) Contact support? It'd be better if they could self-serve.

axonasif avatar Nov 11 '22 15:11 axonasif

Question: Will this change prevent us from downloading a single file in the workspace? (Will the "Download..." button in the right-click menu of a file still be available?) Sometimes, I need to update my artifact on another server by downloading the artifact from the Gitpod server and uploading it to my server manually.

SNWCreations avatar Nov 12 '22 04:11 SNWCreations

> Question: Will this change prevent us from downloading a single file in the workspace? (Will the "Download..." button in the right-click menu of a file still be available?)

No, this is about downloading the workspace content backup. You can still download individual files from your running workspace depending on how you connect to it. E.g. with VS Code, just drag and drop.

svenefftinge avatar Nov 14 '22 14:11 svenefftinge

Maybe this issue should be part of this epic to not lose my workspace's content on a regular basis: https://github.com/gitpod-io/gitpod/issues/11183

6uliver avatar Nov 18 '22 14:11 6uliver

Update: Blocking functional issues and significantly increased workspace startup times were found with the current technical design. 😞

After internal discussions, given that the backup success ratio is high and stable following adjacent improvements, and that the implementation of the new design will be considerably faster to do after https://github.com/gitpod-io/gitpod/issues/11416, we have decided to pause this effort until then.

PS: @6uliver I believe the root cause of that issue is different from the context of this one. I will follow up on that one there. 🙏

atduarte avatar Nov 24 '22 09:11 atduarte

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Sep 16 '23 21:09 stale[bot]