
Backing up user data

TiemenSch opened this issue 3 years ago • 9 comments

Proposed change

Some recommended way of backing up and restoring user data, in a Kubernetes-volume-friendly manner, to a certain point in time.

It would be nice if there were a recommended way of backing up and restoring user volumes to, for instance, an Amazon S3 bucket or another form of off-site storage.

Since users are added and removed from the Hub over time, it makes sense to support this feature from the Hub and not have to select each (dynamically created) volume in your cloud provider's settings manually to turn on periodic backups.

My initial thought is that it's best to leave the user/PVC/PV bookkeeping to the Hub, since doing it manually becomes unwieldy very quickly for larger numbers of users. Perhaps a Kubernetes Job could run these backups periodically, with a similar one-off Job for restoring them.
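To make the idea concrete, here is a rough sketch (not an official z2jh feature) of a CronJob that mounts one user's claim and streams an archive to S3. The bucket, claim, schedule, and image are placeholders, a real setup would need to template this per PVC and handle AWS credentials; note also that an RWO volume can only be mounted by a pod on the node already using it, and older clusters use `batch/v1beta1` for CronJobs.

```shell
# Hypothetical sketch: nightly archive of one user's PVC to S3.
# claim-user1, my-backup-bucket and the namespace are placeholders.
kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-claim-user1
spec:
  schedule: "0 3 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: backup
            image: amazon/aws-cli
            command: ["sh", "-c",
              "tar -czf - -C /data . | aws s3 cp - s3://my-backup-bucket/user1.tar.gz"]
            volumeMounts:
            - name: data
              mountPath: /data
              readOnly: true
          volumes:
          - name: data
            persistentVolumeClaim:
              claimName: claim-user1
EOF
```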

Alternative options

Currently, backing up user data is left either to the users themselves or to the sysadmins running z2jh (or any JupyterHub deployment).

Some cloud providers have a periodic backup option on their volumes, but they may require you to select each volume and turn on backups manually, which is quite a hassle.

Who would use this feature?

End users, and it would give sysadmins some peace of mind.

TiemenSch · Jan 20 '21 09:01

Currently, the only "backup" mentioned in the docs is that you should back up your database before large upgrades. User storage backup is pretty much left for the admin to figure out, which makes sense, since there is such a plethora of ways to achieve this depending on your storage solution.

However, I think that at least describing 'one recommended way to do it' may be enough to help people get started.

TiemenSch · Jan 20 '21 09:01

I agree, storage backup and such is a topic that the guide could cover to some degree. While I think it's hard to establish a recommendation, I think we still can give some suggestions of what can be done.

consideRatio · Jan 20 '21 10:01

As an admin trying to figure out how to migrate JupyterHub from one cluster to another, I would very much appreciate some guidance on how to back up and restore user data without contacting each person to tell them I will delete everything and they had better save what they can now.

I'm thinking about some kind of tool that would copy the files from each PV and then, once a user logs in to the new cluster, put the files back into place?

I have no idea what this is going to take. Do I copy the DB to the new instance somehow, through a series of kubectl terminals into running containers? And then how do I ensure that the new PVs are connected to the users' sessions when they log in? Or do I restore files into a user's PV after they log in and it's created? There needs to be some guidance on which files are important to back up, and which components should be shut down while they are copied out or restored.
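For the file-copying part of the question, one low-tech approach (a sketch, not an endorsed procedure) is `kubectl cp` against each running singleuser pod. The `jhub` namespace is a placeholder; `jupyter-<username>` and `/home/jovyan` are the default KubeSpawner pod name and home directory, and `kubectl cp` requires `tar` inside the container (present in the standard Jupyter images).

```shell
# Hypothetical example: copy one user's home directory out of their
# running singleuser pod, and later back into the new cluster's pod.
kubectl -n jhub cp jupyter-user1:/home/jovyan ./backup/user1

# ...after the user's pod exists in the new cluster:
kubectl -n jhub cp ./backup/user1 jupyter-user1:/home/jovyan
```

This only works while the pod is running, so it sidesteps the PV/PVC bookkeeping at the cost of requiring each user's server to be started during the copy.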

agnewp · Feb 19 '21 16:02

If you use, for example, GitHub OAuthenticator, then I don't think you need to save the hub disk. It just lists the users that have logged in at least once, but since who has access is defined by the OAuth2 application, it doesn't matter much what the current hub database says about past users.

What matters is probably the user storage. With the default configuration, the user storage will be many separate PVs created in response to PVCs. Depending on your cloud provider you can do different things. On GKE you could take "disk snapshots" of the GCP disks backing the PVs and then recreate them in another cluster. I don't know of any automated way to do something...
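As a sketch of the GKE route, you can list the GCE disks behind the PVs and snapshot each with gcloud. The disk name, zone, and snapshot name below are placeholders; the jsonpath works for in-tree GCE PD volumes.

```shell
# List the GCE persistent disks backing the cluster's PVs (in-tree provisioner).
kubectl get pv -o jsonpath='{range .items[*]}{.spec.gcePersistentDisk.pdName}{"\n"}{end}'

# Snapshot one of them (placeholder names/zone).
gcloud compute disks snapshot pvc-1234-disk \
  --zone=us-central1-a \
  --snapshot-names=jhub-user-backup-2021-02-19
```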

Hmm... in modern k8s versions, there are also VolumeSnapshots, but I don't think that will be very helpful...

Hmm... I think my best suggestion would be to set up NFS storage external to the k8s cluster, put each user's storage in its own folder on the NFS server, and then use that NFS storage in the new cluster. Perhaps...

It is quite messy to migrate a large set of PVCs, and I would say it is a bit out of scope to be directly addressed by the z2jh.jupyter.org guide.

consideRatio · Feb 19 '21 16:02

Forget the migration case for now and focus on what someone could possibly do to back up/restore even a single PVC. This large set of PVCs is the default configuration, as you say, and it leaves very few options for a reasonable backup/restore method. How would I even approach doing a single one of these?

agnewp · Feb 19 '21 17:02

@agnewp That's a really good question that would be well suited to our community forum where more people hang out. If you start a thread there you'll be able to help or get the input of many other K8s admins across multiple systems. There are loads of factors to take into account when backing up data, and there's unlikely to be a single good solution. Depending on the conclusions from that thread we could either copy some of the suggestions to this guide, or perhaps just link out to it.

manics · Feb 19 '21 17:02

So is the answer that, essentially, if I copy all the files inside that volume, that is a "backup", and if I paste them back, that is a restore? I don't need k8s specifics, and it's clear Jupyter doesn't have a mechanism for doing this, but again: which things should be disabled, and is 100% of the application state on this user-specific volume?
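At the file level, yes, that is all a backup/restore is. The following runs locally (no cluster needed) just to illustrate the round trip; `/tmp/pv-demo` stands in for a mounted volume.

```shell
# Simulate a volume with one file in it.
mkdir -p /tmp/pv-demo
echo "notebook data" > /tmp/pv-demo/nb.txt

# "Backup": archive the volume's contents.
tar -czf /tmp/pv-demo-backup.tar.gz -C /tmp/pv-demo .

# Simulate data loss, then "restore" from the archive.
rm /tmp/pv-demo/nb.txt
tar -xzf /tmp/pv-demo-backup.tar.gz -C /tmp/pv-demo

cat /tmp/pv-demo/nb.txt   # prints "notebook data"
```

For a real user volume you would run the tar step inside a pod that mounts the PVC, ideally while the user's server is stopped so files are not changing mid-archive.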

agnewp · Feb 19 '21 17:02

@agnewp One thing to keep in mind while doing the backup is that the generated PVC name could change, so while backing up user data you would also need to back up the PVCs for the users. On redeploying JupyterHub you may end up with new PV names bound to the PVCs, so while copying data back to the new PVs you would need to match each old PVC with its new counterpart.
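A simple way to capture that bookkeeping before tearing down the old cluster is to record the PVC-to-PV binding for every claim, so that data copied out can later be matched to the right user's new claim. A sketch:

```shell
# Record which PV currently backs each user's claim (claim-* is the
# default z2jh PVC naming). The output file name is arbitrary.
kubectl get pvc -o custom-columns=PVC:.metadata.name,PV:.spec.volumeName \
  > pvc-to-pv-map.txt
```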

We use an NFS provisioner for user storage on our cluster, and we run a daily backup cron job to back up user data and the PVCs. If it helps, you can have a look at https://github.com/gesiscss/orc/blob/master/storage/backup/docker/backup.py We have some internal documentation for this and I am planning to move it to a public-facing site, but it will probably be deployment-specific.

MridulS · Feb 19 '21 17:02

A minor footnote to this section would be how to resize a user's PVC. Something along the lines of:

kubectl get pvc | grep claim-
kubectl edit pvc/claim-user1

Edit the value under spec.resources.requests.storage. The user's server needs to be stopped and restarted for the change to take effect.
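For scripted resizes, the interactive edit can be replaced with a patch; this is a sketch with a placeholder claim name and size, and it only works if the PVC's StorageClass has volume expansion enabled (`allowVolumeExpansion: true`).

```shell
# Non-interactive equivalent of the kubectl edit above.
# claim-user1 and 20Gi are placeholders; expansion can only grow, not shrink.
kubectl patch pvc claim-user1 \
  -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'
```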

mnp · May 11 '21 21:05