tljh-repo2docker
Persistent user storage across containers and container re-creation
Right now, a user's /home/jovyan is stored inside their respective container. That means that if a user wants to update their base image, or use an entirely different one, they're going to lose all their data.
The most obvious solution to me is: for every user, create a "storage-$USERNAME" container from a configured, very basic, nearly empty image that has /home/jovyan as a volume. Then create each user's actual containers with "--volumes-from storage-$USERNAME".
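To make the proposal concrete, here is a minimal sketch of the two Docker CLI invocations it implies, built as argument lists in Python. The `busybox` storage image, the `jupyter-{username}` container name, and both helper functions are assumptions for illustration; nothing here is part of tljh-repo2docker.

```python
# Sketch of the "storage container" idea: one data-only container per user
# that owns /home/jovyan, plus the user's real container borrowing its volumes.
# STORAGE_IMAGE and the helper names are hypothetical, chosen for illustration.

STORAGE_IMAGE = "busybox"  # assumption: any small, nearly empty image would do
HOME_DIR = "/home/jovyan"


def storage_container_cmd(username, image=STORAGE_IMAGE):
    """Build the command that creates the per-user storage container.

    `-v /home/jovyan` with no source declares an anonymous volume at that path.
    """
    return ["docker", "create", "--name", f"storage-{username}", "-v", HOME_DIR, image]


def user_container_cmd(username, image):
    """Build the command that starts the user's container with the stored home."""
    return [
        "docker", "run", "-d",
        "--name", f"jupyter-{username}",  # hypothetical naming scheme
        "--volumes-from", f"storage-{username}",
        image,
    ]
```

The lists can be passed to `subprocess.run` on a host with Docker installed. Because the home directory belongs to the storage container, the user container can be removed and recreated from a different image without touching the data.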
Thanks @TimoRoth.
Yes, the tljh-repo2docker plugin doesn't give any storage to the users, and sessions are ephemeral.
Other plugins can enable storage, which is for example the case of the plasma plugin:
https://github.com/plasmabio/plasma/blob/7883a6a1266b69ab49f353bdf9974be408bf0709/tljh-plasma/tljh_plasma/__init__.py#L70-L76
This could also be done by adding a jupyterhub_config.py snippet to TLJH.
Do you think there would be value in having a default storage as part of the tljh-repo2docker plugin?
I have come up with this in my custom config to achieve pretty much what I wanted by now:
c.DockerSpawner.mounts.append({
    'source': 'jupyter-storage-{username}',
    'target': '/home/jovyan',
    'type': 'volume',
    'driver_config': {
        'Options': {
            'size': '16G'
        }
    }
})
c.DockerSpawner.extra_host_config.update({
    'storage_opt': {
        'size': '16G'
    }
})
The mounts option depends on a PR to dockerspawner: https://github.com/jupyterhub/dockerspawner/pull/373
The volume size option needs a PR to docker: https://github.com/moby/moby/pull/41330
Thanks @TimoRoth for sharing your solution :+1:
Do you think we should keep https://github.com/plasmabio/tljh-repo2docker/pull/36 open?
Since this seems like a case a lot of people would want, it might be worth it to document that option somewhere, so someone else does not need to go on a long search for it like I did.
For a simpler setup without a size quota, the existing volume code is enough: all one needs to do is use a named volume with {username} in its name and mount it at /home or /home/jovyan.
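As a rough sketch of that simpler setup, the snippet below sets DockerSpawner's `volumes` option to a named volume containing `{username}`. The helper `add_persistent_home` is hypothetical and only exists to keep the example testable; in a real jupyterhub_config.py snippet, `c` is the config object JupyterHub provides, and the single assignment inside the function is all that is needed.

```python
# Sketch: per-user named volume mounted at the home directory, no size quota.
# The volume name reuses 'jupyter-storage-{username}' from the config above;
# DockerSpawner expands {username} when it spawns each user's container.

DEFAULT_VOLUME = "jupyter-storage-{username}"
HOME_DIR = "/home/jovyan"


def add_persistent_home(c, volume=DEFAULT_VOLUME, target=HOME_DIR):
    """Mount a per-user named volume at `target` (hypothetical helper).

    Note: this assignment replaces any volumes configured earlier.
    """
    c.DockerSpawner.volumes = {volume: target}
    return c
```

Inside a config snippet this would be called as `add_persistent_home(c)`. Docker creates the named volume the first time it is used, so a user's files should survive the container being removed or rebuilt from another image.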
But generally it's possible already, so the issue can probably be closed.
Edit: Just noticed that's my other PR, not this issue. No, that PR is entirely unneeded. I found out after that PR that you can use templates in volumes and mounts, making everything the PR does unnecessary.
Thanks @TimoRoth.
Would you like to open a PR to add that to the README?
I used tljh-repo2docker for a course (mainly on git) and first I'd like to say that this plugin for TLJH is terrific! It makes it so easy to set up TLJH with multiple environments and in particular it's the simplest way I found to provide access to RStudio for multiple people. So thanks for your work!
Now to my point on this issue: my course is in two parts over two weeks, and I was surprised to see that the changes made last week were ephemeral, hence my stumbling across this issue. In this case it's not a problem and it's OK to start from scratch, but it might be an issue for others, and I think it should be stated clearly in the README. I'd also appreciate some explanation in the README of how to provide permanent storage. I don't think it makes sense to make it the default, because it can also be useful to start with a clean slate every time, but having directions would help! Unfortunately I don't have the required competence to add such information, but maybe by reviving this issue, one of you will provide an update. I could just try some of the info provided above, but it would be nice to understand a bit more. Thanks!
Where is jupyter-storage-{username} stored on the machine?
It's a docker volume. So normally, docker will store it somewhere in its data dir, which is normally at /var/lib/docker. But you're not supposed to access it directly like that, and doing so easily breaks things.
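If you only want to see where Docker keeps a given volume, `docker volume inspect <name>` prints a JSON array whose entries include a `Mountpoint` field. The sketch below parses that output; the function name and the sample JSON (including the `alice` username) are illustrative assumptions, not captured output.

```python
import json


def volume_mountpoint(inspect_output):
    """Return the Mountpoint from the JSON printed by `docker volume inspect`."""
    volumes = json.loads(inspect_output)
    if not volumes:
        raise ValueError("no volume found in inspect output")
    return volumes[0]["Mountpoint"]


# Illustrative sample of what the command might print for one user's volume.
SAMPLE_OUTPUT = """
[
    {
        "Name": "jupyter-storage-alice",
        "Driver": "local",
        "Mountpoint": "/var/lib/docker/volumes/jupyter-storage-alice/_data"
    }
]
"""

mountpoint = volume_mountpoint(SAMPLE_OUTPUT)
```

To read or back up the files, it is generally safer to mount the volume into a temporary container than to touch that directory on the host.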