repo2docker icon indicating copy to clipboard operation
repo2docker copied to clipboard

Add support for postBuildAdmin

Open g-braeunlich opened this issue 4 years ago • 24 comments

Proposed change

In addition to postBuild script (userspace) also process postBuildAdmin scripts (system space).

Alternative options / Who would use this feature?

See discussions in #487 and https://gitter.im/jupyterhub/binder?at=5fce35b6fb7f155587ad481a

g-braeunlich avatar Jan 28 '21 14:01 g-braeunlich

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other other community members to contribute more effectively. welcome You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

welcome[bot] avatar Jan 28 '21 14:01 welcome[bot]

My initial reaction is that we shouldn't support this because it is not a frequently used feature and it adds another "magic" file.

repo2docker itself tries to support frequently used community practice workflows. And making it as effortless as possible for the user if they follow the community practices. This means we choose the trade-off in favour of the frequent and against the rare. For rare or expert use cases there is the escape hatch of providing your own Dockerfile.

The magic files we already have in repo2docker are one of the things we regret the most because they are totally custom. Which means they make things more complicated and are less well thought out. So we are trying to not add new ones unless there really, really is no other way.

This means: can we find more use cases where a user needs this feature and there is no alternative? For example the user in the linked gitter chat seemed happy to use a Dockerfile. Are there some specific cases in #487 that we could cover with this? Asking because in my mind "I want to change the base image" is often about "I don't want Ubuntu but X" which we can't solve with a postAdminBuild.

betatim avatar Jan 29 '21 09:01 betatim

This is the use-case that started this thread:

We want users (effectively jovian user), to be able to modify runtime environment; in particular, to run sudo apt-get update/install ... over Terminal in the Notebook Server session but also maybe install other scientific libraries which don't really support easily user installation.

https://github.com/jupyterhub/repo2docker/issues/487#issuecomment-698409277

The postBuildAdmin solution was suggested and briefly discussed w/ @manics in the #487 thread (cf. https://github.com/jupyterhub/repo2docker/issues/487#issuecomment-698473158 ).

In both this and ImageMagic cases it's not about change of the actual base image but about admin-only config.

trybik avatar Jan 29 '21 10:01 trybik

I think part of the issue is that repo2docker allows you to install operating system packages using apt, and since it's at the OS level that means it's often only configurable by root. As an aside, there's another future problem here- if the base image is upgraded from https://github.com/jupyterhub/repo2docker/blob/cb4621f2544efba496f6cf58f6684846d7c49643/repo2docker/buildpacks/base.py#L18 to the next LTS then since packages may have been dropped or undergone breaking changes this could result in binder repositories failing to build or run.

Compare this with most (maybe all?) of the other buildpacks which can be run as a non-admin.

manics avatar Jan 29 '21 15:01 manics

My view on using sudo hasn't changed from what Yuvi wrote in https://github.com/jupyterhub/repo2docker/issues/192#issuecomment-390367986

As manics points out: the more we allow people to interact with the base image/distro the harder it becomes to change it. At some point bionic won't be supported any more and we will have to change. All this makes me think the answer to "I want to use distro level packages" and the resulting "I need sudo to edit system level packages" should be to reduce the need/desire to do this. For example by encouraging the use of package managers that install things in user editable directories (conda, nix, homebrew for linux?). This means I'd be excited to explore what those options would look like, how much they are used, etc.

Right now my feeling is that needing sudo is rare and that most of those wanting to use sudo shouldn't be wanting to use it. For the rest it seems like they are doing advanced/custom/one off stuff and it is Ok to point them to use a Dockerfile. I use "seems like", "appears like" and "my feeling" because I don't have any data to point to and I don't know how we could get some either. My feeling comes from thinking about support requests/PRs I've seen.

betatim avatar Jan 31 '21 07:01 betatim

Right, in principle apt.txt falls into the same bucket as the proposed here postBuildAdmin. I guess the main incentive to use either of these, at least in these "advanced/custom/one off" solutions, would be to stay up to date with r2d-generated Dockerfile. In that sense, the potentially breaking changes, like LTS update, would be a desired part of the general toolchain migration task (incl. discovery of repos that need update as well).

Considering alternatives for being able to change live runtime env by user (except maybe for some hard-core sci packages, that anyway require sudo installations) the most obvious one, other than a custom Dockerfile, would be to always use conda, right? Even to the extent that conda is actually always seamlessly used in the r2d-generated Dockerfile. The downsides I can potentially see is that conda introduces noticeable overhead on both image size and build time (*), but I bet you've discussed this before. Could you weigh-in on that?

(*) possibly less of an issue with mamba

trybik avatar Feb 01 '21 08:02 trybik

One more remark: to solve an issue of mass breaking build of repos in central solution like mybinder.org on r2d update (with potentially build breaking changes), an option would be to add spec of r2d version to use for repo build. (This could also include allowing to use e.g. r2d forks, though it seems somehow a dangerous idea.)

trybik avatar Feb 01 '21 08:02 trybik

From a technical perspective, if we do support this feature, I think it should be through making sudo work in post-build, not adding a new file (as discussed in #192). That changes the discussion of the issue somewhat because it eliminates the 'new r2d-specific file' topic, and more to a capabilities discussion: should arbitrary root commands be allowed.

I'm +/- 0 at this point: I don't think there's a high cost to allowing it. Since we do allow custom Dockerfiles, no security argument can be made about it.

@trybik allowing request of a repo2docker version is absolutely a goal, see #490. Specifying a fork less so, and at least wouldn't be supported on mybinder.org for the reasons you mentioned.

minrk avatar Feb 03 '21 08:02 minrk

Could you weigh-in on that?

Speed and size costs of using conda definitely vary a lot. Mamba has no effect on size, but does significantly improve dependency resolution time. imagemagick is a bit of a worst case scenario because:

  • it has lots of dependencies
  • most of those dependencies are not in the base environment, so need to be downloaded
  • it is actually already on the system (I'm not sure why, this surprised me), so adding it to apt.txt costs ~nothing and ~all dependencies are redundant

As a result, on my laptop adding:

dependencies:
- imagemagick

took an extra 90 seconds to build, and increased image size from 2.1GB to 3.2GB, an increase of 50% for one package.

minrk avatar Feb 03 '21 09:02 minrk

@minrk that's great input, thanks

So for practical (mostly image size) reasons it seems like supporting apt.txt makes perfect sense (I'm all for it). Then, it would also make sense to allow sudo to be able to at least configure whatever is provided via apt-get; or, in more general whatever other system settings need to be adjusted (like /etc/sudoers as in our case). To both ends /etc write access should suffice, but I guess technically it's equivalent to allowing arbitrary root commands.

making sudo work in post-build

So e.g. add $NBUSER to /etc/sudoers, run postBuild, rm $NBUSER from /etc/sudoers? If the last one would be skip, then it would sort out our issue: user being able to modify system during runtime.

trybik avatar Feb 03 '21 11:02 trybik

Again, one more follow-up remark:

should arbitrary root commands be allowed

If to be allowed, IMVHO postBuildAdmin is the approach to take rather than making sudo work in postBuild.

It creates a (micro-scale) separation of concerns. Here, separating all potentially image-breaking specifications (image-breaking because of either not being future-proof wrt r2d changes, or by being just bad usage practices, where a clean non-sudo solution should be employed). It's easier to put a big warning sign on postBuildAdmin saying, as @manics nicely put it in the other thread:

post-build-admin has the same support level as Dockerfile, i.e. not much.

Note: such warning is currently missing from the PR, but there seem no to be an equivalent one on Dockerfile too (at least not in config_files.rst).

trybik avatar Feb 05 '21 02:02 trybik

separating all potentially image-breaking specifications

Given how much we install in the user-space, I don't think that's true. It's really easy to create a broken image with postBuild already. So it's more a question of which parts you can break, and there is a lot more that's relevant and required by r2d and breakable by the user than there is that requires root access.

post-build-admin has the same support level as Dockerfile, i.e. not much.

I'd say postBuild is already at this level of support. We've yet to make any real guarantees about what's available or going to be available in this scope, other than "it's run at the end of whatever else r2d does". It's very possible to break large chunks of functionality with conda install commands (or even pip).

minrk avatar Feb 08 '21 14:02 minrk

Based on the above discussion I'm in favour of a postBuildAdmin script:

  • It seems like we'll need OS level setup such as as apt.txt for the forseeable future, so it seems reasonable to allow a script to be run as root for configuring a system
  • A separate script is easier than supporting sudo inside postBuild. Using
     USER root
     RUN postBuildAdmin
     USER ${NB_USER}
    
    is cleaner than enabling then disabling sudo, especially since sudo isn't installed by default

manics avatar Feb 08 '21 16:02 manics

I still think that if this needs a new special repo2docker-only file, it's not a feature we should add.

is cleaner than enabling then disabling sudo

I also think we should not do that. If we allow sudo, it should be allowed, both in build and run. If we disallow it, it should be disallowed in both. That is:

  1. if sudo is allowed, add $NB_USER to sudoers
  2. post-build

that's all.

minrk avatar Feb 11 '21 19:02 minrk

It's really easy to create a broken image with postBuild already.

What I meant is "all potentially image-breaking specifications" on inevitable update or change of the base image. That's very likely also not true statement, although I guess the "almost all" affected specs would be in apt.txt or postBuildAdmin.

I still think that if this needs a new special repo2docker-only file, it's not a feature we should add. [...] If we allow sudo, it should be allowed, both in build and run.

It seems more safe to go about it in steps. If the postBuildAdmin does not work out in practice as a useful generalisation of apt.txt spec, you stay on the safe side and could deprecate it. Otherwise, you could deprecate apt.txt and take a step further by allowing sudo in postBuild (and runtime) and eventually deprecating also postBuildAdmin.

trybik avatar Feb 16 '21 14:02 trybik

Hey guys. Quick reminder to this PR. @betatim It is still controversial, I guess?

g-braeunlich avatar Mar 12 '21 08:03 g-braeunlich

@g-braeunlich Unfortunately most of us are lacking in time (as always....). Since there are differing views amongst the maintainers we'll need to come to a consensus. This should eventually happen, but if you're interested in pushing this forward we have monthly JupyterHub meetings in alternating timezones that are open to everyone.

The next one is on Thursday 18 March 17:00 UTC: https://github.com/jupyterhub/team-compass/issues/378 If you want to join please add your name and a point under the agenda!

manics avatar Mar 15 '21 17:03 manics

Hey @manics Thank you very much for the invitation! Unfortunately I wont be able to join. But maybe this will get sorted out somehow. If not this time, maybe the next time.

g-braeunlich avatar Mar 16 '21 06:03 g-braeunlich

Use at your own risk. I cherry picked @g-braeunlich PR into current master: jabberaa/repo2docker-admin:latest

jabbera avatar Mar 23 '21 23:03 jabbera

If anyone wants to talk about this at the next meeting it's on Thursday 15 April 08:00 UTC: https://github.com/jupyterhub/team-compass/issues/388

manics avatar Mar 24 '21 17:03 manics

5 am my time. Unless my kids are a total nightmare I'll just say that something like this is required and leave it at that.

jabbera avatar Mar 24 '21 22:03 jabbera

I'm likely going to fork this and allow for base image selection as well. The images that are being minted by repo2docker for our use case are enormous with no common layers. Having the ability to use our "stock" (docker-stack + special sauce) base images will greatly decrease pull/startup time as each user gets a dedicated node that comes with our base images.

jabbera avatar Mar 27 '21 01:03 jabbera

I'd be very interested in this feature. For instance nvidia drivers and for linking folders outside of the home dir.

ctr26 avatar Aug 12 '21 15:08 ctr26

To add any new repo2docker configuration file adds work for sure, but I think supporting postBuildAsRoot or similar would be easy to implement, document, test, and maintain over time, and be a powerful and useful escape hatch.

I've hesitated on using repo2docker because the current limitation. Both because of concern that I would end up needing the root escape hatch, and because I knew I had to compromise not installing something such as gcloud.

With this in mind, I'm in clear support of adding a new file like postBuild but that is run as root. If there is support for this, I'm volunteering to implement it with docs and tests etc.

With regards to the discussion of either adding a new file or installing sudo and granting sudo rights, I currently think of adding a new file as a more robust and less invasive option. A key reasons for me appreciating having a dedicated file is that inspecting a folder that repo2docker will build allows you to quickly see if something is to be done as root with a dedicated file.

If there is agreement on adding support for such file, I suggest postBuildAsRoot as a file name as I think that describes what it is quite well and also helps convey that postBuild may be as non-root normal user (jovyan). Thinking of how I would write the docs for this file, I'd want to write something like "This is like the postBuild file, but executed as the root user after postBuild has executed."

consideRatio avatar Jun 16 '23 17:06 consideRatio