repo2docker icon indicating copy to clipboard operation
repo2docker copied to clipboard

Make the base image configurable at build time

Open yuvipanda opened this issue 5 years ago • 18 comments
trafficstars

Requires #1161, as that removes all assumptions of what Ubuntu version we are using from the R buildpack.

This PR makes the base image configurable via a traitlet that can be passed in to the commandline when calling repo2docker. It keeps the current default (bionic), but open to making the default change in the future.

This is the first time we're allowing traitlets (other than .appendix) to directly influence the contents of the built image, definitely affecting reproducibility. I am not sure if there is a way around that, unfortunately.

yuvipanda avatar Jun 09 '20 05:06 yuvipanda

Can we use this PR to make not only the tag but the full base image name configurable via traitlet?

This has come up as an idea in #487 (I think) or a related issue about making the base image configurable. I think allowing the person running repo2docker to change this would be a nice feature in general and would allow us to run tests with the old and the new version.

The other bit of homework to do is finding the ticket/docs where we wrote how long we will use this base image/when we will update. So that we can be consistent with what we promised.

betatim avatar Jun 09 '20 10:06 betatim

I just rebased this. We hard code a bunch of packages - particularly in the R ecosystem - to bionic. I think making the base image configurable would mean we'll have to restrict it to one of two choices (bionic, focal) so the R buildpack knows which versions of things to use. But that seems like a lot of added complexity...

yuvipanda avatar Nov 29 '21 19:11 yuvipanda

Can we install R and rstudio from conda instead of depending on the underlying OS?

manics avatar Nov 29 '21 21:11 manics

@manics unfortunately we can't, as we'll then stop being able to install binary packages from packagemanager.rstudio.com

yuvipanda avatar Jun 03 '22 17:06 yuvipanda

Although, perhaps we can get it from https://docs.rstudio.com/resources/install-r/

yuvipanda avatar Jun 03 '22 17:06 yuvipanda

I'm trying out getting R from RStudio tho (https://github.com/jupyterhub/repo2docker/pull/1161) maybe that'll help remove anything distro specific

yuvipanda avatar Jun 03 '22 17:06 yuvipanda

Now based on #1161, and I'll try make the base image configurable as well once tests pass.

yuvipanda avatar Jun 03 '22 19:06 yuvipanda

tests pass, yay!

yuvipanda avatar Jun 22 '22 20:06 yuvipanda

When do you plan to merge this PR? Thank you @yuvipanda for this as we need to update base image because of public docker registry pull limits

mahluk avatar Jul 22 '22 13:07 mahluk

@mahluk slow going :) Hopefully the tests are fixed up now, and we can figure out how to run these tests.

yuvipanda avatar Jul 25 '22 19:07 yuvipanda

@minrk @betatim @manics @consideRatio would love some review on this one now - I think it's ready to go. This doesn't actually change any behavior, just makes that configurable. We can eventually parameterize the base image tests so they run on multiple ones, but I don't think that's needed in this PR

yuvipanda avatar Jul 25 '22 19:07 yuvipanda

@manics ok I think I've addressed all your suggestions!

yuvipanda avatar Jul 26 '22 18:07 yuvipanda

@manics I've added the base image requirements as help text to the traitlet. Do you think that's enough?

yuvipanda avatar Jul 26 '22 18:07 yuvipanda

Thanks, that covers the technical aspects. Is there anything we need to say about reproducibility? Does using a custom base image mean you're no longer compliant with https://repo2docker.readthedocs.io/en/latest/specification.html (e.g. different perhaps breaking versions of packages), and do we need to state this somewhere?

manics avatar Jul 27 '22 08:07 manics

@manics good question re: REES. I unfortunately don't have a clear answer there at all, as it's the first thing that's determined by a parameter you pass in, not something in the repo.

So REES is currently in reality a tuple of (r2d_version, repo_url, repo_ref) for reproducibility. In this case, it becomes (r2d_version, base_image, repo_url, repo_ref). Maybe that's a bit messy.

So I guess we can say something like "It is REES compliant only with these specific set of base images, and if you pass something else no guarantees"?

I don't like any of the options here.

yuvipanda avatar Aug 26 '22 07:08 yuvipanda

@jupyterhub/binder-team how do you think we should document/manage this change with regards to REES?

manics avatar Aug 26 '22 10:08 manics

I think there's a major question about who is in control of the base image. I think there's a decent reason for it to not come from the repo, but that also greatly limits the utility since requiring a specific base image makes any repo that does so incompatible with all Binder services (except for possibly one bespoke binderhub instance for that base image).

But repo2docker isn't just for binder - it can be used e.g. as a handy user-image generation tool for JupyterHub, etc. where such issues are less relevant, and if it works, it works. To me, putting the base image in the CLI is making it a feature not for the portable reproducibility case, but rather for the personal image automation case. The appendix is in a similar role - it was there for deployment-controlled image contents, not reproducibility-related.

I think selecting the base image is also going to have to be highly experimental, and only the default image really 'supported', at least for some time. I wouldn't want to make any guarantees that anything other than the default ubuntu or possibly a (not too much) more recent ubuntu would work, though plenty might.

If we want selecting base images to be part of a reproducibility strategy, I think we'd have to:

  1. get the base image from the repo, not the CLI
  2. be much clearer/stricter in our buildpacks in terms of assumptions on the filesystem, whether apt is even available, etc.

minrk avatar Aug 26 '22 10:08 minrk

TL;DR: what Min said.

I think marking "different base image" as a tool/option to use in similar vain as the appendix is a smart idea and probably covers most of the use cases? Though I am not sure I know what the use cases are :-/

I'd vote against getting the base image from the repo, and hence it can't be part of REES. The reason not to get it from the repo are that it would require us to invent a new file that is in the repo that stores the name of the base image and because using a custom base image (can) make it very difficult for a human to understand how the environment was made. At least for me the fact that a human can mostly read the docker build instructions that repo2docker generates and understand how to do the same in a different place/tool is useful. It kinda counteracts the super bespoke environments that no one can ever edit again 6months later because no one can comprehend all the dependencies between the different steps (see the "regex are write-only" meme).

I think if we want to have a deep discussion about REES or not and "is allowing this at all going to lead to repos that can only be built on a particular binderhub" etc we should list the top few use-cases and what users are trying to achieve as aprt of them. Otherwise we end up discussing "in a vacuum". I'm not sure if we need to have such a deep discussion though.

betatim avatar Aug 26 '22 12:08 betatim

Related

  • https://github.com/jupyterhub/repo2docker/issues/1206

Test failure

    @pytest.mark.parametrize("python_version", ["2.6", "3.0", "4.10", "3.99"])
    def test_unsupported_python(tmpdir, python_version):
        tmpdir.chdir()
>       bp = PythonBuildPack()
E       TypeError: __init__() missing 1 required positional argument: 'base_image'

consideRatio avatar Oct 31 '22 00:10 consideRatio

Hi, Please prioritize this PR or a similar one for giving an option to change the base image. After the Docker's rate limiting in place, we are having problems with our BinderHub instances, basically because the FROM buildpack-deps:bionic step started to fail. Thank you.

emrahkaya avatar Nov 24 '22 08:11 emrahkaya

Rebased again.

yuvipanda avatar Dec 10 '22 21:12 yuvipanda

I suggest that somebody summarize the major unanswered questions we need to answer for this PR to be merge able. We should then focus discussion on those things to make a decision. And if we can't come up with anything, we should just merge it unless somebody wants to do a quick implementation improvement pass or something.

choldgraf avatar Dec 11 '22 09:12 choldgraf

I've fixed almost all failing tests in https://github.com/yuvipanda/repo2docker/pull/6. Sadly there is still a problem with tests/r/r3.6-mran

kardasbart avatar Dec 11 '22 23:12 kardasbart

FYI I pushed a quick commit to add a page that documents this functionality in the sphinx site

choldgraf avatar Dec 13 '22 10:12 choldgraf

@choldgraf The main question is how this fits in with reproducibility, REES, and how we support or don't support it. See the discussion starting from https://github.com/jupyterhub/repo2docker/pull/909#issuecomment-1196392236

Since you've started a doc maybe you could also add this too, so if anyone asks about this feature we can point them to that doc for a definite answer?

manics avatar Dec 13 '22 10:12 manics

@manics what do you think about the latest commit?

choldgraf avatar Dec 13 '22 10:12 choldgraf

We investigated using this PR now that Ubuntu 18.04 is past its maintenance period. (although still within the extended security maintenance period). It successfully built the image for a jupyterhub deployment when I specified buildpack-deps:jammy.

~~I did not investigate how the R environment was affected by repo2docker's hardcoding of https://packagemanager.rstudio.com/all/linux/bionic/. We wouldn't use it in production for images that rely on R when the R package manager is not matched to the distro.~~ (Yuvi fixed this in a separate PR.)

Since repo2docker is locked into Ubuntu Bionic right now for non-Dockerfile builds, would it be more palatable to permit people to change just the Ubuntu version, and let that update the RStudio package manager URL?

Configuration via runtime.txt would make this more explicit, e.g. ubuntu-<codename> to mimic python-x.y or r-<RVERSION>.... "runtime" makes me think it is not "build time", although this file does determine how the image is built. Using runtime.txt would mean that you wouldn't have to invent a new file.

ryanlovett avatar May 18 '23 18:05 ryanlovett

I think outstanding issues are addressed via the docs. To me, making it configurable from the CLI lets repo2docker be configured and there are no changes for the reproducibility of repos via services like binder, where the base image doesn't enter into it.

I think we do also need to bump the default distro used in a separate PR, because 18.04 is super old now.

minrk avatar May 22 '23 12:05 minrk

Yay, would love to get this merged before it hits the 3 year anniversary of June 9 :D

yuvipanda avatar May 22 '23 12:05 yuvipanda

@minrk aaah, the R test might still be failing because MRAN doesn't work on newer ubuntu?

yuvipanda avatar May 22 '23 12:05 yuvipanda