repo2docker
repo2docker copied to clipboard
Make the base image configurable at build time
Requires #1161, as that removes all assumptions of what Ubuntu version we are using from the R buildpack.
This PR makes the base image configurable via a traitlet that can be passed in to the commandline when calling repo2docker. It keeps the current default (bionic), but open to making the default change in the future.
This is the first time we're allowing traitlets (other than .appendix) to directly influence the contents of the built image, definitely affecting reproducibility. I am not sure if there is a way around that, unfortunately.
Can we use this PR to make not only the tag but the full base image name configurable via traitlet?
This has come up as an idea in #487 (I think) or a related issue about making the base image configurable. I think allowing the person running repo2docker to change this would be a nice feature in general and would allow us to run tests with the old and the new version.
The other bit of homework to do is finding the ticket/docs where we wrote how long we will use this base image/when we will update. So that we can be consistent with what we promised.
I just rebased this. We hard code a bunch of packages - particularly in the R ecosystem - to bionic. I think making the base image configurable would mean we'll have to restrict it to one of two choices (bionic, focal) so the R buildpack knows which versions of things to use. But that seems like a lot of added complexity...
Can we install R and rstudio from conda instead of depending on the underlying OS?
@manics unfortunately we can't, as we'll then stop being able to install binary packages from packagemanager.rstudio.com
Although, perhaps we can get it from https://docs.rstudio.com/resources/install-r/
I'm trying out getting R from RStudio tho (https://github.com/jupyterhub/repo2docker/pull/1161) maybe that'll help remove anything distro specific
Now based on #1161, and I'll try make the base image configurable as well once tests pass.
tests pass, yay!
When do you plan to merge this PR? Thank you @yuvipanda for this as we need to update base image because of public docker registry pull limits
@mahluk slow going :) Hopefully the tests are fixed up now, and we can figure out how to run these tests.
@minrk @betatim @manics @consideRatio would love some review on this one now - I think it's ready to go. This doesn't actually change any behavior, just makes that configurable. We can eventually parameterize the base image tests so they run on multiple ones, but I don't think that's needed in this PR
@manics ok I think I've addressed all your suggestions!
@manics I've added the base image requirements as help text to the traitlet. Do you think that's enough?
Thanks, that covers the technical aspects. Is there anything we need to say about reproducibility? Does using a custom base image mean you're no longer compliant with https://repo2docker.readthedocs.io/en/latest/specification.html (e.g. different perhaps breaking versions of packages), and do we need to state this somewhere?
@manics good question re: REES. I unfortunately don't have a clear answer there at all, as it's the first thing that's determined by a parameter you pass in, not something in the repo.
So REES is currently in reality a tuple of (r2d_version, repo_url, repo_ref) for reproducibility. In this case, it becomes (r2d_version, base_image, repo_url, repo_ref). Maybe that's a bit messy.
So I guess we can say something like "It is REES compliant only with these specific set of base images, and if you pass something else no guarantees"?
I don't like any of the options here.
@jupyterhub/binder-team how do you think we should document/manage this change with regards to REES?
I think there's a major question about who is in control of the base image. I think there's a decent reason for it to not come from the repo, but that also greatly limits the utility since requiring a specific base image makes any repo that does so incompatible with all Binder services (except for possibly one bespoke binderhub instance for that base image).
But repo2docker isn't just for binder - it can be used e.g. as a handy user-image generation tool for JupyterHub, etc. where such issues are less relevant, and if it works, it works. To me, putting the base image in the CLI is making it a feature not for the portable reproducibility case, but rather for the personal image automation case. The appendix is in a similar role - it was there for deployment-controlled image contents, not reproducibility-related.
I think selecting the base image is also going to have to be highly experimental, and only the default image really 'supported', at least for some time. I wouldn't want to make any guarantees that anything other than the default ubuntu or possibly a (not too much) more recent ubuntu would work, though plenty might.
If we want selecting base images to be part of a reproducibility strategy, I think we'd have to:
- get the base image from the repo, not the CLI
- be much clearer/stricter in our buildpacks in terms of assumptions on the filesystem, whether apt is even available, etc.
TL;DR: what Min said.
I think marking "different base image" as a tool/option to use in similar vain as the appendix is a smart idea and probably covers most of the use cases? Though I am not sure I know what the use cases are :-/
I'd vote against getting the base image from the repo, and hence it can't be part of REES. The reason not to get it from the repo are that it would require us to invent a new file that is in the repo that stores the name of the base image and because using a custom base image (can) make it very difficult for a human to understand how the environment was made. At least for me the fact that a human can mostly read the docker build instructions that repo2docker generates and understand how to do the same in a different place/tool is useful. It kinda counteracts the super bespoke environments that no one can ever edit again 6months later because no one can comprehend all the dependencies between the different steps (see the "regex are write-only" meme).
I think if we want to have a deep discussion about REES or not and "is allowing this at all going to lead to repos that can only be built on a particular binderhub" etc we should list the top few use-cases and what users are trying to achieve as aprt of them. Otherwise we end up discussing "in a vacuum". I'm not sure if we need to have such a deep discussion though.
Related
- https://github.com/jupyterhub/repo2docker/issues/1206
Test failure
@pytest.mark.parametrize("python_version", ["2.6", "3.0", "4.10", "3.99"])
def test_unsupported_python(tmpdir, python_version):
tmpdir.chdir()
> bp = PythonBuildPack()
E TypeError: __init__() missing 1 required positional argument: 'base_image'
Hi,
Please prioritize this PR or a similar one for giving an option to change the base image. After the Docker's rate limiting in place, we are having problems with our BinderHub instances, basically because the FROM buildpack-deps:bionic step started to fail.
Thank you.
Rebased again.
I suggest that somebody summarize the major unanswered questions we need to answer for this PR to be merge able. We should then focus discussion on those things to make a decision. And if we can't come up with anything, we should just merge it unless somebody wants to do a quick implementation improvement pass or something.
I've fixed almost all failing tests in https://github.com/yuvipanda/repo2docker/pull/6. Sadly there is still a problem with tests/r/r3.6-mran
FYI I pushed a quick commit to add a page that documents this functionality in the sphinx site
@choldgraf The main question is how this fits in with reproducibility, REES, and how we support or don't support it. See the discussion starting from https://github.com/jupyterhub/repo2docker/pull/909#issuecomment-1196392236
Since you've started a doc maybe you could also add this too, so if anyone asks about this feature we can point them to that doc for a definite answer?
@manics what do you think about the latest commit?
We investigated using this PR now that Ubuntu 18.04 is past its maintenance period. (although still within the extended security maintenance period). It successfully built the image for a jupyterhub deployment when I specified buildpack-deps:jammy.
~~I did not investigate how the R environment was affected by repo2docker's hardcoding of https://packagemanager.rstudio.com/all/linux/bionic/. We wouldn't use it in production for images that rely on R when the R package manager is not matched to the distro.~~ (Yuvi fixed this in a separate PR.)
Since repo2docker is locked into Ubuntu Bionic right now for non-Dockerfile builds, would it be more palatable to permit people to change just the Ubuntu version, and let that update the RStudio package manager URL?
Configuration via runtime.txt would make this more explicit, e.g. ubuntu-<codename> to mimic python-x.y or r-<RVERSION>.... "runtime" makes me think it is not "build time", although this file does determine how the image is built. Using runtime.txt would mean that you wouldn't have to invent a new file.
I think outstanding issues are addressed via the docs. To me, making it configurable from the CLI lets repo2docker be configured and there are no changes for the reproducibility of repos via services like binder, where the base image doesn't enter into it.
I think we do also need to bump the default distro used in a separate PR, because 18.04 is super old now.
Yay, would love to get this merged before it hits the 3 year anniversary of June 9 :D
@minrk aaah, the R test might still be failing because MRAN doesn't work on newer ubuntu?