repo2docker
[WIP] Pull the image given in CACHE_FROM argument
Summary
Write a function that takes a list of Docker images to use as cache sources for the build phase. For each image name, check whether a tag was provided and default to 'latest' if not. Then pull each image from Docker Hub so that it is available locally as a cache during the build phase.
This was the behaviour I expected to see when using the CACHE_FROM flag, but it appears that this is not the case. Issue #130
I have written a function that consumes the list of images, checks for tags and then pulls them. I'm not sure where to put this within the repo2docker codebase though, so any guidance on that would be super appreciated ✨
It may also be useful to combine this with the `find_image` function so we don't pull what we already have: https://github.com/jupyter/repo2docker/blob/bbc3ee02c0755b15ea456f9ae18dd76b904568e7/repo2docker/app.py#L613-L625
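As a rough illustration of the behaviour described in the summary (not the code in this PR), the tag check and the pull loop could look like the sketch below. The names `split_image_tag` and `pull_cache_images` are hypothetical, and the actual pull is passed in as a callable so the logic can run without a Docker daemon. Digest references such as `name@sha256:...` are not handled here.

```python
import logging
from typing import Callable, Iterable, List, Tuple

log = logging.getLogger(__name__)


def split_image_tag(image: str, default_tag: str = "latest") -> Tuple[str, str]:
    """Split 'name[:tag]' into (name, tag), defaulting the tag (hypothetical helper)."""
    # A colon only marks a tag when it comes after the last slash.
    # Otherwise it belongs to a registry host:port, e.g. localhost:5000/img.
    last_slash = image.rfind("/")
    last_colon = image.rfind(":")
    if last_colon > last_slash:
        return image[:last_colon], image[last_colon + 1:]
    return image, default_tag


def pull_cache_images(
    images: Iterable[str],
    pull: Callable[[str, str], object],
) -> List[str]:
    """Pull every image in `images` and return the fully tagged names (hypothetical helper)."""
    pulled = []
    for image in images:
        name, tag = split_image_tag(image)
        log.info("Pulling %s:%s to use as a build cache", name, tag)
        pull(name, tag)  # the caller decides how the pull actually happens
        pulled.append(f"{name}:{tag}")
    return pulled
```

With the Docker SDK for Python, the `pull` argument could be something like `lambda name, tag: client.images.pull(name, tag=tag)`, where `client = docker.from_env()`.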
Outstanding TODOs
- [ ] Move the code snippet to an appropriate place within the repo2docker codebase
- [ ] Swap print statements for the custom logger
- [ ] We may also want to do some handling of the pull output, like in `push_image`: https://github.com/jupyter/repo2docker/blob/bbc3ee02c0755b15ea456f9ae18dd76b904568e7/repo2docker/app.py#L458-L499
This looks nice!
I'm not super sure where we could put things. Like you say, it is related to `find_image` and `push_image`. Adding the new method to the app class is probably easiest. However, that file (and class) is huge already. Maybe it is time to move these things out to a "docker utilities" file. From a quick read, these three methods all create a docker client, take a (bunch of) strings as inputs and produce some log output.
If we can make them three separate functions that take their inputs as arguments and then do their thing, that would be nice I think. It would also make them easier to test because you don't need to set up the whole app first.
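To make the testing point concrete, here is a minimal sketch of what one such standalone function might look like. `image_exists_locally` is a hypothetical name, not the existing `find_image` method, and the sketch assumes a client shaped like the Docker SDK for Python, where `client.images.list()` returns objects with a `tags` list. Because the client is an argument, a small fake object is enough to exercise it.

```python
from types import SimpleNamespace


def image_exists_locally(client, image_spec: str) -> bool:
    """Return True if a local image carries the tag `image_spec` (hypothetical helper)."""
    # Default to ':latest' when the last path component has no tag.
    if ":" not in image_spec.rsplit("/", 1)[-1]:
        image_spec += ":latest"
    return any(image_spec in image.tags for image in client.images.list())


# A fake client standing in for docker.from_env(); no daemon or app setup needed.
fake_client = SimpleNamespace(
    images=SimpleNamespace(
        list=lambda: [
            SimpleNamespace(tags=["python:3.9-bullseye"]),
            SimpleNamespace(tags=[]),  # untagged (dangling) image
        ]
    )
)

found = image_exists_locally(fake_client, "python:3.9-bullseye")
missing = image_exists_locally(fake_client, "python")  # checked as python:latest
```

A function like this could be used to skip pulling cache images that are already present locally.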
@betatim would https://github.com/jupyter/repo2docker/pull/848 help since it moves the Docker API calls to a separate class?
I'll take a look at #848. I had lost track of that PR.
Just following up on this and/or #848 :)
> I have written a function that consumes the list of images, checks for tags and then pull them. I'm not sure where to put this within the repo2docker codebase though, so any guidance on that would be super appreciated ✨
I agree it's not so intuitive that `--cache-from` doesn't pull for you. I've run into this as well!
It is a passthrough option to the `docker build` command though (or to whatever `--engine` is used that could possibly accept it). I think that having repo2docker add functionality on top of a passthrough option would add more complexity than we can sustainably maintain in repo2docker at the moment.
I'll go for a close on this for now to help triage PRs in this project. Please don't see that as a final decision or similar!
Update - Did we all misunderstand `--cache-from`?
Oh actually, I think if `--cache-from` is specified, it means something entirely different than we all have been thinking. Check out https://docs.docker.com/engine/reference/commandline/build/#specifying-external-cache-sources.
I think it can be relevant if you have built an image like this, where you have a `--mount=type=cache`, and have built the image with `BUILDKIT_INLINE_CACHE=1`.
FROM python:3.9-bullseye
# install wheels built in the build-stage
COPY requirements.txt /tmp/requirements.txt
ARG PIP_CACHE_DIR=/tmp/pip-cache
RUN --mount=type=cache,target=${PIP_CACHE_DIR} \
    --mount=type=cache,from=build-stage,source=/tmp/wheels,target=/tmp/wheels \
    pip install \
        --find-links=/tmp/wheels/ \
        -r /tmp/requirements.txt