[WIP] Pull the image given in CACHE_FROM argument

Open sgibson91 opened this issue 4 years ago • 4 comments

Summary

Write a function that takes a list of Docker images to use as cache sources for the build phase. For each image name, check whether a tag is provided and default to 'latest' if not. Then pull each image from Docker Hub so that it is available locally as a cache during the build phase.

This was the behaviour I expected to see when using the CACHE_FROM flag, but it appears that this is not the case. Issue #130

I have written a function that consumes the list of images, checks for tags and then pulls them. I'm not sure where to put this within the repo2docker codebase though, so any guidance on that would be super appreciated ✨

It may also be useful to combine with the find_image function so we don't pull what we already have? https://github.com/jupyter/repo2docker/blob/bbc3ee02c0755b15ea456f9ae18dd76b904568e7/repo2docker/app.py#L613-L625
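For readers following along, here is a minimal sketch of what such a function might look like. It is not the code from this PR: `split_image_tag` and `pull_cache_images` are hypothetical names, and the sketch assumes the Docker SDK for Python (`docker.from_env()` and `client.images.pull`). It does not skip images that are already present locally.

```python
def split_image_tag(image, default_tag="latest"):
    """Split an image reference into (repository, tag).

    Hypothetical helper for illustration, not part of repo2docker.
    """
    # A digest reference (name@sha256:...) already pins an exact image,
    # so it is returned unchanged and no tag is added.
    if "@" in image:
        return image, None
    # Only a colon after the last slash separates a tag; an earlier colon
    # belongs to a registry port, as in localhost:5000/image.
    last_slash = image.rfind("/")
    last_colon = image.rfind(":")
    if last_colon > last_slash:
        return image[:last_colon], image[last_colon + 1:]
    return image, default_tag


def pull_cache_images(images, client=None):
    """Pull every image in `images` and return the (repository, tag) pairs."""
    if client is None:
        # Imported lazily so the parsing helper works without the Docker SDK.
        import docker

        client = docker.from_env()
    pulled = []
    for image in images:
        repository, tag = split_image_tag(image)
        client.images.pull(repository, tag=tag)
        pulled.append((repository, tag))
    return pulled
```

Passing the client in as an argument keeps the function independent of the app class, so it can be exercised with a stand-in object instead of a running Docker daemon.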

Outstanding TODOs

  • [ ] Move the code snippet to an appropriate place within the repo2docker codebase
  • [ ] Swap print statements for the custom logger
  • [ ] We may also want to do some handling of the pull output, like in push_image

https://github.com/jupyter/repo2docker/blob/bbc3ee02c0755b15ea456f9ae18dd76b904568e7/repo2docker/app.py#L458-L499
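The last two TODOs could be approached roughly as follows. This is only a sketch, not how push_image does it: `report_pull_progress` is a hypothetical name, the logger is a stand-in for the project's own, and the input is assumed to be the stream of decoded dictionaries that the Docker SDK's low-level `client.api.pull(..., stream=True, decode=True)` yields.

```python
import logging

# Stand-in logger; repo2docker would use its own custom logger instead.
logger = logging.getLogger("pull_sketch")


def report_pull_progress(progress_lines, logger=logger):
    """Log status changes from a pull stream and fail on the first error.

    Returns the last status message seen for each layer id.
    """
    last_status = {}
    for line in progress_lines:
        # The daemon reports failures as a dictionary with an "error" key.
        if "error" in line:
            raise RuntimeError(f"docker pull failed: {line['error']}")
        status = line.get("status")
        if status is None:
            continue
        layer = line.get("id", "")
        # Progress updates repeat the same status many times per layer,
        # so only log when the status actually changes.
        if last_status.get(layer) != status:
            logger.info("%s %s", layer, status)
            last_status[layer] = status
    return last_status
```

With a real client this would be called as `report_pull_progress(client.api.pull(repository, tag=tag, stream=True, decode=True))`.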

sgibson91 avatar Apr 28 '20 09:04 sgibson91

This looks nice!

Not super sure where we could put things. Like you say it is related to find_image and push_image. Adding the new method to the app class is probably easiest. However that file (and class) is huge already. Maybe it is time to move these things out to a "docker utilities" file. From a quick read these three methods all create a docker client, take a (bunch of) strings as inputs and produce some log output.

If we can make them three separate functions that take their inputs as arguments and then do their thing, that would be nice, I think. It would also make them easier to test because you don't need to set up the whole app first.

betatim avatar Apr 29 '20 05:04 betatim

@betatim would https://github.com/jupyter/repo2docker/pull/848 help since it moves the Docker API calls to a separate class?

manics avatar Apr 29 '20 06:04 manics

I'll take a look at #848, I had lost track of that PR

betatim avatar Apr 29 '20 14:04 betatim

Just following up on this and/or #848 :)

sgibson91 avatar Jun 15 '20 09:06 sgibson91

I have written a function that consumes the list of images, checks for tags and then pulls them. I'm not sure where to put this within the repo2docker codebase though, so any guidance on that would be super appreciated ✨

I agree it's not so intuitive that --cache-from doesn't pull for you, I've run into this as well!

It is a passthrough option to the docker build command, though (or to whatever --engine is in use that can accept it). I think that adding functionality on top of a passthrough option would add more complexity than we can sustainably maintain in repo2docker at the moment.

I'll go for a close on this for now to help triage PRs in this project. Please don't see that as a final decision or similar!

Update - Did we all misunderstand --cache-from?

Oh actually, I think if --cache-from is specified, it means something entirely different than we all have been thinking. Check out https://docs.docker.com/engine/reference/commandline/build/#specifying-external-cache-sources.

I think it can be relevant if you have built an image like this, where you have a --mount=type=cache, and have built the image with BUILDKIT_INLINE_CACHE=1.

FROM python:3.9-bullseye

# install wheels built in the build-stage
COPY requirements.txt /tmp/requirements.txt
ARG PIP_CACHE_DIR=/tmp/pip-cache
RUN --mount=type=cache,target=${PIP_CACHE_DIR} \
    --mount=type=cache,from=build-stage,source=/tmp/wheels,target=/tmp/wheels \
    pip install \
        --find-links=/tmp/wheels/ \
        -r /tmp/requirements.txt
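For context, the linked Docker documentation describes a three-step flow: build with BUILDKIT_INLINE_CACHE=1, push the image, then build elsewhere with --cache-from pointing at it. A rough sketch of those commands as argument lists follows. `inline_cache_build_commands` and the image name in the usage note are hypothetical, and the sketch assumes a BuildKit-enabled docker CLI; whether cache mounts like the ones above benefit from this is not something the sketch demonstrates.

```python
def inline_cache_build_commands(image, context="."):
    """Return argv lists for the inline-cache flow from the Docker docs.

    Hypothetical helper for illustration only.
    """
    # 1. Build with inline cache metadata embedded in the image.
    produce = [
        "docker", "build",
        "--build-arg", "BUILDKIT_INLINE_CACHE=1",
        "--tag", image,
        context,
    ]
    # 2. Push, so other machines can use the image as a cache source.
    push = ["docker", "push", image]
    # 3. Build again (possibly elsewhere) using the image as external cache.
    consume = [
        "docker", "build",
        "--cache-from", image,
        "--tag", image,
        context,
    ]
    return [produce, push, consume]
```

Each list could be passed to `subprocess.run(argv, check=True)`, for example with `inline_cache_build_commands("example/app:latest")`.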

consideRatio avatar Oct 30 '22 23:10 consideRatio