Mount cache directory inside build container
Trying to figure out some way of sharing cache between builds; I thought about mounting some directory like /cache inside the build container, so we can then have a shared cache there for things like pip, npm or cargo.
Would it be possible to implement something like that?
We do have a cache warmer feature for caching base layers. https://github.com/GoogleContainerTools/kaniko#caching
The intermediate layers are cached remotely. Are you looking for caching layers locally?
Would that help?
The layer cache is invalidated on any change, so I wanted to mount a shared directory and keep, for example, the Python pip cache there, so it doesn't have to fetch everything from the network when even one dependency has changed.
Something like -v for docker, to mount a directory/image (which is also not supported for docker build).
@urbaniak going on the example of pip caching (which I don't know a ton about) I'm assuming pip looks for certain directories to check if a cache already exists. I would imagine you could just mount a volume at that directory into the kaniko docker container.
IIRC kaniko has some special handling for mounted volumes so I don't think it would cause issues for the image build if those cache files aren't directly referenced by the docker build, but I'm not positive.
In any case, I see no reason why it shouldn't work, so if it doesn't we can certainly look into it.
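For example (a rough, untested sketch; the destination is a placeholder), when running the kaniko executor image directly with Docker, the host's pip cache could be mounted into the location pip looks at:
docker run \
  -v "$(pwd)":/workspace \
  -v "$HOME/.cache/pip":/root/.cache/pip \
  gcr.io/kaniko-project/executor:latest \
  --context dir:///workspace \
  --destination registry.example.com/my-image:latest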
I second this need; what you explained, @cvgw, is correct:
- pip, maven, npm, yarn and so on have cache directories for downloaded packages
- usually one would instruct the CI to cache such directory so that it can be reused between builds
When using kaniko, the build itself has no access to the host filesystem, so it would be neat to be able to instruct kaniko to mount a certain directory into the build container so that pip and co can take advantage of it.
What about supporting RUN --mount=type=cache?
@glen-84 I'm not exactly sure how this feature interacts with the host filesystem. The use case I had in mind takes advantage of the fact that the host filesystem (i.e., the CI machine) will have a directory with the cache available. I'm not sure if --mount=type=cache allows for this, does it?
@victornoel,
If I understand correctly, the Docker engine on the host would manage the directory (sort of like a named volume). If you follow the link, they show an example of caching Go packages.
See also https://stackoverflow.com/a/57934504/221528.
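For reference, a minimal sketch along the lines of the example in the BuildKit docs (Go package caching; the targets are the standard Go cache locations):
# syntax=docker/dockerfile:1.2
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
RUN --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=cache,target=/go/pkg/mod \
    go build -o /out/app .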
(PS. You might want to consider up-voting this issue to indicate your interest.)
@glen-84 ok, so in the case of kaniko, since there is no docker engine on the host, I suspect it would mean adding an option to the kaniko cli to specify which directory to use when --mount=type=cache is used inside the image. This would elegantly allow choosing the desired directory in a CI context.
Still, there would be some thinking to do as to how this interacts with the from and source options, I suppose...
My thoughts were to "reuse" the syntax, as opposed to designing something completely specific to Kaniko. Kaniko could be updated to understand these types of mounts, and to manage the directories outside of the image (i.e. inside the Kaniko container).
@glen-84 yes, that was my point too :)
podman has support for this using the well-known -v flag:
podman build -v /tmp/cache:/cache .
Using this with ENV COMPOSER_CACHE_DIR=/cache/composer_cache in PHP and ENV YARN_CACHE_FOLDER=/cache/yarn_cache saves us a ton of time and bandwidth.
I would love to see this supported in kaniko.
I've indicated my interest.
Locally we build using Docker (it is just easier, sorry), for use in an e2e test using Docker Compose. Using --mount=type=cache speeds up incremental Golang builds enormously.
In contrast, on CI/CD we build using Kaniko before we ship. However, if our Dockerfile specifies this experimental syntax, it breaks the Kaniko build:
error building image: parsing dockerfile: Dockerfile parse error line 12: Unknown flag: mount
Therefore, to use BuildKit, I need to maintain 2 separate Dockerfiles, one for Kaniko and one for Docker BuildKit. This is cumbersome. A great first step would be if Kaniko didn't choke on the syntax, and adding support would be even greater!
At our project, we're also interested in this.
We have a similar scenario as @hermanbanken: we build using Docker locally and build with Kaniko before we ship, in our CI/CD pipeline.
In our case, however, we use buildkit for the --secret feature, instead of cache (Dockerfile at the bottom of the comment). We need that in order to pass secret information (sensitive credentials) to the Dockerfile when building docker images (more specifically, to download our private packages) in a safe way that will not end up stored in the final image.
As @hermanbanken said, it'd be great if Kaniko didn't choke on the syntax -- also suggested in #1568 (comment) -- and even better if it supported it. This is a very relevant issue for us and I can imagine that there are a lot more scenarios that would benefit from this :smile:
Dockerfile
# syntax=docker/dockerfile:1.2
FROM python:3.8-slim
# ...
COPY requirements_private.txt .
RUN --mount=type=secret,id=secrets \
export $(cat /run/secrets/secrets) && \
pip install --no-cache-dir --quiet --no-deps -r requirements_private.txt
# ...
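For reference, such a Dockerfile would be built locally with BuildKit along these lines (sketch; secrets.env is a placeholder for the credentials file):
DOCKER_BUILDKIT=1 docker build --secret id=secrets,src=secrets.env -t my-image .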
also interested in this feature, that's what's missing for me to move from buildkit to kaniko atm
Also interested in this feature, this will speed up our builds
This is a must-have feature. At minimum it should not error when buildkit flags are present. Right now I use a step that runs sed on the dockerfile to remove those options, but I can't do this from skaffold (skaffold generates a single step).
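Such a sed step could look something like this (a simplified sketch that only strips --mount flags placed directly after RUN; Dockerfile.kaniko is just an arbitrary output name):
sed -E 's/^RUN (--mount=[^ ]+ )+/RUN /' Dockerfile > Dockerfile.kaniko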
Hey folks, kaniko is in maintenance mode and I am not actively working on it. That said, I am open to helping folks get this feature in.
I'm also very interested in this feature.
Mounting a local directory as a volume inside Kaniko is a valid use case and can be achieved with a little bit of overhead.
In Google Cloud Build the working directory is already automatically mounted in the Kaniko container (and any other container) under /workspace (see here), so you can reference the /workspace directory directly and access your cache.
If you want to mount additional directories (e.g. because you want the cache in a specific location) you can always mount a volume in Kaniko, for example with:
- name: 'gcr.io/kaniko-project/executor:latest'
args: [...]
volumes:
- name: 'build_cache'
path: '/my_app/build_cache'
At this point we just need to pre-populate the volume with the cache content, for example by downloading it from a bucket:
- name: gcr.io/cloud-builders/gsutil
args: ['-m', 'cp', '-r', 'gs://my-bucket/build_cache/*', '/build_cache']
volumes:
- name: 'build_cache'
path: '/build_cache'
or from a local directory:
- name: bash
args: ['cp', '-a', '/workspace/build_cache/.', '/build_cache']
volumes:
- name: 'build_cache'
path: '/build_cache'
Hope this helps until Cloud Build allows mounting a local directory as a volume (maybe it already does, but I wasn't able to find a way).
Hey folks, kaniko is in maintenance mode and I am not actively working on it. That said, I am open to helping folks get this feature in.
@tejal29 I'm a bit motivated to do something regarding this, but I'm a bit lost in the code base. I had a quick look around and my naive thought is that one could just strip the mount syntax from the parsed command for now in pkg/commands/run.go. How close or wrong am I?
FYI Buildah >= 1.24 (which is shipped with Podman >= 4) supports RUN --mount=type=cache.
Is there a solution for this yet? It looks like it's been a year since the last comment, but I can't imagine why there wouldn't be support for cache mounts?
https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/reference.md#run---mounttypecache
I was able to get around this limitation using ONBUILD in the Dockerfile. This example creates and uses a cache for Poetry inside the container and then copies it into the GitLab project root so it can be uploaded to the remote cache. Subsequent builds pull it down to the project root so it can be uploaded as part of the docker context and copied into the container. By default it won't use the cache, so it doesn't break when running locally; in GitLab I pass in BUILD_ENV=ci as a build arg, which causes it to copy the cached directory into the container.
ARG BUILD_ENV=local
# Install poetry and setup environment
# This is done in an upstream stage because the ONBUILD COPY will be inserted directly after
# the FROM of the downstream build and we don't want to have to re-install poetry every time
# the cache is downloaded (every build).
# see https://docs.docker.com/engine/reference/builder/#onbuild
FROM python:3.7-slim-bullseye as poetry
ENV POETRY_VERSION=1.5.0
RUN pip install poetry==${POETRY_VERSION}
ENV VIRTUAL_ENV /venv
RUN python -m venv ${VIRTUAL_ENV}
ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
ENV POETRY_NO_INTERACTION=true \
POETRY_VIRTUALENVS_IN_PROJECT=false \
POETRY_VIRTUALENVS_PATH=${VIRTUAL_ENV} \
POETRY_VIRTUALENVS_CREATE=false
# If running on CI, copy the poetry cache from the GitLab project root to the container
FROM poetry as poetry_ci
ONBUILD COPY .cache/pypoetry /root/.cache/pypoetry/
# If running on local, don't do anything
FROM poetry as poetry_local
# Install the project
FROM poetry_${BUILD_ENV} as venv
COPY pyproject.toml poetry.lock ./
RUN touch README.md && \
poetry install --only main
# Build final image
FROM python:3.7-slim-bullseye as final
ENV PATH="/venv/bin:${PATH}"
COPY --from=venv /venv /venv
# Copy in the app, set user, entrypoint, etc
In GitLab:
build:
stage: build
image:
name: gcr.io/kaniko-project/executor:debug
entrypoint: [""]
variables:
POETRY_CACHE_DIR: /root/.cache/pypoetry
PROJECT_POETRY_CACHE_DIR: ${CI_PROJECT_DIR}/.cache/pypoetry
cache:
- key: ${CI_JOB_NAME}
paths:
- ${PROJECT_POETRY_CACHE_DIR}
script:
- mkdir -p ${PROJECT_POETRY_CACHE_DIR}
- /kaniko/executor --skip-unused-stages=true --cache=true --context ${CI_PROJECT_DIR} --dockerfile ${CI_PROJECT_DIR}/Dockerfile --destination <destination> --build-arg BUILD_ENV=ci --ignore-path ${POETRY_CACHE_DIR}
- rm -rf ${PROJECT_POETRY_CACHE_DIR}
- cp -a ${POETRY_CACHE_DIR} ${PROJECT_POETRY_CACHE_DIR}
The --ignore-path flag is needed if you use multi-stage builds and the path you want to cache isn't in the final target that's built. If you're not using multi-stage builds, I don't think you need to pass it, since the path will be available in the filesystem after kaniko has finished running; but then you'd have the cache in your image, which you probably don't want. The files are then copied to the project root, which is where they need to be for GitLab to be able to cache them.
Another alternative, which we are using for per-project caching, doesn't require the cache to be filled in advance (it does not work for the first build of each project, but we are OK with that, and it can be mixed with @trevorlauder's approach).
We just store the last image built for each project somewhere (SSM parameters). Then, using a multi-stage copy, we copy the cache folder (poetry in our case) from the latest known valid image (in our case we just copy the venv to speed it up even more).
PREVIOUS_IMAGE can be scratch or the name of the previously built image for that project.
# Build args used in FROM must be declared before the first FROM
ARG PREVIOUS_IMAGE
ARG PYTHON_BASE_IMAGE
FROM --platform=linux/amd64 $PREVIOUS_IMAGE as previous_image
RUN mkdir -p /project/.venv
COPY .aws /root/.aws
COPY . /project_new
FROM --platform=linux/amd64 $PYTHON_BASE_IMAGE as base_image
COPY --from=previous_image /root/.aws /root/.aws
COPY --from=previous_image /project_new /project
# Restore the project folder from the previous image to get its venv
COPY --from=previous_image /project/.venv /project/.venv
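When building, PREVIOUS_IMAGE and PYTHON_BASE_IMAGE are then passed as regular build args, roughly like this (the registry paths here are placeholders):
/kaniko/executor \
  --context "${CI_PROJECT_DIR}" \
  --build-arg PREVIOUS_IMAGE=registry.example.com/my-project:previous \
  --build-arg PYTHON_BASE_IMAGE=python:3.11-slim \
  --destination registry.example.com/my-project:latest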
Need support for --mount=type=cache as well.
Any update on this?
This is a much-needed feature that is still missing, which makes it hard to choose Kaniko.
I don't think we actually need to copy the cache folder explicitly as in @trevorlauder's approach above. Two realizations were key for me:
- Since the executor binary is simply running as a process inside the host container without any redirection, it has, by default, access to every folder in the host container. AFAIU the /kaniko folder is only special in the sense that it is not added to the final image. But you are free to create any additional folders in the host container for the build and then discard them from the final image with --ignore-path.
- kaniko automatically discards mounted folders on the host from the final image (via DetectFilesystemIgnoreList). In the case of Gitlab CI, the cache: directive mounts the cache into the provided subfolder within ${CI_PROJECT_DIR}, so we don't even need to explicitly --ignore-path it.
All in all, to make it work, just tell Gitlab CI to mount the cache as usual into the host container, then use it inside the Dockerfile during build (I am using build args to pass the cache path). It will be accessible because of 1., and it will not go into the final image because of 2. If it is read-write (default), the Gitlab-managed cache will also automatically be updated afterwards.
Here is my stripped-down config for employing a pip cache during kaniko build within Gitlab CI.
gitlab-ci.yml
build_and_publish_container:
stage: build
image:
name: gcr.io/kaniko-project/executor:v1.23.1-debug
entrypoint: [""]
cache:
paths:
- .cache/pip
script:
# build and push container, pass cache folder as build arg
- /kaniko/executor
--context "${CI_PROJECT_DIR}"
--build-arg "PIP_CACHE_DIR=${CI_PROJECT_DIR}/.cache/pip"
--destination "${CI_REGISTRY_IMAGE}:latest"
Dockerfile
FROM python:3.12-slim
# Path to pip cache on host container
ARG PIP_CACHE_DIR
COPY requirements.txt /app/
WORKDIR /app
RUN pip install -r requirements.txt
For compatibility with local docker builds, I believe one could add --mount=type=cache to the RUN pip install command - it will be ignored by kaniko. (untested)
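An untested sketch of what that RUN line could look like:
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt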
Thanks to @pe224, I followed his approach and it worked.
Let me add one more thing for the case where your source code is located in the root of ${CI_PROJECT_DIR} in GitLab, you also have, for instance, a test job before the build job, and you want the m2 cache to be shared between jobs.
If you do something like this for a Java Maven project:
ARG CI_PROJECT_DIR
COPY . /app
WORKDIR /app
RUN mvn clean package -Dmaven.repo.local=${CI_PROJECT_DIR}/.m2/repository
Then you will also copy the m2 cache into a docker layer and not use the one on the host; this means it will not be updated on the host and not cached by the GitLab runner.
What I did on my side is move my code to a code/ folder and do this instead:
ARG CI_PROJECT_DIR
COPY code/ /app
WORKDIR /app
RUN mvn clean package -Dmaven.repo.local=${CI_PROJECT_DIR}/.m2/repository
If you don't want to have a code/ folder, I assume you can still do something like this:
ARG CI_PROJECT_DIR
COPY pom.xml settings.xml /app/
COPY src/ /app/src
WORKDIR /app
RUN mvn clean package -Dmaven.repo.local=${CI_PROJECT_DIR}/.m2/repository
This way the cache stays on the host and will be updated during the build.
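For completeness, the GitLab side would then cache that folder and pass CI_PROJECT_DIR through to the build, roughly like this (a sketch along the lines of @pe224's config above):
build:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  cache:
    paths:
      - .m2/repository
  script:
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --build-arg "CI_PROJECT_DIR=${CI_PROJECT_DIR}"
      --destination "${CI_REGISTRY_IMAGE}:latest"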