
Kaniko build performance much slower compared with DinD solution

Open caiwei-ebay opened this issue 5 years ago • 34 comments

We have a very simple Dockerfile which inherits an Ubuntu JDK 8 image, runs a few shell commands, and copies a few files. Please note the RUN commands come at the very start.

Our CI is built on top of Kubernetes; the Jenkins build runs in a slave pod. We've enabled DinD and Kaniko in separate slave images and triggered the builds with both Kaniko and Docker. Here are the performance results we observed for building and pushing images:

Dockerfile with all RUN commands removed:

  • Kaniko: 67s
  • DinD: 58s

Dockerfile having 10 RUN commands:

  • Kaniko: 180s
  • Docker in Docker: 89s

May I know why Kaniko is so much slower than the DinD solution when there are RUN commands in the Dockerfile? Can this part be sped up?

We've tried the --cache and --cache-repo parameters, but the performance of the Kaniko build did not improve at all. Here are the details:

  • We are using an internal Docker registry based on Quay.io
  • We passed --cache=true only and got the error "NAME_INVALID: Nested repositories are not supported."
  • We passed --cache=true and --cache-repo=ANOTHER_REPO, and saw the cache being uploaded in the 1st build. We did not modify any code and triggered the build again; this time we saw a few cache hits.

However, the performance is much worse with the cache, taking 254s. I think the cache uploading and downloading is also a time killer.
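For reference, the invocation with caching enabled looked roughly like the following (the registry and repository names here are placeholders, not our real ones). Without --cache-repo, Kaniko appears to infer a nested <destination>/cache repository, which Quay rejects with the NAME_INVALID error above:

/kaniko/executor \
  --context "$BUILD_CONTEXT" \
  --dockerfile Dockerfile \
  --destination quay.example.com/team/app:latest \
  --cache=true \
  --cache-repo=quay.example.com/team/app-cache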

Please help explain the cache issue and advise how we can further improve the performance of the Kaniko build.

The Dockerfile we used looks like the one below:


FROM abc
COPY *.jar /app/app.jar

RUN jar -xvf app.jar && \
    rm -rf app.jar && \
    mkdir -p /layer_build/lib/snapshots && \
    mkdir -p /layer_build/lib/releases && \
    mkdir -p /layer_build/app && \
    find BOOT-INF/lib -name '*SNAPSHOT*' -type f -exec mv {} /layer_build/lib/snapshots \; && \
    mv BOOT-INF/lib/* /layer_build/lib/releases && \
    rm -rf BOOT-INF/lib && \
    mv * /layer_build/app

FROM def
COPY --from=0 layer_build/lib/snapshots/ /app/BOOT-INF/lib/
COPY --from=0 layer_build/lib/releases/ /app/BOOT-INF/lib/
COPY --from=0 layer_build/app/ /app/

WORKDIR /app
CMD ["/bin/bash", "-c", "/app/bin/run.sh"]


caiwei-ebay avatar Nov 22 '19 02:11 caiwei-ebay

I've noticed similar issues. I use the GitLab runner on Kubernetes and, in the same way as you described, ran dind and kaniko side by side; kaniko is much slower. At the moment I've switched to using kaniko on Cloud Build, and there it's pretty fast and caches better than docker.

mcfedr avatar Nov 25 '19 14:11 mcfedr

kaniko on Cloud Build

Thanks for the information, I believe you are talking about https://cloud.google.com/blog/products/application-development/build-containers-faster-with-cloud-build-with-kaniko.

Unfortunately we are using an internal docker registry based on quay.io, so it cannot benefit us. As we observed, the cache uploading and downloading with quay takes much more time than not using the cache at all.

caiwei-ebay avatar Nov 26 '19 01:11 caiwei-ebay

It seems a lot of time is spent snapshotting the filesystem, which I believe is used to ensure we get an end result with multiple layers.

By using --single-snapshot there will only be a single layer added to the base image, and I assume we avoid the slowdown of taking intermediate snapshots for intermediate layers.

It can of course be nice to have layers, so improving performance like this is a compromise. I ended up with 15 minutes instead of 25 minutes for one of my builds.
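For completeness, the flag just gets added to the executor invocation, roughly like this (the context and destination below are placeholders):

/kaniko/executor \
  --context "$CI_PROJECT_DIR" \
  --dockerfile Dockerfile \
  --destination registry.example.com/group/app:latest \
  --single-snapshot   # one layer on top of the base image instead of one per instruction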

consideRatio avatar Jan 29 '20 12:01 consideRatio

I have the same question. In Jenkins, using dind is faster than Kaniko. Most of the time is spent on [Taking snapshot of full filesystem] and [Unpacking rootfs as cmd COPY]. How can this be improved?

u2bo avatar Feb 28 '20 02:02 u2bo

Most of the time is spent on [Taking snapshot of full filesystem] and [Unpacking rootfs as cmd COPY]. How can this be improved?

I have the same question. I tried a kaniko build on GitLab and it's also slower than with docker.

klkl0808 avatar Mar 03 '20 20:03 klkl0808

Same here. Trying to improve the build of https://beta.kintohub.com/ by transitioning from DiD to Kaniko, but DiD is faster, even with caching. It seems that most of the time is indeed spent in [Taking snapshot of full filesystem] and [Unpacking rootfs as cmd COPY].

bakayolo avatar Mar 18 '20 09:03 bakayolo

Experiencing the same issue. In fact I don't see any difference in runtimes when using --cache=true... it definitely pulls cached layers, but it does not speed up the builds at all.

haampie avatar Apr 01 '20 21:04 haampie

I'm using kaniko in GitLab CI/CD with runners in a DigitalOcean Kubernetes cluster (3x 2GB 1vCPU).

Benchmark: create-react-app (multi-stage build)

FROM node:12-alpine as build
WORKDIR /home/app/
COPY package.json ./
COPY yarn.lock ./
RUN yarn 
COPY . .
RUN yarn build

FROM nginx:1.13.12-alpine
COPY --from=build /home/app/build /var/www
COPY nginx.conf /etc/nginx/nginx.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

Building locally with docker build on my laptop: ~ 2 minutes

Building with kaniko in a GitLab runner: ~ 38 minutes. It spends most of the time (~32 minutes) on the "Taking snapshot of full filesystem..." step.

Same as previous with --single-snapshot: ~ 33 minutes

Using Docker in Docker: ~5 minutes

bergkvist avatar Apr 18 '20 10:04 bergkvist

We've been experiencing similar problems with kaniko for builds that produce a large number of small files on the filesystem in the intermediate stages. Multi-stage builds also seem to contribute to the slowdown.

swist avatar Apr 29 '20 11:04 swist

I expect the reason for this difference in speed is that "native" docker manages the layered filesystem using overlayfs (overlay2), so taking a snapshot is as simple as telling the FS driver to finish a layer. Kaniko doesn't natively track changes on the filesystem, so it has to stop and stat everything in the filesystem in order to take a snapshot.

I'd be interested in whether this is a fundamental limitation of the kaniko design, or whether, if you could have a user-mode filesystem driver or overlayfs running in the docker container that runs kaniko, you could obtain matching speeds.
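To make the comparison concrete, the work involved is roughly the following; this is only an illustration of the shape of the problem, not kaniko's actual implementation:

# "full" snapshot: walk and hash everything in the filesystem after each command,
# then diff against the previous walk to find what that command changed
find / -xdev -type f -exec sha256sum {} + | sort > /tmp/after.txt
diff /tmp/before.txt /tmp/after.txt

# overlayfs: the changed files already sit in the layer's upper directory,
# so finishing a layer is essentially just packaging that directory
ls /var/lib/docker/overlay2/<layer-id>/diff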

bsmedberg-xometry avatar Jun 09 '20 18:06 bsmedberg-xometry

@bsmedberg-xometry I love your explanation and fully agree. I recently watched a very good talk about the "backend" of the Docker daemon, in which someone responsible for the filesystem at Docker explains the differences. Whilst it sounds possible to actually do what you have suggested, I think it can't be achieved without changing the source code of kaniko.

mayrbenjamin92 avatar Jun 22 '20 16:06 mayrbenjamin92

I understand the filesystem snapshotting issue is driven by not using overlayfs, but what would explain the inordinate time it takes kaniko to push a layer to the cache?

cmamigonian avatar Jan 26 '21 22:01 cmamigonian

We are also having this issue. Switching to Kaniko solved some other DIND issues we were having, but added 12+ minutes to our build times.

  • Gitlab SaaS (13.x)
  • Private integrated (EKS) Kubernetes cluster + runner
  • DID build time avg: ~4m
  • Kaniko build time avg: ~16m

tjtravelnet avatar Mar 19 '21 21:03 tjtravelnet

@tjtravelnet Did you try the new --use-new-run flag? You could also help us with some profiling data to understand where kaniko is spending time: https://github.com/GoogleContainerTools/kaniko#kaniko-builds---profiling
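For anyone unsure what that entails, it boils down to roughly the following (see the linked README section for the exact variable name and where the output file needs to live):

# write per-stage timing data to a file so the slow steps show up
export BENCHMARK_FILE=/workspace/kaniko-benchmark.json
/kaniko/executor --context "$CI_PROJECT_DIR" --destination registry.example.com/group/app:latest
# then attach the resulting file to this issue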

tejal29 avatar Mar 31 '21 07:03 tejal29

Build times are insanely long compared to DIND even with caching activated.

Environment:

  • Jenkins
  • Azure Kubernetes Service.
  • Azure Container Registry.

Kyouuma avatar Apr 22 '21 13:04 Kyouuma

Same experience on my side with Kubernetes gitlab runners.

The build is WAY longer than on my computer, and I build on a Pentium... Any improvements?

acherifi avatar Apr 24 '21 08:04 acherifi

Had a similar issue; ended up adding --snapshotMode=redo, turning all the verbose logging off, and filtering out all the unnecessary files in .dockerignore. The result is acceptable now: from 46m down to ~10m.
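In case it helps others, roughly what that combination looks like (the ignore entries and destination are just examples):

# .dockerignore - keep large, unneeded paths out of the build context
.git
node_modules
*.log

# executor invocation with redo snapshotting and quiet logging
/kaniko/executor \
  --context "$CI_PROJECT_DIR" \
  --destination registry.example.com/group/app:latest \
  --snapshotMode=redo \
  --verbosity=error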

jerry153fish avatar Apr 30 '21 02:04 jerry153fish

We can observe this behavior too, but from my point of view it's not a real problem here. Of course it would be nice if the snapshot taking could be tuned, but it will never reach the performance of an overlayfs-based snapshot / layer creation. So for us the best solution is to perform all the build work outside kaniko (no multi-stage builds): build the application in its own GitLab job / k8s container and then just copy the assembled application, with only the needed files, into the image that has to be built with Kaniko. Then the performance impact is no problem compared to the big security benefit we get from not relying on DinD (which should be forbidden in CI/CD in times of supply-chain attacks).
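A rough sketch of that setup in .gitlab-ci.yml; the job names, images, and the thin Dockerfile are made up for illustration:

build-app:
  stage: build
  image: maven:3-jdk-8
  script:
    - mvn -B package
  artifacts:
    paths:
      - target/app.jar

build-image:
  stage: package
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    # Dockerfile.thin only contains a FROM line and "COPY target/app.jar /app/app.jar",
    # so there are no RUN steps left for kaniko to snapshot
    - /kaniko/executor --context "$CI_PROJECT_DIR" --dockerfile "$CI_PROJECT_DIR/Dockerfile.thin" --destination "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"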

ghost avatar Jul 13 '21 09:07 ghost

We are running the GitLab runner in AKS. Kaniko surpasses DiND for the same build job (to build docker images) with the below added flags:

--snapshotMode=redo
--use-new-run

With DinD it takes around 5.5 minutes and with Kaniko it comes down to 3.25 minutes.
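For anyone wanting to reproduce this, the full invocation is roughly the following (context and destination use the usual GitLab CI variables):

/kaniko/executor \
  --context "$CI_PROJECT_DIR" \
  --dockerfile "$CI_PROJECT_DIR/Dockerfile" \
  --destination "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" \
  --snapshotMode=redo \
  --use-new-run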

bhordupur avatar Dec 10 '21 14:12 bhordupur

We have builds running in Kaniko that, due to the file system snapshots, are taking unacceptably long. This does not seem to have been remedied by using --use-new-run or --snapshotMode=redo individually, although using them together did substantially improve the build duration (still unacceptably long for this use-case, unfortunately). Just a +1 that this appears to remain an issue.

haljac avatar Mar 25 '22 03:03 haljac

Same here. I tried using Kaniko in Google Cloud Build to get better caching behavior, but it's so slow that it's not worth it. Using --use-new-run or --snapshotMode=redo does improve things a little, but using Docker is still much faster.

I've turned my attention to Docker Buildx instead as it seems to combine the best of both worlds: fast builds and reliable caching.
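For anyone curious, the Buildx setup amounts to roughly this; the repository name is a placeholder, and the registry-backed --cache-from/--cache-to are what provide the persistent cache:

docker buildx create --use
docker buildx build \
  --push \
  -t registry.example.com/team/app:latest \
  --cache-from type=registry,ref=registry.example.com/team/app:buildcache \
  --cache-to type=registry,ref=registry.example.com/team/app:buildcache,mode=max \
  .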

pdfrod avatar May 23 '22 16:05 pdfrod

I've turned my attention to Docker Buildx instead as it seems to combine the best of both worlds: fast builds and reliable caching.

Curious, are you using Buildx with Cloud Build?

rushilsrivastava avatar Nov 19 '22 09:11 rushilsrivastava

Curious, are you using Buildx with Cloud Build?

I tried to, but unfortunately my team is using GCP Container Registry and it doesn't seem to support Buildx cache artifacts.

Artifact Registry, on the other hand, seems to work fine with Buildx, but since it's a lot more expensive than Container Registry, I'm not sure it's worth it for us.

pdfrod avatar Nov 20 '22 11:11 pdfrod

Same problem, any progress? I realize that this question has been open for 4 years; is there any kaniko-related benchmark?

salamer avatar Aug 07 '23 08:08 salamer

I have the same problem.

0x217 avatar Aug 18 '23 16:08 0x217

Me too.

mdagost avatar Sep 11 '23 18:09 mdagost

We are running the GitLab runner in AKS. Kaniko surpasses DiND for the same build job (to build docker images) with the below added flags:

--snapshotMode=redo
--use-new-run

With DinD it takes around 5.5 minutes and with Kaniko it comes down to 3.25 minutes.

If you consider using those flags, please check the docs first and proceed with caution, as using those flags may cause errors for you.

At the time of writing: --use-new-run

[...] This new run mode trades off accuracy/correctness in some cases (potential for missed files in a "snapshot") for improved performance by avoiding the full filesystem snapshots.

--snapshotMode: if it runs in a mode other than full, it doesn't compare e.g. file contents.

KamilKopaczyk avatar Nov 07 '23 12:11 KamilKopaczyk

Running a Kaniko pod in a microk8s Kubernetes cluster with hostNetwork: true set increases the performance significantly. With that setup I reduced the time of an image creation from ~12 min to ~3 min.

So there might be some firewall/network issue when the host network is not exposed. Of course, it's not a recommended setting, but at least I know a possible reason.
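For context, the only relevant change was on the pod spec; the image, context, and destination below are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: kaniko
spec:
  hostNetwork: true   # the setting that made the difference in this microk8s setup
  restartPolicy: Never
  containers:
    - name: kaniko
      image: gcr.io/kaniko-project/executor:latest
      args:
        - --context=git://github.com/example/repo.git
        - --destination=registry.example.com/example/app:latest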

ole1986 avatar Jan 12 '24 08:01 ole1986

Same thing here

amine-mokaddem avatar Apr 12 '24 22:04 amine-mokaddem

The same here.

upd: with the flags below, it works at the same level as docker for me; the build time decreased from 45 minutes to 8 minutes for a fairly dense image.

  stage: build
  rules:
    - !reference [.master_or_web__rules, rules]
  script:
    - >-
      /kaniko/executor
      --context $CI_PROJECT_DIR/image
      --dockerfile $CI_PROJECT_DIR/image/Dockerfile
      --destination ${CI_DOCKER_IMAGE}:${CI_COMMIT_SHORT_SHA}
      --destination ${CI_DOCKER_IMAGE}:latest
      --cache=false
      --cache-repo=${CI_DOCKER_IMAGE}:latest
      --cache-ttl=1h
      --force
      --cleanup
      --single-snapshot

akimrx avatar Apr 20 '24 11:04 akimrx