
Replace kaniko with buildkit

Open munishchouhan opened this issue 1 year ago • 11 comments

This PR will replace the Kaniko builder image with BuildKit

munishchouhan avatar May 16 '24 15:05 munishchouhan

The build process is getting stuck at:

2024-05-16 17:18:32 time="2024-05-16T15:18:32Z" level=debug msg="auto snapshotter: overlayfs is not available for /var/lib/buildkit, trying fuse-overlayfs: failed to mount overlay: operation not permitted"
2024-05-16 17:18:32 time="2024-05-16T15:18:32Z" level=debug msg="auto snapshotter: fuse-overlayfs is not available for /var/lib/buildkit, falling back to native: fuse-overlayfs not installed: exec: \"fuse-overlayfs\": executable file not found in $PATH"
2024-05-16 17:18:32 time="2024-05-16T15:18:32Z" level=info msg="auto snapshotter: using native"
2024-05-16 17:18:32 time="2024-05-16T15:18:32Z" level=warning msg="using host network as the default"
2024-05-16 17:18:32 time="2024-05-16T15:18:32Z" level=warning msg="failed to prepare cgroup controllers: mkdir /sys/fs/cgroup/init: read-only file system"
2024-05-16 17:18:32 time="2024-05-16T15:18:32Z" level=info msg="found worker \"1txxm5558fckd9ky82z6cfz1z\", labels=map[org.mobyproject.buildkit.worker.executor:oci org.mobyproject.buildkit.worker.hostname:fbef5e0b3043 org.mobyproject.buildkit.worker.network:host org.mobyproject.buildkit.worker.oci.process-mode:sandbox org.mobyproject.buildkit.worker.selinux.enabled:false org.mobyproject.buildkit.worker.snapshotter:native], platforms=[linux/arm64 linux/amd64 linux/amd64/v2 linux/riscv64 linux/ppc64le linux/s390x linux/386 linux/mips64le linux/mips64 linux/arm/v7 linux/arm/v6]"
2024-05-16 17:18:32 time="2024-05-16T15:18:32Z" level=warning msg="skipping containerd worker, as \"/run/containerd/containerd.sock\" does not exist"
2024-05-16 17:18:32 time="2024-05-16T15:18:32Z" level=info msg="found 1 workers, default=\"1txxm5558fckd9ky82z6cfz1z\""
2024-05-16 17:18:32 time="2024-05-16T15:18:32Z" level=warning msg="currently, only the default worker can be used."
2024-05-16 17:18:32 time="2024-05-16T15:18:32Z" level=info msg="running server on /run/buildkit/buildkitd.sock"

Working on fixing it
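For reference, the `failed to mount overlay: operation not permitted` message in the log above is the usual symptom of rootless BuildKit lacking the privileges it needs, so it falls back to the slow `native` snapshotter. A minimal sketch of a `docker run` invocation, based on the upstream rootless docs (the image tag and flags are assumptions, not the exact setup used here):

```shell
# Rootless BuildKit needs seccomp/apparmor relaxed to create user
# namespaces, and /dev/fuse so fuse-overlayfs can be used instead of
# falling back to the native snapshotter.
docker run -d --name buildkitd \
  --security-opt seccomp=unconfined \
  --security-opt apparmor=unconfined \
  --device /dev/fuse \
  moby/buildkit:rootless \
  --oci-worker-no-process-sandbox
```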

munishchouhan avatar May 16 '24 15:05 munishchouhan

Oh, it was fast

pditommaso avatar May 16 '24 15:05 pditommaso

@arnaualcazar since you have already tried BuildKit, please take a look at this PR, and if you have any idea why it's getting stuck, please share

munishchouhan avatar May 16 '24 15:05 munishchouhan

Have you tried running it just using the docker run cli?

pditommaso avatar May 16 '24 16:05 pditommaso

I tested BuildKit with a service deployment on a Kubernetes cluster: https://github.com/moby/buildkit/blob/master/examples/kubernetes/README.md#deployment--service After deploying it, you can configure docker build to connect to the BuildKit pod to start the build.

arnaualcazar avatar May 16 '24 16:05 arnaualcazar

Likely we'll go with the rootless job approach

pditommaso avatar May 16 '24 16:05 pditommaso

Have you tried running it just using the docker run cli?

Yes, locally I tried with both K8s and Docker

munishchouhan avatar May 16 '24 18:05 munishchouhan

With `docker run` and debug enabled:


munish.chouhan@Munishs-MacBook-Pro 4170bcb9964acc85_1 % /bin/zsh /Users/munish.chouhan/main_ground/wave/master/wave/build-workspace/4170bcb9964acc85_1/docker.sh
time="2024-05-16T18:13:35Z" level=info msg="auto snapshotter: using overlayfs"
time="2024-05-16T18:13:35Z" level=debug msg="running in rootless mode"
time="2024-05-16T18:13:35Z" level=info msg="found worker \"2y6v6ew7j1d5252qdtesy5io5\", labels=map[org.mobyproject.buildkit.worker.executor:oci org.mobyproject.buildkit.worker.hostname:6caea173439a org.mobyproject.buildkit.worker.network:host org.mobyproject.buildkit.worker.oci.process-mode:sandbox org.mobyproject.buildkit.worker.selinux.enabled:false org.mobyproject.buildkit.worker.snapshotter:overlayfs], platforms=[linux/arm64 linux/amd64 linux/amd64/v2 linux/riscv64 linux/ppc64le linux/s390x linux/386 linux/mips64le linux/mips64 linux/arm/v7 linux/arm/v6]"
time="2024-05-16T18:13:35Z" level=warning msg="skipping containerd worker, as \"/run/containerd/containerd.sock\" does not exist"
time="2024-05-16T18:13:35Z" level=info msg="found 1 workers, default=\"2y6v6ew7j1d5252qdtesy5io5\""
time="2024-05-16T18:13:35Z" level=warning msg="currently, only the default worker can be used."
time="2024-05-16T18:13:35Z" level=info msg="running server on /run/user/1000/buildkit/buildkitd.sock"
time="2024-05-16T18:13:36Z" level=debug msg="remove snapshot" key=y3arilf1bnoe50kqfcyuswcvu snapshotter=overlayfs
time="2024-05-16T18:13:36Z" level=debug msg="schedule snapshotter cleanup" snapshotter=overlayfs
time="2024-05-16T18:13:36Z" level=debug msg="removed snapshot" key=buildkit/1/y3arilf1bnoe50kqfcyuswcvu snapshotter=overlayfs
time="2024-05-16T18:13:36Z" level=debug msg="snapshot garbage collected" d=4.951125ms snapshotter=overlayfs

munishchouhan avatar May 16 '24 18:05 munishchouhan

When I run the same command inside the container, it works: Screenshot 2024-05-16 at 22 55 20

munishchouhan avatar May 16 '24 20:05 munishchouhan

Finally working with Docker; now I will make it work with K8s

munishchouhan avatar May 16 '24 21:05 munishchouhan

Now it's working with K8s, but caching is not working in either case. I will work on that now

munishchouhan avatar May 17 '24 14:05 munishchouhan

I am checking with the BuildKit team on Slack about how to use a repository as cache

munishchouhan avatar May 22 '24 14:05 munishchouhan

Cache is working in the K8s setup; I will work on making it work on Docker

munishchouhan avatar May 23 '24 08:05 munishchouhan

@pditommaso there are two code snippets that were written to fix Kaniko bugs; what should we do with them?

https://github.com/seqeralabs/wave/blob/3fbbc7fee0e34cfaaa8c464572ffbe690d1a177f/src/main/groovy/io/seqera/wave/auth/RegistryInfo.groovy#L46-L50

https://github.com/seqeralabs/wave/blob/3fbbc7fee0e34cfaaa8c464572ffbe690d1a177f/src/main/groovy/io/seqera/wave/core/ContainerPlatform.groovy#L122-L124

munishchouhan avatar May 23 '24 15:05 munishchouhan

Do they break BuildKit?

pditommaso avatar May 23 '24 15:05 pditommaso

Do they break BuildKit?

no

munishchouhan avatar May 23 '24 15:05 munishchouhan

Then let's keep them

pditommaso avatar May 23 '24 15:05 pditommaso

Tested on dev: Screenshot 2024-05-23 at 19 40 55

munishchouhan avatar May 23 '24 17:05 munishchouhan

This PR depends on https://github.com/seqeralabs/platform-deployment/pull/363 being merged first

munishchouhan avatar May 24 '24 11:05 munishchouhan

Build is working in dev, but cache is not working. I will dig into it to fix it

munishchouhan avatar May 24 '24 11:05 munishchouhan

What caching is not working?

pditommaso avatar May 24 '24 11:05 pditommaso

What caching is not working?

Nothing is uploaded to the cache repository: Screenshot 2024-05-24 at 13 34 06

In local testing, the cache was getting exported to the registry: Screenshot 2024-05-24 at 13 35 11

munishchouhan avatar May 24 '24 11:05 munishchouhan

I see. It seems to work similarly to Kaniko

pditommaso avatar May 24 '24 11:05 pditommaso

I see. It seems to work similarly to Kaniko

Not exactly the same: here we have to provide a tag with the registry, so there will only be one image in the cache repository containing the layers

munishchouhan avatar May 24 '24 11:05 munishchouhan

Found the error:

#12 writing cache manifest sha256:52f6bf1b49ac1d9a263cd97a071312dbc8b056d4b21e57a2c8577800bf3891af 0.2s done

#12 ERROR: error writing manifest blob: failed commit on ref "sha256:52f6bf1b49ac1d9a263cd97a071312dbc8b056d4b21e57a2c8577800bf3891af": unexpected status from PUT request to https://<account>.dkr.ecr.eu-west-2.amazonaws.com/v2/wave/build/cache/manifests/cache: 400 Bad Request
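As a side note on the 400 above: ECR is known to reject BuildKit's default cache manifest media type. In recent BuildKit versions (an assumption that the version in use supports these attributes), the workaround is to export the cache as a regular OCI image manifest, e.g.:

```shell
# Sketch: export the registry cache in a media type ECR accepts.
# Repository and tag are placeholders, not the exact values used here.
buildctl build \
  --frontend dockerfile.v0 \
  --local context=. --local dockerfile=. \
  --export-cache type=registry,ref=<REGISTRY>/wave/build/cache:<TAG>,image-manifest=true,oci-mediatypes=true
```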

munishchouhan avatar May 24 '24 12:05 munishchouhan

Now everything is working, but the caching depends on the tag, and when we use the same tag for multiple builds, it overrides the previous one. So we need to figure out a tag that can identify the layers for a particular build, so that the cache can be imported back when the same kind of build request comes in.

Screenshot 2024-05-24 at 18 02 43

munishchouhan avatar May 24 '24 16:05 munishchouhan

@pditommaso caching in BuildKit is about caching a specific image and reusing it when a future build of the same image comes in; it does not cache layers like Kaniko, where the cache can be shared across multiple builds https://www.reddit.com/r/docker/comments/13h567h/new_idea_maybe_tag_buildkit_cache_images_with/

munishchouhan avatar May 24 '24 16:05 munishchouhan

I have added the containerId as the cache tag, so the correct image is imported for any future build

munishchouhan avatar May 24 '24 17:05 munishchouhan

Not sure to understand how tag is used here. Can you make an example?

pditommaso avatar May 24 '24 17:05 pditommaso

Not sure to understand how tag is used here. Can you make an example?

In the case of Kaniko, caching is configured like `--cache-repo <REPOSITORY>`

but in the case of BuildKit, two flags are used:

  1. export cache --export-cache type=registry,ref=<REPOSITORY>:<TAG>
  2. import cache --import-cache type=registry,ref=<REPOSITORY>:<TAG>

Here is an example: if I request multiqc, I am now using the containerId as the tag

% wave --conda-package multiqc  --wave-endpoint http://localhost:9090 --platform linux/arm64
db74a1b3f178.ngrok.app/wt/0ffd5c1e995a/wave/build/dev:multiqc--5651783df34afd2e

The image will be pushed to the build repository and also to the cache, so if the same image is requested in the future, it will check the cache for the tag mentioned in `--import-cache` and get the image
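Putting the two flags together, the full build command would look roughly like this (a sketch only; the repository names are placeholders and the exact frontend/output options used by Wave are assumptions):

```shell
# Export and import the cache under the same containerId-derived tag,
# so a repeated build of the same image finds its own cache entry.
buildctl build \
  --frontend dockerfile.v0 \
  --local context=. --local dockerfile=. \
  --output type=image,name=<BUILD_REPO>/wave/build/dev:multiqc--5651783df34afd2e,push=true \
  --export-cache type=registry,ref=<CACHE_REPO>/wave/build/cache:5651783df34afd2e \
  --import-cache type=registry,ref=<CACHE_REPO>/wave/build/cache:5651783df34afd2e
```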

build repository: Screenshot 2024-05-24 at 20 05 47

cache repository Screenshot 2024-05-24 at 20 06 04

munishchouhan avatar May 24 '24 18:05 munishchouhan