Replace kaniko with buildkit
This PR will replace the builder image with buildkit
The build process is getting stuck at:
2024-05-16 17:18:32 time="2024-05-16T15:18:32Z" level=debug msg="auto snapshotter: overlayfs is not available for /var/lib/buildkit, trying fuse-overlayfs: failed to mount overlay: operation not permitted"
2024-05-16 17:18:32 time="2024-05-16T15:18:32Z" level=debug msg="auto snapshotter: fuse-overlayfs is not available for /var/lib/buildkit, falling back to native: fuse-overlayfs not installed: exec: \"fuse-overlayfs\": executable file not found in $PATH"
2024-05-16 17:18:32 time="2024-05-16T15:18:32Z" level=info msg="auto snapshotter: using native"
2024-05-16 17:18:32 time="2024-05-16T15:18:32Z" level=warning msg="using host network as the default"
2024-05-16 17:18:32 time="2024-05-16T15:18:32Z" level=warning msg="failed to prepare cgroup controllers: mkdir /sys/fs/cgroup/init: read-only file system"
2024-05-16 17:18:32 time="2024-05-16T15:18:32Z" level=info msg="found worker \"1txxm5558fckd9ky82z6cfz1z\", labels=map[org.mobyproject.buildkit.worker.executor:oci org.mobyproject.buildkit.worker.hostname:fbef5e0b3043 org.mobyproject.buildkit.worker.network:host org.mobyproject.buildkit.worker.oci.process-mode:sandbox org.mobyproject.buildkit.worker.selinux.enabled:false org.mobyproject.buildkit.worker.snapshotter:native], platforms=[linux/arm64 linux/amd64 linux/amd64/v2 linux/riscv64 linux/ppc64le linux/s390x linux/386 linux/mips64le linux/mips64 linux/arm/v7 linux/arm/v6]"
2024-05-16 17:18:32 time="2024-05-16T15:18:32Z" level=warning msg="skipping containerd worker, as \"/run/containerd/containerd.sock\" does not exist"
2024-05-16 17:18:32 time="2024-05-16T15:18:32Z" level=info msg="found 1 workers, default=\"1txxm5558fckd9ky82z6cfz1z\""
2024-05-16 17:18:32 time="2024-05-16T15:18:32Z" level=warning msg="currently, only the default worker can be used."
2024-05-16 17:18:32 time="2024-05-16T15:18:32Z" level=info msg="running server on /run/buildkit/buildkitd.sock"
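The failure above is that the unprivileged container cannot mount overlayfs, the image does not ship fuse-overlayfs, and the cgroup filesystem is read-only, so buildkitd falls back to the native snapshotter and the build never progresses past starting the server. For reference, a minimal sketch of running the rootless buildkit image locally, based on the upstream rootless docs (the image tag and security options are assumptions, not what this PR ships):

# the rootless image bundles rootlesskit and fuse-overlayfs; it needs a relaxed
# seccomp/apparmor profile and access to /dev/fuse to mount fuse-overlayfs
docker run -d --name buildkitd \
  --security-opt seccomp=unconfined \
  --security-opt apparmor=unconfined \
  --device /dev/fuse \
  moby/buildkit:rootless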
Working on fixing it
Oh, it was fast
@arnaualcazar as you already tried buildkit, please take a look at this PR, and if you get some idea about why it's getting stuck, please share.
Have you tried running it just using the docker run CLI?
I tested the buildkit with service deployment on a kubernetes cluster: https://github.com/moby/buildkit/blob/master/examples/kubernetes/README.md#deployment--service After deploying it, you can configure docker build to connect to the buildkit pod to start the build.
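For context, with that Deployment + Service the client side looks roughly like this; the buildkitd service name and port 1234 come from the upstream example (which also provisions mTLS certificates, omitted in this sketch):

# expose the buildkitd Service locally
kubectl port-forward service/buildkitd 1234:1234 &
export BUILDKIT_HOST=tcp://127.0.0.1:1234
# run a build against the remote buildkitd and push the result
buildctl build \
  --frontend dockerfile.v0 \
  --local context=. --local dockerfile=. \
  --output type=image,name=<REPOSITORY>:<TAG>,push=true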
Likely we'll go with the rootless job approach.
Have you tried running it just using the docker run CLI?
Yes, locally I tried with both k8s and docker.
With docker run and debug enabled:
munish.chouhan@Munishs-MacBook-Pro 4170bcb9964acc85_1 % /bin/zsh /Users/munish.chouhan/main_ground/wave/master/wave/build-workspace/4170bcb9964acc85_1/docker.sh
time="2024-05-16T18:13:35Z" level=info msg="auto snapshotter: using overlayfs"
time="2024-05-16T18:13:35Z" level=debug msg="running in rootless mode"
time="2024-05-16T18:13:35Z" level=info msg="found worker \"2y6v6ew7j1d5252qdtesy5io5\", labels=map[org.mobyproject.buildkit.worker.executor:oci org.mobyproject.buildkit.worker.hostname:6caea173439a org.mobyproject.buildkit.worker.network:host org.mobyproject.buildkit.worker.oci.process-mode:sandbox org.mobyproject.buildkit.worker.selinux.enabled:false org.mobyproject.buildkit.worker.snapshotter:overlayfs], platforms=[linux/arm64 linux/amd64 linux/amd64/v2 linux/riscv64 linux/ppc64le linux/s390x linux/386 linux/mips64le linux/mips64 linux/arm/v7 linux/arm/v6]"
time="2024-05-16T18:13:35Z" level=warning msg="skipping containerd worker, as \"/run/containerd/containerd.sock\" does not exist"
time="2024-05-16T18:13:35Z" level=info msg="found 1 workers, default=\"2y6v6ew7j1d5252qdtesy5io5\""
time="2024-05-16T18:13:35Z" level=warning msg="currently, only the default worker can be used."
time="2024-05-16T18:13:35Z" level=info msg="running server on /run/user/1000/buildkit/buildkitd.sock"
time="2024-05-16T18:13:36Z" level=debug msg="remove snapshot" key=y3arilf1bnoe50kqfcyuswcvu snapshotter=overlayfs
time="2024-05-16T18:13:36Z" level=debug msg="schedule snapshotter cleanup" snapshotter=overlayfs
time="2024-05-16T18:13:36Z" level=debug msg="removed snapshot" key=buildkit/1/y3arilf1bnoe50kqfcyuswcvu snapshotter=overlayfs
time="2024-05-16T18:13:36Z" level=debug msg="snapshot garbage collected" d=4.951125ms snapshotter=overlayfs
When I run the same command inside the container, it works:
Finally working with docker; now I will make it work with k8s.
Now it's working with k8s, but caching is not working in either case. I will work on that now.
I am checking with the buildkit community on Slack about how to use a repository as a cache with it.
Cache is working in the k8s setup; I will work on making it work with docker.
@pditommaso there are two code snippets that were written to fix kaniko bugs; what should we do with them?
https://github.com/seqeralabs/wave/blob/3fbbc7fee0e34cfaaa8c464572ffbe690d1a177f/src/main/groovy/io/seqera/wave/auth/RegistryInfo.groovy#L46-L50
https://github.com/seqeralabs/wave/blob/3fbbc7fee0e34cfaaa8c464572ffbe690d1a177f/src/main/groovy/io/seqera/wave/core/ContainerPlatform.groovy#L122-L124
Does that break Buildkit?
no
Then let's keep it.
tested on dev:
This PR depends on this PR being merged first: https://github.com/seqeralabs/platform-deployment/pull/363
Build is working in dev, but cache is not working. I will dig into it and fix it.
What caching is not working?
Nothing is uploaded to the cache repository.
In local testing, the cache was getting exported to the registry:
I see. It seems to work similarly to Kaniko.
Not exactly the same: here we have to provide a tag with the registry, so there will only be one image in the cache repository containing the layers.
Found the error:
#12 writing cache manifest sha256:52f6bf1b49ac1d9a263cd97a071312dbc8b056d4b21e57a2c8577800bf3891af 0.2s done
#12 ERROR: error writing manifest blob: failed commit on ref "sha256:52f6bf1b49ac1d9a263cd97a071312dbc8b056d4b21e57a2c8577800bf3891af": unexpected status from PUT request to https://<account>.dkr.ecr.eu-west-2.amazonaws.com/v2/wave/build/cache/manifests/cache: 400 Bad Request
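For what it's worth, this 400 from ECR is commonly reported when the registry rejects buildkit's cache manifest media type; a frequently suggested workaround (assuming a buildkit version that supports the image-manifest option) is to export the cache as a regular OCI image manifest:

# hypothetical cache export options; <account> and <TAG> are placeholders
--export-cache type=registry,ref=<account>.dkr.ecr.eu-west-2.amazonaws.com/wave/build/cache:<TAG>,image-manifest=true,oci-mediatypes=true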
Now everything is working, but the caching depends upon the tag, and when we use the same tag for multiple builds it overrides the previous one. So we need to figure out a tag which can identify the layers for a particular build, so that they can be imported back when the same kind of build request comes in.
@pditommaso caching in buildkit is about caching the specific image and reusing it when a future build of the same image comes in; it's not caching layers like kaniko, where the cache can be reused across multiple builds: https://www.reddit.com/r/docker/comments/13h567h/new_idea_maybe_tag_buildkit_cache_images_with/
I have added the containerId as the cache tag, so the correct image can be imported for any future build.
Not sure I understand how the tag is used here. Can you give an example?
In the case of kaniko, caching is configured with a single flag: --cache-repo <REPOSITORY>
But in the case of buildkit two flags are used (see the combined sketch below):
- export cache: --export-cache type=registry,ref=<REPOSITORY>:<TAG>
- import cache: --import-cache type=registry,ref=<REPOSITORY>:<TAG>
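Putting the two flags together, a sketch of what the full build invocation could look like (the repository names, tag and local context are placeholders, not the exact command wave generates):

# build, push the image, and export/import the layer cache keyed by containerId
buildctl build \
  --frontend dockerfile.v0 \
  --local context=. --local dockerfile=. \
  --output type=image,name=<BUILD_REPOSITORY>:<TAG>,push=true \
  --export-cache type=registry,ref=<CACHE_REPOSITORY>:<containerId> \
  --import-cache type=registry,ref=<CACHE_REPOSITORY>:<containerId>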
Here is an example: if I request multiqc, now I am using the containerId as the tag:
% wave --conda-package multiqc --wave-endpoint http://localhost:9090 --platform linux/arm64
db74a1b3f178.ngrok.app/wt/0ffd5c1e995a/wave/build/dev:multiqc--5651783df34afd2e
and the image will be pushed to the build repository and also to the cache, so if the same image is requested in the future, buildkit will check the cache for the tag mentioned in --import-cache and get the image.
build repository:
cache repository: