container-images
Draft: multi-arch build for Debezium (DBZ-5213)
This is my draft of how to build multi-arch (amd64 and arm64) images for Debezium (https://issues.redhat.com/browse/DBZ-5213).
Note that I do not have much experience with Docker builds (especially not with multi-arch builds), so there are probably better or other ways to implement the build. But maybe my changes can be a starting point.
Changes on this branch:
- all images are built for amd64 and arm64
- all images are hopefully tagged with the same semantics as before 🙏
- all images are pushed to Docker Hub and quay.io
  - that behaviour differs from the current version: I think with buildx you HAVE to push the image somewhere as long as you build for more than one architecture (a minimal buildx invocation is sketched below this list)
  - the script could be changed so that the image is not pushed if only one architecture is specified (for local tests, for example)
- I made the name of the "debezium" Docker Hub user configurable
  - it defaults to `debezium`
  - I don't know if that is really necessary, but I didn't find another way to actually test the push and pull of the images. So changing the name is for testing only. If there is a better way of testing, please let me know.
- I made the name of the "second" registry (quay.io) configurable
  - it defaults to `quay.io`
  - this change is also related to testing: while testing you can specify a different second registry (for example a local Docker registry)
- the GH workflow actions install buildx
  - for some reason the build on GitHub still fails; maybe someone with buildx and/or GH Actions know-how can help
- all changes work on my local machine 🙄
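For reference, this is the shape of the buildx invocation behind all of this (image name and tag are illustrative, not the actual script contents):

```bash
# A multi-platform build cannot be loaded into the local Docker daemon,
# so buildx needs an explicit --push (or another exporter) here.
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag docker.io/debezium/example:test \
  --push \
  .
```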
@nilshartmann I will ask for a bit of patience here, I will review this within the next week or so (I'm currently on PTO).
@jpechane does it make sense to retrospectively change the Dockerfiles for previous releases? Although this is just parametrisation, I wouldn't do that.
> does it make sense to retrospectively change the Dockerfiles for previous releases?
Doing those changes for 1.9 and 2.0 definitely is enough. We don't "support" older versions anymore at this point.
> Doing those changes for 1.9 and 2.0 definitely is enough. We don't "support" older versions anymore at this point.
What about duplicating the current scripts (`build-multi-arch-debezium.sh` for example) and letting them run with parameterized Dockerfiles only for 1.9 and upcoming releases? The current scripts could then be limited to pre-1.9 releases with the current, non-parameterized Dockerfiles. So the changes would affect only Dockerfiles 1.9+ (and the new build scripts).
@nilshartmann The build-all script should likely be extended to allow specifying which components to build. Ideally it should be able to determine that on its own and skip those where an ARM build is not possible (however, this might not be feasible).
Currently I can see 2 reasons why the build in GH is failing:
1. The GH action may not set up a buildx builder (not sure about this one)
2. It will for sure fail with example-mongo, as `mongo:3.2` doesn't have an ARM manifest. The build ends with:
```
 => ERROR [linux/arm64 internal] load metadata for docker.io/library/mongo:3.2
...
ERROR: failed to solve: mongo:3.2: no match for platform in manifest sha256:0463a91d8eff189747348c154507afc7aba045baa40e8d58d8a4c798e71001f3: not found
```
A further attempt to explicitly pull mongo on ARM leads to a cleaner error message:
```
$ docker pull mongo:3.2
3.2: Pulling from library/mongo
no matching manifest for linux/arm64/v8 in the manifest list entries
```
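A quick way to check up front whether a base image provides an arm64 variant:

```bash
# Prints the manifest (list) with the available platforms; mongo:3.2 has no
# linux/arm64 entry, which is why the buildx build for that platform fails.
docker buildx imagetools inspect mongo:3.2
```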
@nilshartmann I might be missing a few steps, so if you could provide steps on how to run the build (ideally with the use of a `registry:2` container as a registry), it would be greatly appreciated.
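For what it's worth, a throwaway `registry:2` container can be started like this (the port mapping is arbitrary; 5500 matches the local-registry setup referenced later in this thread):

```bash
# Disposable local registry; buildx can then push test images to localhost:5500/...
docker run -d --name local-registry -p 5500:5000 registry:2
```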
Hi @jcechace!
Thanks a lot for your feedback and comments!
> The build-all script should likely be extended to allow specifying which components to build. Ideally it should be able to determine that on its own and skip those where an ARM build is not possible (however, this might not be feasible).
I will change the script, probably first by manually setting a list of platforms for each image (or a list of images that should not be built for ARM), as that seems much easier to implement. If everything else works, we can try to automatically determine which images ARM should be skipped for, IMHO.
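A hypothetical sketch of that manual list (script structure and image names are illustrative, not the actual `build-all.sh`):

```bash
#!/bin/bash
# Build both platforms by default; pin images whose base image has no
# arm64 manifest (e.g. example images based on mongo:3.2) to amd64 only.
DEFAULT_PLATFORMS="linux/amd64,linux/arm64"
declare -A IMAGE_PLATFORMS=(
  ["examples/mongo"]="linux/amd64"
)

platforms_for() {
  echo "${IMAGE_PLATFORMS[$1]:-$DEFAULT_PLATFORMS}"
}

platforms_for "examples/mongo"   # -> linux/amd64
platforms_for "zookeeper"        # -> linux/amd64,linux/arm64
```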
> Currently I can see 2 reasons why the build in GH is failing
>
> 1. The GH action may not set up a buildx builder (not sure about this one)

I think the buildx builder is set up, as I tried with a very simple image and that worked. But I'll re-check.

> 2. It will for sure fail with example-mongo, as `mongo:3.2` doesn't have an ARM manifest. The build ends with:
Thanks! Good point!
Regarding your other questions: I introduced the env variables to be able to set different image names/registries during my local test builds:

- `DEBEZIUM_DOCKER_NAME`: I set this to my Docker Hub username to test building and deploying to Docker Hub
- `DEBEZIUM_QUAY_IO`: while testing I set this to a local Docker registry
If there is a better way to run, test and push the builds/images (locally), please let me know. I'm in no way a Docker expert 🥺
Hello @jcechace, long time no see! :)
I just stumbled upon this as I'm using the `debezium/connect-base` container image in my tests, and I realized it's too slow because it's running emulated on an M1 Mac. It would be so cool to have multi-arch images.
Many thanks to @nilshartmann for this work!
Hi @jcechace,
I pushed some new commits. Inside the repository there is a new `README-build.md` file that describes the steps for building images locally for testing.
The `build-all` script now runs:

- `build-postgres-multiplatform.sh` for each version, either with `linux/amd64` or with `linux/amd64,linux/arm64`. I tested all versions on all platforms. If the ARM build for a version failed, I decided to stay with `linux/amd64` only.
- Debezium builds in `build-all` run the "old" `build-debezium.sh` for all Debezium versions that are not explicitly listed in `build-all.sh` (see there). Newer versions are built using `build-debezium-multiplatform.sh` for `linux/amd64,linux/arm64`. (BTW, I had trouble building some of the older versions of Debezium, even though I didn't change the script. Might be a problem on Mac M1.)
You can use the env variables `DEBEZIUM_DOCKER_NAME` and `DEBEZIUM_QUAY_IO` to change the registries the images are pushed to (please see the `README-build.md` file for more information, including the setup of a local `registry:2`).

(Maybe `DEBEZIUM_DOCKER_NAME` and `DEBEZIUM_QUAY_IO` should be renamed to something like `DEBEZIUM_DOCKER_REGISTRY_1_NAME` and `DEBEZIUM_DOCKER_REGISTRY_2_NAME`.)
BTW, you can build the images and push them to a local registry inside a GitPod workspace if you like. Just open the branch in GitPod, then run in a terminal:

```bash
docker-compose -f local-registry/docker-compose.yml up -d
./setup-local-builder.sh
export DEBEZIUM_DOCKER_NAME=localhost:5500/debezium
export DEBEZIUM_QUAY_IO=localhost:5500/debeziumquay
./build-postgres-multiplatform.sh 14-alpine "linux/amd64,linux/arm64"
./build-debezium-multiplatform.sh 1.9
```
Probably there is still some work to do; please let me know if I can help.
Thanks, Nils
> Doing those changes for 1.9 and 2.0 definitely is enough. We don't "support" older versions anymore at this point.

> What about duplicating the current scripts (`build-multi-arch-debezium.sh` for example) and letting them run with parameterized Dockerfiles only for 1.9 and upcoming releases? The current scripts could then be limited to pre-1.9 releases with the current, non-parameterized Dockerfiles.

Sounds like a good solution to me -- we keep the ability to build the old (but unsupported) versions and also add the new options. It also seems to be in line with how we duplicate Dockerfiles for new versions of DBZ.
@nilshartmann Regarding the naming: something such as `DEBEZIUM_DOCKER_REGISTRY_1_NAME`... I think this would be better. Maybe even `DEBEZIUM_DOCKER_REGISTRY_PRIMARY_NAME`.
I hit an error:

```
ERROR: failed to solve: process "/dev/.buildkit_qemu_emulator /bin/sh -c echo \"$SHA256HASH /tmp/zookeeper.tar.gz\" | sha512sum -c - && tar -xzf /tmp/zookeeper.tar.gz -C $ZK_HOME --strip-components 1 && rm -f /tmp/zookeeper.tar.gz" did not complete successfully: exit code: 2
```
This seems to be related to a required QEMU interpreter.
Sources: https://www.linuxfixes.com/2021/12/solved-can-install-bash-in-multiarch.html, https://github.com/docker/buildx/issues/464
multiarch/qemu-user-static didn't seem to work for me (Ubuntu arm64 instance in EC2), while the tonistiigi/binfmt one worked. At the moment this is all black magic to me, so I will poke around a bit before merging. So far the build is running (although it looks like it is quite slow -- however, the main bottleneck seems to be downloading Maven dependencies, so probably unrelated to this PR).
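For reference, the tonistiigi/binfmt setup that worked here is a one-liner (this is the image's documented usage):

```bash
# Registers QEMU emulators for foreign architectures via binfmt_misc,
# which is what buildx needs to run arm64 build steps on an amd64 host.
docker run --privileged --rm tonistiigi/binfmt --install all
```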
Hi @jcechace,
> Regarding the naming: something such as `DEBEZIUM_DOCKER_REGISTRY_1_NAME`... I think this would be better. Maybe even `DEBEZIUM_DOCKER_REGISTRY_PRIMARY_NAME`.

I changed the names to `DEBEZIUM_DOCKER_REGISTRY_PRIMARY_NAME` and `DEBEZIUM_DOCKER_REGISTRY_SECONDARY_NAME`.

> although it looks like it is quite slow

I noticed that too; not sure if this is somehow related to the emulator. I wonder if it really makes sense to rebuild all images every time (`build-all.sh`).

> At the moment this is all black magic to me, so I will poke around a bit before merging.

Please let me know if I can help.
@jerrinot

> although it looks like it is quite slow
> I noticed that too; not sure if this is somehow related to the emulator.
> I wonder if it really makes sense to rebuild all images every time (`build-all.sh`).

Technically yes, since there is no need to run the script unless there were some changes. The majority of the slowness (apart from the UI) is the PG builds.

> At the moment this is all black magic to me, so I will poke around a bit before merging.
> Please let me know if I can help.

I think I got a better grasp now. It seems an alternative would be a cross build with arch-specific builder stages; however, that would require a modification of all Dockerfiles, so our current approach seems more feasible for now.
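For the record, the cross-build pattern mentioned here looks roughly like this. An illustrative sketch, not taken from the Debezium Dockerfiles; `build.sh` is a hypothetical stand-in for an arch-aware build step:

```dockerfile
# The build stage always runs on the host's native platform (no emulation)...
FROM --platform=$BUILDPLATFORM debian:bullseye AS build
ARG TARGETARCH
WORKDIR /src
COPY . .
# ...and cross-compiles for the platform currently being built.
RUN ./build.sh --arch "$TARGETARCH"

# The final stage is assembled per target platform and only copies artifacts in.
FROM debian:bullseye
COPY --from=build /src/out/app /usr/local/bin/app
```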
@nilshartmann It looks like PG 14 does not build via buildx on amd64 (it builds fine using a standard docker build). Another thing which seems to be failing consistently is the UI -- however, I think that might be a known issue (the log output for buildx is not as informative as a regular build, so I will look more into this).
@jcechace I tried building PG 14 both on GitPod and Mac M1 using `./build-postgres-multiplatform.sh 14 linux/amd64` and it worked successfully. Only building for `linux/arm64` failed.
@nilshartmann Can you make sure that the buildx cache is pruned? Strangely enough, when I built PG 14 using the old `build-postgres.sh` script with a regular docker build, the consequent run of the multiplatform script passed. Otherwise I'm getting:

```
#0 49.03 make: *** [/usr/lib/postgresql/14/lib/pgxs/src/makefiles/../../src/Makefile.shlib:293: decoderbufs.so] Segmentation fault
```

This happens on an EC2 arm64 VM running Ubuntu.
@jcechace I did `docker buildx prune --all` (Mac M1) and the build still works. I also set up a test build on GitHub (Ubuntu) and it works: https://github.com/nilshartmann/pg-buildx/actions/runs/3135577830/jobs/5091422539
One remark: can we have the x86 build only by default, and the multi-arch build executed only upon an option/env var setting?
@jpechane So when running `build-all.sh`, should the "old" `build-debezium`, `build-postgres` and `build-mongo` scripts be run? Or would it be sufficient to run the new scripts by default but set the platform to `linux/amd64` only? In any case, I think it would be easier to provide two scripts, `build-all.sh` and `build-all-multiplatform.sh`.
@nilshartmann That's an implementation detail :-). Yes, setting the platform to `linux/amd64` is enough. And I agree with having two scripts: `build-all.sh` would produce the same output as now and `build-all-multiplatform.sh` will do everything.
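A sketch of that split (illustrative only; the real scripts build many more images):

```bash
#!/bin/bash
# build-all-multiplatform.sh (sketch): builds everything for the given platforms.
PLATFORMS="${1:-linux/amd64,linux/arm64}"
./build-postgres-multiplatform.sh 14-alpine "$PLATFORMS"
./build-debezium-multiplatform.sh 1.9 "$PLATFORMS"

# build-all.sh (sketch) then reduces to a one-liner that keeps today's
# amd64-only output:
#   ./build-all-multiplatform.sh "linux/amd64"
```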
Great idea, I think this will provide the largest amount of backward compatibility.
Regarding the build of PG 14... it is strange; at this point I have about a 60% success rate building that particular image with buildx on an arm64 EC2 machine. I wonder if something is different on Apple silicon. To make things weirder, I can see two errors consistently: the one mentioned above and
```
#0 267.1 Setting up software-properties-common (0.96.20.2-2.1) ...
#0 270.4 Traceback (most recent call last):
#0 270.4   File "/usr/bin/py3compile", line 319, in <module>
#0 270.4     main()
#0 270.4   File "/usr/bin/py3compile", line 298, in main
#0 270.4     compile(files, versions,
#0 270.4   File "/usr/bin/py3compile", line 185, in compile
#0 270.4     cfn = interpreter.cache_file(fn, version)
#0 270.4   File "/usr/share/python3/debpython/interpreter.py", line 212, in cache_file
#0 270.4     (fname[:-3], self.magic_tag(version), last_char))
#0 270.4   File "/usr/share/python3/debpython/interpreter.py", line 246, in magic_tag
#0 270.5     return self._execute('import imp; print(imp.get_tag())', version)
#0 270.5   File "/usr/share/python3/debpython/interpreter.py", line 359, in _execute
#0 270.5     raise Exception('{} failed with status code {}'.format(command, output['returncode']))
#0 270.5 Exception: python3.9 -c 'import imp; print(imp.get_tag())' failed with status code 139
#0 270.6 dpkg: error processing package software-properties-common (--configure):
#0 270.6  installed software-properties-common package post-installation script subprocess returned error exit status 1
```
Nevertheless, I am also going to try it the other way around and build the packages on amd64. The issue might be with the QEMU interpreters on arm64 Ubuntu.
@jpechane I will try to implement this later today.
PG 14 (non-alpine) builds just fine for both amd64 and arm64 on an amd64 Ubuntu machine in EC2. I will do one final pass of build-all for both platforms on an amd64 host, and once @nilshartmann implements the changes @jpechane suggested we should be good to merge this. Another observation: arm64 emulation on x86 is orders of magnitude faster than the other way around (at least in my environment; however, considering the instruction complexity of each architecture, it makes sense).
@nilshartmann please also correct the failing commit message.
With the latest (force) push, I changed the scripts as discussed:
- `build-*.sh` are the "old" scripts, mostly unchanged (see below)
- `build-*-multiplatform.sh` contain the new scripts
There is one exception: `build-debezium.sh` also sets the `DEBEZIUM_DOCKER_REGISTRY_PRIMARY_NAME` and `DEBEZIUM_DOCKER_REGISTRY_SECONDARY_NAME` env variables, because those are referenced in the 1.9 and 2.0 Dockerfiles (for kafka, connect etc.).

If you run `build-debezium.sh` for a version older than 1.9, nothing should change compared to the current scripts. Running `build-debezium.sh` for 1.9+ should not change anything either, but the (base) image names in the Dockerfiles are taken from environment variables that are either set on the command line or set to default values inside `build-debezium.sh`.
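That defaulting amounts to something like this (a sketch; the exact default values are an assumption based on the registries discussed earlier in the thread):

```bash
# Respect values set on the command line / environment, otherwise fall back
# to defaults so the 1.9/2.0 Dockerfiles resolve the same images as before.
export DEBEZIUM_DOCKER_REGISTRY_PRIMARY_NAME="${DEBEZIUM_DOCKER_REGISTRY_PRIMARY_NAME:-debezium}"
export DEBEZIUM_DOCKER_REGISTRY_SECONDARY_NAME="${DEBEZIUM_DOCKER_REGISTRY_SECONDARY_NAME:-quay.io/debezium}"
```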
I also changed `build-debezium-multiplatform.sh` to align it with the other `build-*-multiplatform` scripts. It now takes the platforms to build as the second argument, so you could limit the build to, for example, only `linux/amd64`.
Then there is `build-tools.sh` and `build-tools-multiplatform.sh`. I'm not sure if multiplatform support is needed here; if not, we could remove `build-tools-multiplatform.sh`, otherwise we should add it to the GH actions.
@jcechace

> `ERROR: failed to solve: process "/dev/.buildkit_qemu_emulator /bin/sh -c echo \"$SHA256HASH /tmp/zookeeper.tar.gz\" | sha512sum -c - && tar -xzf /tmp/zookeeper.tar.gz -C $ZK_HOME --strip-components 1 && rm -f /tmp/zookeeper.tar.gz" did not complete successfully: exit code: 2`

> multiarch/qemu-user-static didn't seem to work for me (Ubuntu arm64 instance in EC2), while the tonistiigi/binfmt one worked. At the moment this is all black magic to me, so I will poke around a bit before merging.
There is another GH action (`docker/setup-qemu-action@v2`) that sets up QEMU. I added that to my test build, and it seems to work. See an example here (zookeeper only, but I'm optimistic that kafka also builds): https://github.com/nilshartmann/docker-arch-build-test/actions/runs/3138713626/jobs/5098350092
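The corresponding workflow steps are just the stock setup actions (shown together with the usual buildx setup action):

```yaml
- name: Set up QEMU
  uses: docker/setup-qemu-action@v2
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v2
```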
Another option would be to replace `tar -xzf` with `gunzip tarfile.tar.gz && tar -xf tarfile.tar`. I tried that for zookeeper as well, and it works (https://github.com/nilshartmann/docker-arch-build-test/actions/runs/3138581786/jobs/5098069252). We would have to change the Dockerfiles for apache and zookeeper (1.9+) and connect (snapshot).

The advantage of changing the Dockerfiles over QEMU is, IMHO, that anyone can simply build the images (locally) without having to install QEMU.
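Sketched against the zookeeper extraction step quoted in the error above, the change would look like this:

```dockerfile
# Two-step extraction avoids the `tar -xzf` code path that failed under the
# QEMU emulator; after gunzip the archive is a plain .tar file.
RUN gunzip /tmp/zookeeper.tar.gz \
    && tar -xf /tmp/zookeeper.tar -C $ZK_HOME --strip-components 1 \
    && rm -f /tmp/zookeeper.tar
```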
Update:

Without QEMU, `docker-maven-download` in `connect-base` does not build (https://github.com/nilshartmann/docker-arch-build-test/actions/runs/3138809173/jobs/5098552113#step:7:1587), but with QEMU it does (https://github.com/nilshartmann/docker-arch-build-test/actions/runs/3138917357/jobs/5098778184#step:6:1379). So I added `docker/setup-qemu-action@v2` to the GH action files.
Finally, I was able to build all 1.9 and 2.0 images (with the exception of debezium-ui 2.0) 😊

Downside: it took more than four hours 😱 https://github.com/nilshartmann/docker-arch-build-test/actions/runs/3138917357/jobs/5098778184

I haven't had a deeper look into it yet, but building the `tools` images took more than three hours. If anyone has an idea how to speed that up, please let me know. Otherwise I would suggest that we stay with the old build here and do not provide multiplatform images.
@nilshartmann Nice work, everything looks in order -- I was able to successfully build all the images as well (doing so on x86 is actually much faster, about 3 times, as emulating ARM on x86 is faster). I will do some testing with the produced images and merge the PR afterwards.
I fixed the shellcheck error (https://github.com/debezium/container-images/actions/runs/3141452521/jobs/5128785789#step:3:7) with my latest push.
@jpechane LGTM. One thing is that the GH Action for the anonymous build will likely take quite a lot of time. Do we perhaps want to make it optional or something?

Otherwise, once the action is finished, this can be merged.