ARM64 builds missing

Open nmunro opened this issue 10 months ago • 19 comments

In the last day or so, the arm64 builds we use no longer seem to be available. We have had to adjust our docker-compose files to use the linux/amd64 platform as a temporary workaround.

   postgres:
     image: informaticsmatters/rdkit-cartridge-debian:Release_2024_09_2
+    platform: linux/amd64
     environment:
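
Before pinning the platform, it may be worth confirming whether a tag still publishes an arm64 variant. The sketch below is only an illustration: `has_arch` is a hypothetical helper, and it assumes the pretty-printed JSON that `docker manifest inspect` writes, where each platform entry carries an `"architecture"` field.

```shell
# Hypothetical helper: read manifest JSON on stdin and succeed if it
# mentions the given architecture (for example arm64 or amd64).
has_arch() {
  grep -Eq "\"architecture\": *\"$1\""
}

# Example usage (needs Docker and registry access):
#   docker manifest inspect informaticsmatters/rdkit-cartridge-debian:Release_2024_09_2 \
#     | has_arch arm64 && echo "arm64 available" || echo "arm64 missing"
```

A plain text match like this is crude, but it avoids depending on a JSON parser being installed on the build host.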

Attempting to build the docker-rdkit image locally revealed the following:

> [build  2/14] RUN apt-get update &&  apt-get install -y --no-install-recommends   build-essential  python3-dev  python3-numpy  python3-pip  cmake  sqlite3  libsqlite3-dev  libboost1.83  libboost1.83-dev  libboost-system1.83  libboost-thread1.83  libboost-serialization1.83  libboost-python1.83  libboost-regex1.83  libboost-iostreams1.83  zlib1g-dev  swig  libeigen3-dev  git  wget  openjdk-17-jdk  postgresql-17  postgresql-server-dev-17  postgresql-plpython3-17  zip  unzip  libfreetype6-dev &&  apt-get clean -y:                                                                                                                                                      
0.215 Get:1 http://deb.debian.org/debian trixie InRelease [178 kB]                                                                                                     
0.245 Get:2 http://deb.debian.org/debian trixie-updates InRelease [49.6 kB]                                                                                            
0.295 Get:3 http://deb.debian.org/debian-security trixie-security InRelease [48.0 kB]                                                                                  
0.311 Get:4 http://deb.debian.org/debian trixie/main arm64 Packages [9614 kB]                                                                                          
2.067 Fetched 9890 kB in 2s (5206 kB/s)                                                                                                                                
2.067 Reading package lists...                                                                                                                                         
2.546 Reading package lists...                                                                                                                                         
2.999 Building dependency tree...
3.098 Reading state information...
3.371 E: Unable to locate package openjdk-17-jdk

Since the Dockerfile's Debian trixie base doesn't appear to have changed, we took a quick look at the OpenJDK versions available for trixie (https://packages.debian.org/trixie/openjdk-21-jdk), which shows that version 21 is now the earliest version in trixie. The JAVA_VER defined in Dockerfile-debian still appears to point to version 17: https://github.com/InformaticsMatters/docker-rdkit/blob/c3f19d7ca440dc544b84c9fb718c685247a7c470/Dockerfile-debian#L21
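
One way to check which JDK packages a Debian release currently offers is to list them from inside the base image. This is a rough sketch rather than anything from the repository: `latest_jdk` is a hypothetical helper that reads `apt-cache search` style lines and prints the highest-numbered `openjdk-N-jdk` name.

```shell
# Hypothetical helper: read "name - description" lines (as printed by
# `apt-cache search`) on stdin and print the highest-numbered openjdk-N-jdk.
latest_jdk() {
  grep -oE '^openjdk-[0-9]+-jdk' | sort -t- -k2,2n | tail -n 1
}

# Example usage (needs Docker and network access):
#   docker run --rm debian:trixie sh -c \
#     "apt-get update -qq && apt-cache search --names-only '^openjdk-[0-9]+-jdk'" \
#     | latest_jdk
```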

We have tested updating this version locally and it appears to work, but wanted to check if it was the "right" solution.

nmunro avatar May 22 '25 15:05 nmunro

Yes, Java 17 disappeared from the Debian repo about 2 days ago. I’m in the process of flushing through the changes to update to Java 21, but they are not all done yet, so go ahead and run locally with that change. I’m also seeing other problems that I’m trying to sort out, in particular with the arm64 builds.

tdudgeon avatar May 22 '25 16:05 tdudgeon

The update to Java 21 has been done, but I'm unable to build arm64 images as the RDKit compilation crashes. @nmunro does this happen for you too?

tdudgeon avatar May 27 '25 10:05 tdudgeon

I just cloned the repo, ran ./build-debian.sh and got this output. Is this the crash you are experiencing?

1653.0 [ 23%] Linking CXX static library libCatch2.a
1660.2 [ 23%] Built target Catch2
1660.6 make: *** [Makefile:166: all] Error 2
------

 3 warnings found (use docker --debug to expand):
 - FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 9)
 - UndefinedVar: Usage of undefined variable '$LD_LIBRARY_PATH' (line 59)
 - UndefinedVar: Usage of undefined variable '$PYTHONPATH' (line 60)
Dockerfile-debian:82
--------------------
  81 |
  82 | >>> RUN nproc=$(getconf _NPROCESSORS_ONLN)\
  83 | >>>   && make -j $(( nproc > 2 ? nproc - 2 : 1 ))
  84 |     RUN make install
--------------------
ERROR: failed to solve: ResourceExhausted: process "/bin/sh -c nproc=$(getconf _NPROCESSORS_ONLN)  && make -j $(( nproc > 2 ? nproc - 2 : 1 ))" did not complete successfully: cannot allocate memory

nmunro avatar May 27 '25 14:05 nmunro

No, I'm seeing a different error, but it's not obvious what the problem is. I just see this reported:

c++: internal compiler error: Segmentation fault signal terminated program cc1plus

And the amd64 builds are fine.

Full log is here: rdkit-arm64-log.txt

tdudgeon avatar May 27 '25 15:05 tdudgeon

So, I've had to reduce the number of parallel jobs during compiling, but it seems to be going ok so far... might have to check back tomorrow to confirm when it finishes, but I'll get back to you once I have some more information.
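
For anyone else hitting the "cannot allocate memory" failure, a rough way to pick a job count is to cap it by memory as well as by cores. This is only a sketch: `pick_jobs` is a hypothetical helper, and the 2048 MB per job default is an assumed figure, not something measured for the RDKit build.

```shell
# Hypothetical helper: choose a `make -j` value from the core count and the
# memory available to the build. The per-job memory figure is an assumption.
pick_jobs() {
  cores=$1                    # e.g. $(getconf _NPROCESSORS_ONLN)
  mem_mb=$2                   # memory available to the build, in MB
  mem_per_job_mb=${3:-2048}   # assumed memory needed by one compile job

  by_cpu=$(( cores > 2 ? cores - 2 : 1 ))   # same rule as Dockerfile-debian
  by_mem=$(( mem_mb / mem_per_job_mb ))
  jobs=$(( by_cpu < by_mem ? by_cpu : by_mem ))
  echo $(( jobs > 1 ? jobs : 1 ))           # never go below one job
}

# Example usage:
#   make -j "$(pick_jobs "$(getconf _NPROCESSORS_ONLN)" 8192)"
```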

nmunro avatar May 27 '25 15:05 nmunro

It largely seems to have worked. I made a tiny change to get a non-parallel build:

index 1fad5f8..5cbc995 100644
--- a/Dockerfile-debian
+++ b/Dockerfile-debian
@@ -6,7 +6,7 @@
 # This image contains the artifacts used by the later stages.
 # The image is NOT pushed, just the cached layers used.
 
-FROM debian:trixie as build
+FROM debian:trixie AS build
 LABEL maintainer="Tim Dudgeon<[email protected]>"
 
 ARG GIT_REPO=https://github.com/rdkit/rdkit.git
@@ -79,8 +79,9 @@ RUN cmake -Wno-dev\
   -DCPACK_PACKAGE_RELOCATABLE=OFF\
   ..
 
-RUN nproc=$(getconf _NPROCESSORS_ONLN)\
-  && make -j $(( nproc > 2 ? nproc - 2 : 1 ))
+# RUN nproc=$(getconf _NPROCESSORS_ONLN)\
+#   && make -j $(( nproc > 2 ? nproc - 2 : 1 ))
+RUN make -j1
 RUN make install
 RUN sh Code/PgSQL/rdkit/pgsql_install.sh
 RUN cpack -G DEB

This gave me the following output:

 => [linux/amd64 build 11/14] RUN sh Code/PgSQL/rdkit/pgsql_install.sh                                                                                                                                                                                                   0.3s
 => [linux/amd64 build 12/14] RUN cpack -G DEB                                                                                                                                                                                                                         123.8s
 => [linux/amd64 build 13/14] RUN cd /rdkit/Code/JavaWrappers/gmwrapper && tar cvfz javadoc.tgz doc                                                                                                                                                                      0.5s
 => [linux/amd64 build 14/14] WORKDIR /rdkit                                                                                                                                                                                                                             0.0s
 => [linux/amd64 python 3/6] COPY --from=build /rdkit/build/RDKit-*Linux-Runtime.deb /rdkit/build/RDKit-*Linux-Python.deb /tmp/                                                                                                                                          0.7s
 => [linux/amd64 python 4/6] RUN dpkg -i /tmp/RDKit-*.deb && rm -f /tmp/*.deb                                                                                                                                                                                            1.2s
 => [linux/amd64 python 5/6] RUN useradd -u 1000 -g 0 -m rdkit                                                                                                                                                                                                           0.1s
 => ERROR exporting to image                                                                                                                                                                                                                                            36.7s
 => => exporting layers                                                                                                                                                                                                                                                 35.3s
 => => exporting manifest sha256:13d054970fe5b428988cab9717bda15ee878de9520f1158a54e9b253b9bc49af                                                                                                                                                                        0.0s
 => => exporting config sha256:19124374218861fe0a1710e368c44be8b7107e54e1487f6a9cab6e78e88a86f8                                                                                                                                                                          0.0s
 => => exporting attestation manifest sha256:e65c4625189bc6a8efa2b8e32c59004f02b911206d02b1fbe8736f2bb2f2afe9                                                                                                                                                            0.0s
 => => exporting manifest sha256:15f9dbee749758fd56dd86da380ce29bd3872595a540612167b3918d038effdb                                                                                                                                                                        0.0s
 => => exporting config sha256:91d3ca195df74a67359852d05215eb2b7349bc9f4f084f124e092715cac27165                                                                                                                                                                          0.0s
 => => exporting attestation manifest sha256:69cc959e024ad890083b8d36e67f13ded79f1088d2c6a86400403ecb653aa49d                                                                                                                                                            0.0s
 => => exporting manifest list sha256:7f180c42b0d43ab57f721301afe84989513b627aaab62afa12981fc000314c5c                                                                                                                                                                   0.0s
 => => pushing layers                                                                                                                                                                                                                                                    1.3s
------
 > exporting to image:
------

 2 warnings found (use docker --debug to expand):
 - UndefinedVar: Usage of undefined variable '$PYTHONPATH' (line 60)
 - UndefinedVar: Usage of undefined variable '$LD_LIBRARY_PATH' (line 59)
ERROR: failed to solve: failed to push informaticsmatters/rdkit-python3-debian:latest: push access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed

View build details: docker-desktop://dashboard/build/itrax-builder/itrax-builder0/xpuslqvar1hzfzeqkafusc7dd
➜  docker-rdkit git:(master) ✗

nmunro avatar May 27 '25 19:05 nmunro

So the arm64 builds are still problematic for me. A day or so ago the apt-get upgrade step was failing, but that has fixed itself now. So now I'm back to the compilation of the RDKit code segfaulting (but not for the amd64 platform). @nmunro this isn't happening for you?

@nmunro your change to use only one core is not viable as the build becomes even slower. What we could do is allow the number of cores to be overridden using an environment variable, e.g. you could run export DOCKER_CORES=1 before building. Would that work for you? BTW I typically build on a machine with 24 cores and 40GB RAM, so I'm not resource limited.

tdudgeon avatar May 29 '25 15:05 tdudgeon

Yeah, that could work. I certainly can't build it as is, since I don't have as much memory as you do, but a configurable setting would be a good compromise. If you're also having issues building it, I can certainly build it (albeit more slowly) and push it somewhere, if that helps us both?

nmunro avatar May 29 '25 18:05 nmunro

Would you like me to implement the configuration?

nmunro avatar Jun 02 '25 10:06 nmunro

I can do it. Do you need it just on master branch?

tdudgeon avatar Jun 02 '25 10:06 tdudgeon

Yeah, I think that's fine. It will allow slower but complete builds if resources are tight, but I suppose it still doesn't resolve your issue of being unable to build the arm images. Unless that's solved now?

nmunro avatar Jun 02 '25 10:06 nmunro

It's done on master: https://github.com/InformaticsMatters/docker-rdkit/commit/8869049228f3e5818700cf2da684707d8764c059

The logic is now in the build-debian.sh script and the number of cores is specified as a build arg. Change the number of cores by defining the DOCKER_N_CORES environment variable.
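
Roughly, the idea looks like the sketch below. This is an illustration, not the actual contents of build-debian.sh: `resolve_cores` is a hypothetical helper, and both the build arg name `N_CORES` and the fallback rule are assumptions.

```shell
# Hypothetical sketch, not the repository's actual build-debian.sh.
# DOCKER_N_CORES is the environment variable described above; the build arg
# name N_CORES and the fallback rule below are assumptions.
resolve_cores() {
  if [ -n "${DOCKER_N_CORES:-}" ]; then
    echo "$DOCKER_N_CORES"
  else
    nproc=$(getconf _NPROCESSORS_ONLN)
    echo $(( nproc > 2 ? nproc - 2 : 1 ))   # leave two cores free by default
  fi
}

# Example usage (assumed build arg name):
#   export DOCKER_N_CORES=1
#   docker buildx build --build-arg N_CORES="$(resolve_cores)" -f Dockerfile-debian .
```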

I'm still unable to build the arm64 images, but a colleague running on arm is able to build both architectures, which confirms what you are finding.

tdudgeon avatar Jun 02 '25 11:06 tdudgeon

Just to check, has your colleague pushed an arm64 image we can pull? Or is there still a blocker on building?

nmunro avatar Jun 09 '25 10:06 nmunro

We're not currently building ARM64 images. Is there a particular version you are wanting?

tdudgeon avatar Jun 09 '25 11:06 tdudgeon

Well, Release_2024_09_2 was working and then just stopped, so we certainly need that, but we're also investigating upgrading to the latest. We'd need images for at least two versions, but ideally we'd like to be able to use any image.

nmunro avatar Jun 09 '25 13:06 nmunro

While we continue to search for a better long-term solution, I have just built and pushed ARM64 container images with a custom arm64 tag for the Release_2024_09_2 release for the following:

  • informaticsmatters/rdkit-cartridge-debian:Release_2024_09_2-arm64
  • informaticsmatters/rdkit-tomcat-debian:Release_2024_09_2-arm64
  • informaticsmatters/rdkit-java-debian:Release_2024_09_2-arm64
  • informaticsmatters/rdkit-python3-debian:Release_2024_09_2-arm64
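
If you need to pick between the regular tag and the custom one in a script, something like the following might help. It is only a sketch: `tag_for_arch` is a hypothetical helper, and it assumes the `-arm64` suffix used by the tags listed above.

```shell
# Hypothetical helper: map a machine architecture (as printed by `uname -m`)
# to the tag to pull. The "-arm64" suffix is the custom tag listed above.
tag_for_arch() {
  release=$1
  arch=$2
  case "$arch" in
    arm64|aarch64) echo "${release}-arm64" ;;
    *)             echo "$release" ;;
  esac
}

# Example usage:
#   docker pull "informaticsmatters/rdkit-cartridge-debian:$(tag_for_arch Release_2024_09_2 "$(uname -m)")"
```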

alanbchristie avatar Jun 10 '25 09:06 alanbchristie

@nmunro I've switched to building on AWS (using arm64), so I should hopefully be able to get the images built again. But it's a slow process. I hope to have the Release_2024_09_2 images available again sometime tomorrow if all goes to plan.

tdudgeon avatar Jun 10 '25 16:06 tdudgeon

That's great, really appreciate this, thanks! If there's any further way in which I can help to get this working on arm64, please do let me know. I know it's probably something of an edge case, but I'm happy to help.

nmunro avatar Jun 11 '25 10:06 nmunro

@nmunro the Release_2024_09_2 images have been rebuilt and should be present for both architectures. I'll rebuild a few more but I'm not going to want to build everything due to the AWS costs.

tdudgeon avatar Jun 11 '25 15:06 tdudgeon