
`arm64`

Open · mathieu-benoit opened this issue

arm64

mathieu-benoit, Jun 15 '24

Hi @mathieu-benoit! What do you think is the effort needed to make this work?

From comments here it looks like there's still a ton of want for this feature: https://github.com/GoogleCloudPlatform/microservices-demo/issues/622

bourgeoisor, Sep 10 '24

> Hi @mathieu-benoit! What do you think is the effort needed to make this work?
>
> From comments here it looks like there's still a ton of want for this feature: #622

@bourgeoisor and @NimJay, please could you approve the CI tests on this PR? I'd like to see what my latest changes will produce. Thanks!

mathieu-benoit, Dec 30 '24

@mathieu-benoit, tests are passing. Thank you for your patience; I'm just getting back from the holidays! Feel free to DM me if I don't respond within a few days.

bourgeoisor, Jan 14 '25

@bourgeoisor, thanks!

I think this is now ready for your review.

Again, this PR doesn't publish the arm64 images, but it should at least let the containers run locally on an arm64 platform. If someone could test this locally on macOS, that would be very helpful.

mathieu-benoit, Jan 14 '25

@mathieu-benoit, I've tested the branch on an Apple M1. It seems to fix the build issues: skaffold run succeeds, and the demo app appears to be running on OrbStack's Kubernetes:

[Screenshot: microservices-demo frontend page]

The resulting images seem to be ARM64 as well.

vlsi, Jan 15 '25

Thanks, @vlsi, for testing and following up. That's good news!

mathieu-benoit, Jan 15 '25

Hi @mathieu-benoit! I just tested this with skaffold dev from my ARM MacBook, deploying to C4A (ARM-based) nodes on GKE, and the two Python services seem to fail with a "wrong architecture"-like error. Any idea what could have gone wrong? The Google Cloud project had no prior Online Boutique images.

~/w/microservices-demo-mathieu (arm64✔=) k get pods
NAME                                     READY   STATUS             RESTARTS        AGE
adservice-55d4f88848-z622l               1/1     Running            0               14m
cartservice-54d4fb9d68-hmdw5             1/1     Running            0               14m
checkoutservice-76bd6c7dd8-z76n2         1/1     Running            0               14m
currencyservice-5dffc78857-n8thw         1/1     Running            0               14m
emailservice-7d4576ccd5-jx2xf            0/1     CrashLoopBackOff   7 (3m21s ago)   14m
frontend-5b7fdc5b7d-4g5jh                1/1     Running            0               14m
paymentservice-cb89cfbbc-tdtw2           1/1     Running            0               14m
productcatalogservice-74899b6887-p4ptk   1/1     Running            0               14m
recommendationservice-58d97df5b8-fr54s   0/1     CrashLoopBackOff   7 (3m22s ago)   14m
redis-cart-7756c55f85-8242m              1/1     Running            0               14m
shippingservice-7dd5b87767-9dmhz         1/1     Running            0               14m

~/w/microservices-demo-mathieu (arm64✔=) k logs emailservice-7d4576ccd5-jx2xf
exec /usr/local/bin/python: exec format error

~/w/microservices-demo-mathieu (arm64✔=) k logs recommendationservice-58d97df5b8-fr54s
exec /usr/local/bin/python: exec format error

bourgeoisor, Jan 15 '25

@vlsi, I'm guessing you're seeing the same thing, with recommendationservice and emailservice stuck in the same CrashLoopBackOff?

@bourgeoisor, could you please deploy the loadgenerator too, to see whether it hits the same error? It's another Python app, but its Dockerfile doesn't install any apk packages, so this would help narrow down whether the apk step is the cause.

mathieu-benoit, Jan 15 '25

Looks like the services are fine for me:

NAME                                     READY   STATUS    RESTARTS        AGE
adservice-5b779c9565-jrd7f               1/1     Running   0               8h
cartservice-6cb7856b7f-6b2hv             1/1     Running   0               8h
checkoutservice-55bfc5bcfb-6nqb5         1/1     Running   0               8h
currencyservice-5d5b997c67-g25jd         1/1     Running   2 (52m ago)     8h
emailservice-7b94467544-kj5n8            1/1     Running   0               8h
frontend-ffb8f574-dsps8                  1/1     Running   0               8h
loadgenerator-6c95876f4d-jwjk6           1/1     Running   0               8h
paymentservice-5cb7f77ff7-fkl88          1/1     Running   1 (4h22m ago)   8h
productcatalogservice-74cdb9fb45-cvz8n   1/1     Running   0               8h
recommendationservice-6d9874d78-sdvsf    1/1     Running   0               8h
redis-cart-59cd576876-n7hqk              1/1     Running   0               8h
shippingservice-74f86dd47f-xrs8g         1/1     Running   0               8h

Here's the output from the loadgenerator pod:

Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s
--------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------
GET      /                                                                               3161     6(0.19%) |     29       0    2410     25 |    0.00        0.00
GET      /cart                                                                           9221     9(0.10%) |     30       1   11247     23 |    0.20        0.00
POST     /cart                                                                           9099     3(0.03%) |     18       2   11826     13 |    0.30        0.00
POST     /cart/checkout                                                                  2992    16(0.53%) |     17       3    2504     13 |    0.20        0.00
GET      /product/0PUK6V6EV0                                                             4358     4(0.09%) |     27       4    7486     22 |    0.00        0.00
GET      /product/1YMWWN1N4O                                                             4439     3(0.07%) |     24       4    5573     22 |    0.10        0.00
GET      /product/2ZYFJ3GM2N                                                             4534     2(0.04%) |     32       4    8596     22 |    0.10        0.00
GET      /product/66VCHSJNUP                                                             4473     6(0.13%) |     27       1    5691     21 |    0.40        0.00
GET      /product/6E92ZMYYFZ                                                             4448     3(0.07%) |     29       4    8731     22 |    0.10        0.00
GET      /product/9SIQT8TOJO                                                             4386     1(0.02%) |     28       4    7325     22 |    0.50        0.00
GET      /product/L9ECAV7KIM                                                             4419     2(0.05%) |     25       1    6295     22 |    0.30        0.00
GET      /product/LS4PSXUNUM                                                             4427     3(0.07%) |     24       5    4947     21 |    0.20        0.00
GET      /product/OLJCESPC7Z                                                             4275     3(0.07%) |     29       2    9290     22 |    0.10        0.00
POST     /setCurrency                                                                    6178     6(0.10%) |     41       3   20729     26 |    0.20        0.00
--------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------
         Aggregated                                                                     70410    67(0.10%) |     27       0   20729     21 |    2.70        0.00

Previously, I had been adjusting the SHAs, since they referred to amd64-specific image digests.

For instance, the currently pinned digest is the amd64-specific one (see https://hub.docker.com/layers/library/python/3.12.7-alpine/images/sha256-b83d5ec7274bee17d2f4bd0bfbb082f156241e4513f0a37c70500e1763b1d90d). The images are multi-arch, so we should use the multi-arch digest rather than the amd64 one:

-FROM python:3.12.7-alpine@sha256:b83d5ec7274bee17d2f4bd0bfbb082f156241e4513f0a37c70500e1763b1d90d AS base
+FROM python:3.12.7-alpine@sha256:5049c050bdc68575a10bcb1885baa0689b6c15152d8a56a7e399fb49f783bf98 AS base

However, I reverted the changes when trying the PR branch.

vlsi, Jan 15 '25
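To make the digest distinction concrete: a multi-arch tag such as python:3.12.7-alpine resolves to an image index (manifest list), and each entry in that index has its own per-platform manifest digest. Pinning FROM to one of those per-platform digests (like the amd64 one above) locks the base image to that architecture whatever platform you build for, which fits the x86-64 binaries seen on arm64 nodes. The Python sketch below assumes the index has already been fetched as JSON (for example with docker buildx imagetools inspect --raw); the helper names and the digests are hypothetical placeholders, not the real Docker Hub values.

```python
import json
from typing import Dict, Optional

# Hypothetical, trimmed image index in the shape defined by the OCI image-index
# spec. The digests are placeholders, not the real python:3.12.7-alpine ones.
INDEX_JSON = """
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {"digest": "sha256:aaaa", "platform": {"os": "linux", "architecture": "amd64"}},
    {"digest": "sha256:bbbb", "platform": {"os": "linux", "architecture": "arm64", "variant": "v8"}}
  ]
}
"""


def platform_digests(index: dict) -> Dict[str, str]:
    """Map "os/arch[/variant]" to the per-platform manifest digest in an index."""
    result = {}
    for entry in index.get("manifests", []):
        plat = entry.get("platform", {})
        key = f"{plat.get('os')}/{plat.get('architecture')}"
        if plat.get("variant"):
            key += f"/{plat['variant']}"
        result[key] = entry["digest"]
    return result


def locked_platform(index: dict, pinned_digest: str) -> Optional[str]:
    """Return the single platform a pinned digest locks the build to, or None
    if the digest is not a per-platform entry (e.g. it is the index's own digest)."""
    for key, digest in platform_digests(index).items():
        if digest == pinned_digest:
            return key
    return None


if __name__ == "__main__":
    index = json.loads(INDEX_JSON)
    print(platform_digests(index))
    # Pinning FROM to the amd64 entry yields amd64 layers even for a linux/arm64 build.
    print(locked_platform(index, "sha256:aaaa"))
    # The index digest itself is not listed, so the builder can still pick per platform.
    print(locked_platform(index, "sha256:index-digest-placeholder"))
```

Pinning the index digest keeps the reproducibility of a digest pin while still letting each target platform resolve its own entry.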

I just checked the python binary inside the emailservice container, and it is indeed x86-64:

% file python3.12
python3.12: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-x86_64.so.1, BuildID[sha1]=9e60ed8a434f8548b6287d5ee1b32b36d3c05662, stripped

vlsi, Jan 15 '25
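The "exec format error" from the pods and the file output above both come down to the e_machine field of the ELF header. As a minimal sketch (the helper names are hypothetical, not part of the repo), the same check can be scripted in Python by reading that field directly. Run it on a copy of the binary extracted from the image (for example with docker create plus docker cp), since an x86-64 interpreter cannot start on an arm64 node in the first place.

```python
import struct

# e_machine values from the ELF specification (System V ABI, also in <elf.h>).
EM_NAMES = {
    3: "x86 (i386)",
    40: "ARM (32-bit)",
    62: "x86-64",
    183: "AArch64",
}


def elf_machine(header: bytes) -> str:
    """Return the target architecture recorded in an ELF header."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both 32- and 64-bit ELF files.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return EM_NAMES.get(machine, f"unknown ({machine})")


def file_machine(path: str) -> str:
    """Read the first bytes of a file and report its ELF architecture."""
    with open(path, "rb") as f:
        return elf_machine(f.read(64))


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        print(f"{path}: {file_machine(path)}")
```

On an image built correctly for arm64, the extracted python3.12 should report AArch64; x86-64 there reproduces the mismatch shown above.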

> The images are multi-arch ones, so we should use multi-arch sha rather than amd64 sha

@vlsi, this 👆 is a really good point. I'll update the Dockerfiles accordingly; a commit is coming soon, and it should be a good starting point for continuing the tests. I'll keep you posted when it's done. Thanks for your help and your patience! 😃

mathieu-benoit, Jan 15 '25

@bourgeoisor, could you please approve the CI run for the latest commits? Thanks!

mathieu-benoit, Jan 15 '25

The build fails now:

#9 190.4       cc1: warning: command-line option '-std=c++14' is valid for C++/ObjC++ but not for C
#9 190.4       In file included from third_party/abseil-cpp/absl/base/internal/low_level_alloc.cc:26:
#9 190.4       third_party/abseil-cpp/absl/base/internal/direct_mmap.h:36:10: fatal error: linux/unistd.h: No such file or directory
#9 190.4          36 | #include <linux/unistd.h>
#9 190.4             |          ^~~~~~~~~~~~~~~~
...
#9 190.4 Failed to build grpcio
#9 190.9 ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (grpcio)
#9 ERROR: process "/bin/sh -c pip install -r requirements.txt" did not complete successfully: exit code: 1
...
190.9 ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (grpcio)
------
Dockerfile:25
--------------------
  23 |     # get packages
  24 |     COPY requirements.txt .
  25 | >>> RUN pip install -r requirements.txt
  26 |
  27 |     FROM base
--------------------
ERROR: failed to solve: process "/bin/sh -c pip install -r requirements.txt" did not complete successfully: exit code: 1
Building [recommendationservice]...
Target platforms: [linux/arm64]
Build [recommendationservice] was canceled
Building [shippingservice]...
Target platforms: [linux/arm64]
Build [shippingservice] was canceled
Building [productcatalogservice]...
Target platforms: [linux/arm64]
Build [productcatalogservice] was canceled
build [emailservice] failed: exit status 1. Docker build ran into internal error. Please retry.
If this keeps happening, please open an issue.

vlsi, Jan 16 '25

The following partially fixes the build:

diff --git a/src/emailservice/Dockerfile b/src/emailservice/Dockerfile
index 9105a5a5..3a69c2a3 100644
--- a/src/emailservice/Dockerfile
+++ b/src/emailservice/Dockerfile
@@ -17,7 +17,7 @@ FROM --platform=$BUILDPLATFORM python:3.12.8-alpine@sha256:54bec49592c8455de8d59
 FROM base AS builder

 RUN apk update \
-    && apk add --no-cache wget g++ \
+    && apk add --no-cache wget g++ linux-headers libstdc++ \
     && rm -rf /var/cache/apk/*

 # get packages
diff --git a/src/recommendationservice/Dockerfile b/src/recommendationservice/Dockerfile
index 72add8de..3653e7b6 100644
--- a/src/recommendationservice/Dockerfile
+++ b/src/recommendationservice/Dockerfile
@@ -20,8 +20,18 @@ RUN apk update \
     && apk add --no-cache \
         wget \
         g++ \
+        linux-headers \
+        libstdc++ \
     && rm -rf /var/cache/apk/*

However, both emailservice and recommendationservice still fail:

 - deployment/productcatalogservice is ready. [10/11 deployment(s) still pending]
 - deployment/checkoutservice is ready. [9/11 deployment(s) still pending]
 - deployment/currencyservice is ready. [8/11 deployment(s) still pending]
 - deployment/emailservice: container server terminated with exit code 1
    - pod/emailservice-744746f964-xm2cr: container server terminated with exit code 1
      > [emailservice-744746f964-xm2cr server] Traceback (most recent call last):
      > [emailservice-744746f964-xm2cr server]   File "/email_server/email_server.py", line 22, in <module>
      > [emailservice-744746f964-xm2cr server]     import grpc
      > [emailservice-744746f964-xm2cr server]   File "/usr/local/lib/python3.12/site-packages/grpc/__init__.py", line 22, in <module>
      > [emailservice-744746f964-xm2cr server]     from grpc import _compression
      > [emailservice-744746f964-xm2cr server]   File "/usr/local/lib/python3.12/site-packages/grpc/_compression.py", line 20, in <module>
      > [emailservice-744746f964-xm2cr server]     from grpc._cython import cygrpc
      > [emailservice-744746f964-xm2cr server] ImportError: Error loading shared library libstdc++.so.6: No such file or directory (needed by /usr/local/lib/python3.12/site-packages/grpc/_cython/cygrpc.cpython-312-aarch64-linux-musl.so)
 - deployment/emailservice failed. Error: container server terminated with exit code 1.

vlsi, Jan 16 '25

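Here's the fix: linux-headers is needed at build time (it provides the missing linux/unistd.h when grpcio is compiled in the builder stage), and libstdc++ is needed at runtime (grpc's cygrpc extension loads libstdc++.so.6), so it must be installed in the final stage, not only in the builder.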

diff --git a/src/emailservice/Dockerfile b/src/emailservice/Dockerfile
index 9105a5a5..6f6c2dcb 100644
--- a/src/emailservice/Dockerfile
+++ b/src/emailservice/Dockerfile
@@ -17,7 +17,7 @@ FROM --platform=$BUILDPLATFORM python:3.12.8-alpine@sha256:54bec49592c8455de8d59
 FROM base AS builder

 RUN apk update \
-    && apk add --no-cache wget g++ \
+    && apk add --no-cache wget g++ linux-headers \
     && rm -rf /var/cache/apk/*

 # get packages
@@ -30,6 +30,11 @@ ENV PYTHONUNBUFFERED=1
 # Enable Profiler
 ENV ENABLE_PROFILER=1

+RUN apk update \
+    && apk add --no-cache \
+        libstdc++  \
+    && rm -rf /var/cache/apk/*
+
 WORKDIR /email_server

 # Grab packages from builder
diff --git a/src/recommendationservice/Dockerfile b/src/recommendationservice/Dockerfile
index 72add8de..6c661479 100644
--- a/src/recommendationservice/Dockerfile
+++ b/src/recommendationservice/Dockerfile
@@ -20,6 +20,7 @@ RUN apk update \
     && apk add --no-cache \
         wget \
         g++ \
+        linux-headers \
     && rm -rf /var/cache/apk/*

 # get packages
@@ -30,6 +31,11 @@ FROM base
 # Enable unbuffered logging
 ENV PYTHONUNBUFFERED=1

+RUN apk update \
+    && apk add --no-cache \
+        libstdc++  \
+    && rm -rf /var/cache/apk/*
+
 # get packages
 WORKDIR /recommendationservice

vlsi, Jan 16 '25
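The first attempt failed because packages installed with apk in the builder stage are not carried into the final stage; only what is copied over survives, so libstdc++ has to be added where grpc is actually imported. As a hypothetical smoke test (the helper name is illustrative, not part of the repo), something like the following could run in the final image to report missing runtime libraries with a clearer message than the ImportError. It relies on ctypes.CDLL, which asks the dynamic loader to open each library and raises OSError when it cannot.

```python
import ctypes
from typing import Iterable, List


def missing_shared_libs(names: Iterable[str]) -> List[str]:
    """Return the subset of shared-library names that the dynamic loader
    cannot open in the current environment."""
    missing = []
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # libstdc++.so.6 is the library named in the ImportError above; on Alpine
    # it comes from the libstdc++ package, not from linux-headers.
    print(missing_shared_libs(["libstdc++.so.6"]) or "all runtime libraries found")
```

Checking in the final image, rather than the builder, matters here because the builder already had the library available while the runtime stage did not.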

Thanks @vlsi, this https://github.com/GoogleCloudPlatform/microservices-demo/pull/2589#issuecomment-2595187817 indeed did the trick, awesome!

JFYI, with the arm64 GitHub-hosted runners becoming available in public preview today, I was able to deploy the Online Boutique containers on both platforms, amd64 and arm64, using the matching runners: https://github.com/mathieu-benoit/microservices-demo/pull/2. Everything looks good now!

@vlsi, could you please give it a try on your end to check that everything is working successfully?

Same for you, @bourgeoisor: could you try skaffold dev targeting your GKE cluster with arm64 nodes?

mathieu-benoit, Jan 17 '25

For some reason, I'm now unable to build emailservice. It fails at exactly the same spot every time (tested three times):

#10 8.836 Building wheels for collected packages: google-cloud-profiler, grpcio
#10 8.836   Building wheel for google-cloud-profiler (pyproject.toml): started
#10 10.42   Building wheel for google-cloud-profiler (pyproject.toml): finished with status 'done'
#10 10.42   Created wheel for google-cloud-profiler: filename=google_cloud_profiler-4.1.0-cp312-cp312-linux_aarch64.whl size=842450 sha256=b5343a25811245973cef0355bfe3d84812ae22c069f6b231efadd29b887d7a14
#10 10.42   Stored in directory: /root/.cache/pip/wheels/4c/e9/0e/051a26de1731259c679b0d9546e4a069b9e2adf536bb0566a2
#10 10.42   Building wheel for grpcio (pyproject.toml): started
#10 71.42   Building wheel for grpcio (pyproject.toml): still running...
ERROR: failed to receive status: rpc error: code = Unavailable desc = error reading from server: EOF
. . .
. . .
build [emailservice] failed: exit status 1. Docker build ran into internal error. Please retry.
If this keeps happening, please open an issue.

bourgeoisor, Jan 17 '25

@bourgeoisor, on which platform?

Once that's confirmed, let's see how it fits with what we know so far:

  • The CI is running successfully on this PR on linux/amd64
  • I tested in both linux/amd64 and linux/arm64 via the GH runners: https://github.com/mathieu-benoit/microservices-demo/pull/2
  • This was successfully tested on a MacBook M4: https://github.com/GoogleCloudPlatform/microservices-demo/issues/2782#issuecomment-2599374278
  • We may want to hear back from @vlsi to check that it's still working on their end.

So I think that, even if it's not perfect yet, we may want to merge this into main as-is (there are no regressions, only improvements) and then figure out the remaining issues, like yours, separately. WDYT?

Just trying to avoid keeping this PR open for too long.

mathieu-benoit, Jan 21 '25

On an Apple M3, but I'm also using an alternative Docker runtime (Colima), so it might be doing something weird.

Let's get that PR merged anyway for the reasons you described.

bourgeoisor, Jan 24 '25