opentelemetry-cpp-contrib icon indicating copy to clipboard operation
opentelemetry-cpp-contrib copied to clipboard

[BUILD] otel-webserver add support for arm64

Open Fydon opened this issue 1 year ago • 21 comments

Fixes #343.

Attempt to enable otel-webserver support for ARM64. The build succeeds and I appear to able to use the webserver module on ARM64.

  1. Should the Dockerfiles, e.g. instrumentation\otel-webserver-module\docker\ubuntu20.04\Dockerfile be updated to have the platforms in different paths?
  2. Should the ARM64 build use the gccArchFlag of -march=armv8-a or should it be one of the others.

Fydon avatar Mar 18 '24 13:03 Fydon

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: Fydon (d9eb65e7030c38cbe68e303cdeacbce50579e4cf, f7e0a2832438bd1f8cb7ebc4cd2a13c134617ee0)

The failure for centos7 is ERROR: write /root/.gradle/wrapper/dists/gradle-4.10.2-all/9fahxiiecdb76a5g3aw9oi8rv/gradle-4.10.2-all.zip: no space left on device

Fydon avatar Mar 18 '24 15:03 Fydon

Hey, can you pull the code again and push your changes. Some builds were failing, so changes have been made to the build pipeline.

aryanishan1001 avatar Mar 19 '24 10:03 aryanishan1001

Would someone be able to run to pipeline to see if it now works? Also please let me know if I should make any changes, e.g. in relation to the questions in my original post.

Fydon avatar Apr 04 '24 10:04 Fydon

Sorry I see that docker compose v1 (docker-compose) was in use. Updated to docker compose v2 (docker compose), which I think resolves the error. Also it probably required QEMU to perform ARM64 builds.

Fydon avatar Apr 04 '24 16:04 Fydon

Thank you for running the workflow. The current GitHub workflow is only building the x64 artifacts. Should I duplicate the jobs so that there is a build for each platform and operating system, i.e. webserver-build-test-ubuntu-x64, webserver-build-test-ubuntu-arm64, webserver-build-test-centos7-x64 and webserver-build-test-centos7-arm64, or should the build step change to build both in a single build step, e.g. docker-compose --profile ubuntu20.04 build? With the single build step, there would probably need to still be separate unit test, upload artifacts and run integrationtest steps for each platform.

Fydon avatar Apr 05 '24 10:04 Fydon

Thank you for running the workflow. The current GitHub workflow is only building the x64 artifacts. Should I duplicate the jobs so that there is a build for each platform and operating system, i.e. webserver-build-test-ubuntu-x64, webserver-build-test-ubuntu-arm64, webserver-build-test-centos7-x64 and webserver-build-test-centos7-arm64, or should the build step change to build both in a single build step, e.g. docker-compose --profile ubuntu20.04 build? With the single build step, there would probably need to still be separate unit test, upload artifacts and run integrationtest steps for each platform.

No idea, this is a question for the otel-webserver maintainers, who know this area best.

I just press the workflow button ;-)

marcalff avatar Apr 05 '24 10:04 marcalff

Thank you @marcalff. I assume that the maintainers would also be reviewing these changes at some point and so be able to advise?

I also noticed that the unit tests are currently platform specific, so I'm working on updating gradle to handle this.

Fydon avatar Apr 05 '24 10:04 Fydon

I've gone for having extra jobs for ARM64. The unit tests now work for both platforms, but the integrations tests don't work on my Windows PC for x64 or ARM64. They may work in the GitHub Action.

When I run CentOS 7 with ARM64 locally, I run into a problem with QEMU and yum so instead I attempted to have the Ubuntu job build the artifacts. Given that there are difference between the CentOS and Ubuntu Dockerfiles, is there a problem with using Ubuntu for releases?

Fydon avatar Apr 08 '24 21:04 Fydon

I tried using podman 5 instead of docker to build the CentOS 7 ARM release and the build succeeded. Unfortunately it got stuck exporting the layers of the image and so couldn't progress beyond that. However as the integration test failed in the GitHub action with the Ubuntu build, I thought it worth seeing if the GitHub action would get beyond the first yum command (4th command in Dockerfile), unlike my experience with Docker Desktop 4.29.0 (145265).

Fydon avatar Apr 11 '24 20:04 Fydon

Hi @Fydon , I am assuming this is supposed to work on below arch as well, can you please confirm. 5.10.199-190.747.amzn2.aarch64

We thought of instrumenting nginx with otel but it failed there.

deepaksrivastavaz avatar Apr 12 '24 04:04 deepaksrivastavaz

I have only used the apache functionality. Yes aarch64 is ARM64. This is a step to having this work, but it may require more work. Particularly the integration tests may need to be changed if they aren't testing ARM64. If using a clone of the latest version of this pull request is not working for you feel free to make the necessary changes and contribute them once this is merged.

Fydon avatar Apr 12 '24 11:04 Fydon

GitHub has a beta for ARM based runners for GitHub Actions. That should significantly improve the performance of these builds. Perhaps for now the solution is to review the changes, but not run the CentOS ARM64 build?

I'm using the Apache ARM64 build on Ubuntu 22.04 to obtain traces, but have not tested Nginx.

Fydon avatar Apr 17 '24 09:04 Fydon

Okay. I think that is done

Fydon avatar May 02 '24 14:05 Fydon

Sorry I realised that I left the build arguments in the CentOS 7 ARM build. I've committed but not pushed removing these as it will cause another need for a build. There will probably be more changes as a part of the review so they will be a part of the next update.

Fydon avatar May 03 '24 13:05 Fydon

ARM64 hosted runners have now been released. Could you please set one up so that the workflow can run quicker? If so, please let me know the name so that I can update the workflow.

Fydon avatar Jun 04 '24 08:06 Fydon

there are x64 references in otel-webserver-module/src/nginx/script.sh and Makefile that caused an undefined symbol: matchIgnorePathRegex fatal error with my similar patchset, and might need to be addressed here

pl4nty avatar Aug 29 '24 13:08 pl4nty

@pl4nty I don't currently use nginx. If you can provide the necessary changes, I can add them. I've use the ARM64 build with Apache so that does work.

However this pull request probably needs a maintainer to add a ARM64 host runner so that the build can complete.

Fydon avatar Aug 29 '24 16:08 Fydon

yeah I've been running httpd too, only tested nginx yesterday. not sure if there's a way to do this with variables

--- a/instrumentation/otel-webserver-module/src/nginx/Makefile
+++ a/instrumentation/otel-webserver-module/src/nginx/Makefile
@@ -350,6 +350,7 @@ objs/nginx: objs/src/core/nginx.o \
        objs/src/http/modules/ngx_http_upstream_zone_module.o \
        objs/ngx_modules.o \
        -L/root/webserver-agent/build/linux-x64/opentelemetry-webserver-sdk/sdk_lib/lib -lopentelemetry_webserver_sdk -ldl -lrt -lpthread -lcrypt -lpcre -lz \
+       -L/root/webserver-agent/build/linux-arm64/opentelemetry-webserver-sdk/sdk_lib/lib -lopentelemetry_webserver_sdk -ldl -lrt -lpthread -lcrypt -lpcre -lz \
        -Wl,-E
        
 
@@ -1207,6 +1208,8 @@ objs/ngx_http_opentelemetry_module.so:    objs/addon/nginx/ngx_http_opentelemetry_l
        objs/addon/nginx/ngx_http_opentelemetry_module.o \
        objs/ngx_http_opentelemetry_module_modules.o \
        -L../linux-x64/opentelemetry-webserver-sdk/sdk_lib/lib \
+    -lopentelemetry_webserver_sdk \
+       -L../linux-arm64/opentelemetry-webserver-sdk/sdk_lib/lib \
     -lopentelemetry_webserver_sdk \
        -shared
--- a/instrumentation/otel-webserver-module/src/nginx/script.sh
+++ b/instrumentation/otel-webserver-module/src/nginx/script.sh
@@ -2,4 +2,6 @@
 fileName=$1
 
 sed -i "s/-L\/otel-webserver-module\/build\/linux-x64\/opentelemetry-webserver-sdk\/sdk_lib\/lib\ -lopentelemetry_webserver_sdk\ -ldl\ -lpthread\ -lcrypt\ -lpcre\ -lz\ \\\/-L\/otel-webserver-module\/build\/linux-x64\/opentelemetry-webserver-sdk\/sdk_lib\/lib\ -lopentelemetry_webserver_sdk\ -ldl\ -lrt\ -lpthread\ -lcrypt\ -lpcre\ -lz\ \\\/g" $fileName
+sed -i "s/-L\/otel-webserver-module\/build\/linux-arm64\/opentelemetry-webserver-sdk\/sdk_lib\/lib\ -lopentelemetry_webserver_sdk\ -ldl\ -lpthread\ -lcrypt\ -lpcre\ -lz\ \\\/-L\/otel-webserver-module\/build\/linux-arm64\/opentelemetry-webserver-sdk\/sdk_lib\/lib\ -lopentelemetry_webserver_sdk\ -ldl\ -lrt\ -lpthread\ -lcrypt\ -lpcre\ -lz\ \\\/g" $fileName
 sed -i "s/-L\/otel-webserver-module\/build\/linux-x64\/opentelemetry-webserver-sdk\/sdk_lib\/lib\ \\\/-L\/otel-webserver-module\/build\/linux-x64\/opentelemetry-webserver-sdk\/sdk_lib\/lib\ -lopentelemetry_webserver_sdk\ \\\/g" $fileName
+sed -i "s/-L\/otel-webserver-module\/build\/linux-arm64\/opentelemetry-webserver-sdk\/sdk_lib\/lib\ \\\/-L\/otel-webserver-module\/build\/linux-arm64\/opentelemetry-webserver-sdk\/sdk_lib\/lib\ -lopentelemetry_webserver_sdk\ \\\/g" $fileName

pl4nty avatar Aug 30 '24 11:08 pl4nty

ARM runners are now freely available, but the lowest version is ubuntu-22.04-arm not 20.04. You'll also need #512 to fix the build

pl4nty avatar Jan 21 '25 01:01 pl4nty

Thank you @pl4nty. I'm waiting for movement from open-telemetry members before I put any more time into this pull request, which is why I haven't applied your changes. However feel free to add a pull request to https://github.com/Fydon/opentelemetry-cpp-contrib/tree/feature/webserver-arm64 for any changes you'd like to see linked to ARM64. Hopefully with free ARM64 runners that having ARM64 builds can get attention again, even if it's not done with work in this pull request.

Fydon avatar Jan 21 '25 16:01 Fydon