incubator-gluten icon indicating copy to clipboard operation
incubator-gluten copied to clipboard

[VL] Build gluten on aarch64 with VCPKG

Open z-anderson opened this issue 1 year ago • 6 comments

Problem description

I'm trying to build gluten and velox on aarch64 architecture. I'm running the script ./dev/builddeps-veloxbe.sh --enable_vcpkg=ON --spark_version=3.5 . I get the error below. I see that these options -mavx2 -mfma -mavx -mf16c -mlzcnt -mbmi2 are only for x86_64, so they won't work on aarch64. Do you know how I can build gluten for aarch64 please? (In version 1.1.x.x., we were able to build for both aarch64 and x86_64).

I see vcpkg init.sh has VCPKG_TRIPLET=x64-linux-avx https://github.com/apache/incubator-gluten/blob/c82af60a635fb52bf1314cd831f5915dd37d910d/dev/vcpkg/init.sh#L10. It makes sense that the x64 and the avx parts of this triplet will cause errors running on aarch64. I see it got changed in https://github.com/apache/incubator-gluten/commit/f00e4536bc7ccc56a2eee5800084319130cbf806#diff-4a8a09cebf7ed92cb7321136b9ef223c4653757270e77e942fd6fe2dca30793e - do you recommend that we should cherry-pick that commit (and a few more commits that it depends on)?

CMake Error at /usr/local/share/cmake-3.28/Modules/CMakeTestCCompiler.cmake:67 (message):
  The C compiler

    "/usr/bin/cc"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: '/home/circleci/project/dev/vcpkg/.vcpkg/buildtrees/detect_compiler/x64-linux-avx-rel/CMakeFiles/CMakeScratch/TryCompile-9obttD'

    Run Build Command(s): /usr/bin/ninja -v cmTC_32844
    [1/2] /usr/bin/cc   -fPIC -mavx2 -mfma -mavx -mf16c -mlzcnt -mbmi2 -o CMakeFiles/cmTC_32844.dir/testCCompiler.c.o -c /home/circleci/project/dev/vcpkg/.vcpkg/buildtrees/detect_compiler/x64-linux-avx-rel/CMakeFiles/CMakeScratch/TryCompile-9obttD/testCCompiler.c
    FAILED: CMakeFiles/cmTC_32844.dir/testCCompiler.c.o
    /usr/bin/cc   -fPIC -mavx2 -mfma -mavx -mf16c -mlzcnt -mbmi2 -o CMakeFiles/cmTC_32844.dir/testCCompiler.c.o -c /home/circleci/project/dev/vcpkg/.vcpkg/buildtrees/detect_compiler/x64-linux-avx-rel/CMakeFiles/CMakeScratch/TryCompile-9obttD/testCCompiler.c
    cc: error: unrecognized command line option ‘-mavx2’
    cc: error: unrecognized command line option ‘-mfma’
    cc: error: unrecognized command line option ‘-mavx’
    cc: error: unrecognized command line option ‘-mf16c’
    cc: error: unrecognized command line option ‘-mlzcnt’
    cc: error: unrecognized command line option ‘-mbmi2’
    ninja: build stopped: subcommand failed.





  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:11 (enable_language)

System information

Velox System Info v0.0.2 Commit: 69a6bb9602c97050cc3459a09c846b84182ecb9c CMake Version: 3.28.3 System: Linux-5.15.0-1070-aws Arch: aarch64 CPU Name: Model name: Neoverse-N1 C++ Compiler: /usr/bin/c++ C++ Compiler Version: 9.4.0 C Compiler: /usr/bin/cc C Compiler Version: 9.4.0 CMake Prefix Path: /usr/local;/usr;/;/usr/local;/usr/local;/usr/X11R6;/usr/pkg;/opt

\nThe results will be copied to your clipboard if xclip is installed.

CMake log

No response

z-anderson avatar Oct 24 '24 01:10 z-anderson

do you recommend that we should cherry-pick that commit (and a few more commits that it depends on)?

@z-anderson, I guess that commit will not fix this issue.

With VCPKG_TRIPLET=x64-linux-avx, vcpkg will use a custom triplet file we created, see https://github.com/apache/incubator-gluten/blob/main/dev/vcpkg/triplets/x64-linux-avx.cmake. You can see some flags are set specifically for x86_64.

To support arm64, I recommend you to create another triplet file with proper flags set, e.g. set(VCPKG_TARGET_ARCHITECTURE arm64)(see reference link). Then, in Gluten script, set VCPKG_TRIPLET to this new custom triple instead of x64-linux-avx when target architecture is arm64 (we may make target architecture a configurable build option).

PHILO-HE avatar Oct 24 '24 07:10 PHILO-HE

Thank you!

z-anderson avatar Oct 24 '24 16:10 z-anderson

@z-anderson, could you create a pr to fix it?

PHILO-HE avatar Oct 25 '24 01:10 PHILO-HE

@z-anderson would you like to create the PR and maintain it? We currently don't have any ARM CI tests actually. There are several guys submitting PRs on ARM. Maybe you all can work together and contribute.

FelixYBW avatar Oct 25 '24 06:10 FelixYBW

Sounds good. Yes, I'll put up a PR with my changes to https://github.com/apache/incubator-gluten/blob/main/dev/vcpkg/init.sh for this.

z-anderson avatar Oct 25 '24 15:10 z-anderson

I put up a draft https://github.com/apache/incubator-gluten/pull/7687 . This PR gets past that error, but I'm still working on getting it to build completely (at least for my set up).

z-anderson avatar Oct 25 '24 18:10 z-anderson

I don't have permission to create a new github runner (with ARM). Could someone else create the new runner, please? Then I'll use it in https://github.com/apache/incubator-gluten/blob/main/.github/workflows/build_bundle_package.yml

z-anderson avatar Oct 27 '24 14:10 z-anderson

@z-anderson, could you first validate your fix in your arm64 environment? Currently, our CI resources are near to the limitation. We can add new jobs on arm in the future when some resources are freed up.

PHILO-HE avatar Oct 28 '24 01:10 PHILO-HE

In my arm64 environment, it actually finds the community triplet (so it doesn't use the new triplet file). Using the community triplet does correctly get it to start building for arm64 instead of amd.

z-anderson avatar Oct 28 '24 12:10 z-anderson

I don't have permission to create a new github runner (with ARM). Could someone else create the new runner, please? Then I'll use it in https://github.com/apache/incubator-gluten/blob/main/.github/workflows/build_bundle_package.yml

You may track the UT internally firstly. We may plan to add the ARM runner in future.

FelixYBW avatar Oct 29 '24 07:10 FelixYBW

So will we continue the PR, but not do CI tests with an ARM runner for now?

z-anderson avatar Oct 29 '24 13:10 z-anderson

So will we continue the PR, but not do CI tests with an ARM runner for now?

@z-anderson, yes, we can land this pr as long as it is verified on your side.

PHILO-HE avatar Oct 29 '24 13:10 PHILO-HE