
Missing native Arm64 LLVM binaries on Linux

Open aaronsm opened this issue 1 year ago • 10 comments

The prebuilt Arm64 LLVM binaries for Ubuntu that are used by setup.py are actually x64 binaries :)

https://tritonlang.blob.core.windows.net/llvm-builds/llvm-f22cde10-ubuntu-arm64.tar.gz

    $ objdump -h mlir-tblgen
    mlir-tblgen: file format elf64-x86-64
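
The same check can be scripted without objdump. The following is a minimal sketch, assuming only the standard ELF header layout (the `e_machine` field at byte offset 18); the function names are hypothetical and are not part of Triton's setup.py.

```python
import struct

# e_machine values defined by the ELF specification.
EM_X86_64 = 62
EM_AARCH64 = 183
MACHINE_NAMES = {EM_X86_64: "x86-64", EM_AARCH64: "aarch64"}


def elf_machine(header: bytes) -> str:
    """Return the architecture named in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, right after e_type.
    (e_machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return MACHINE_NAMES.get(e_machine, f"unknown ({e_machine})")


def elf_machine_of_file(path: str) -> str:
    """Hypothetical helper: read just the header bytes from a file on disk."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))
```

Calling `elf_machine_of_file("mlir-tblgen")` on a binary from the tarball would then report which architecture it was built for.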


This PR https://github.com/openai/triton/pull/2003 contributed support for building Triton on Linux using prebuilt native Arm64 LLVM binaries.

A follow-up change https://github.com/openai/triton/commit/721897fcc4f942aa97d2e9ba3787a5e213758177 moved the binaries to windows.net, which appears to have broken the native support.

aaronsm avatar Jan 11 '24 23:01 aaronsm

would you be able to send a PR to fix it?

ThomasRaoux avatar Jan 11 '24 23:01 ThomasRaoux

@NathanielMcVicar was going to take a look at the Arm64 LLVM builds

aaronsm avatar Jan 11 '24 23:01 aaronsm

Yeah, I have several ARM nodes and encountered the same issue before. Please ping me when you have a PR ready; I'm happy to take a look and try it out.

Jokeren avatar Jan 12 '24 01:01 Jokeren

> would you be able to send a PR to fix it?

@Jokeren @ThomasRaoux Can OpenAI add an Arm64 Ubuntu VM that we can use to build native Arm binaries?

aaronsm avatar Jan 19 '24 22:01 aaronsm

> Yeah, I have several ARM nodes and encountered the same issue before. Please ping me when you have a PR ready; I'm happy to take a look and try it out.

See this setup: installing the latest version of Triton on an arm64 machine worked with no issues after my changes: https://github.com/danikhan632/triton/blob/main/python/setup.py

I had to build and bundle the LLVM tarball on my own system: https://storage.googleapis.com/compiled-blob/llvm-c2301380-ubuntu-arm64.tar.gz

danikhan632 avatar Jan 22 '24 19:01 danikhan632

Thanks. I'll try it out soon

Jokeren avatar Jan 22 '24 19:01 Jokeren

May I know what is the following tar file? Is it something you built?

url = "https://storage.googleapis.com/compiled-blob/llvm-c2301380-ubuntu-arm64.tar.gz"

Jokeren avatar Jan 22 '24 19:01 Jokeren

> May I know what is the following tar file? Is it something you built?
>
> url = "https://storage.googleapis.com/compiled-blob/llvm-c2301380-ubuntu-arm64.tar.gz"

Yes, I custom-built this tarball (llvm-c2301380-ubuntu-arm64.tar.gz). I compiled it on an Arm64 machine, so it should work properly:

    ➜ bin file mlir-opt
    mlir-opt: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (GNU/Linux)

This link should not be merged into main; it is here as a temporary workaround until the official one works. See this PR.

danikhan632 avatar Jan 22 '24 19:01 danikhan632

Building LLVM from source on Arm64 works but that doesn't fix this issue which is about the prebuilt LLVM Arm64 binaries provided by OpenAI.

aaronsm avatar Jan 22 '24 20:01 aaronsm

> Building LLVM from source on Arm64 works but that doesn't fix this issue which is about the prebuilt LLVM Arm64 binaries provided by OpenAI.

Yeah, this is an issue in the workflow. I updated my PR here to build NVPTX and AMDGPU when targeting arm64.

danikhan632 avatar Jan 22 '24 20:01 danikhan632

Hi - I am seeing a similar issue with main ToT when building through pip

cd /home/nvidia/triton/python/build/cmake.linux-aarch64-cpython-3.8 && /home/nvidia/.triton/llvm/llvm-4017f04e-ubuntu-arm64/bin/mlir-tblgen -gen-pass-decls --name TritonToTritonGPU -I /home/nvidia/triton/include/triton/Conversion/TritonToTritonGPU -I/home/nvidia/triton/include -I/home/nvidia/.triton/pybind11/pybind11-2.11.1/include -I/home/nvidia/triton/. -I/home/nvidia/.triton/llvm/llvm-4017f04e-ubuntu-arm64/include -I/home/nvidia/.triton/llvm/llvm-4017f04e-ubuntu-arm64/include -I/home/nvidia/triton/include -I/home/nvidia/triton/python/build/cmake.linux-aarch64-cpython-3.8/include -I/home/nvidia/triton/third_party -I/home/nvidia/triton/python/build/cmake.linux-aarch64-cpython-3.8/third_party /home/nvidia/triton/include/triton/Conversion/TritonToTritonGPU/Passes.td --write-if-changed -o include/triton/Conversion/TritonToTritonGPU/Passes.h.inc -d include/triton/Conversion/TritonToTritonGPU/Passes.h.inc.d
    FAILED: include/triton/Conversion/TritonToTritonGPU/Passes.h.inc /home/nvidia/triton/python/build/cmake.linux-aarch64-cpython-3.8/include/triton/Conversion/TritonToTritonGPU/Passes.h.inc
    cd /home/nvidia/triton/python/build/cmake.linux-aarch64-cpython-3.8 && /home/nvidia/.triton/llvm/llvm-4017f04e-ubuntu-arm64/bin/mlir-tblgen -gen-pass-decls --name TritonToTritonGPU -I /home/nvidia/triton/include/triton/Conversion/TritonToTritonGPU -I/home/nvidia/triton/include -I/home/nvidia/.triton/pybind11/pybind11-2.11.1/include -I/home/nvidia/triton/. -I/home/nvidia/.triton/llvm/llvm-4017f04e-ubuntu-arm64/include -I/home/nvidia/.triton/llvm/llvm-4017f04e-ubuntu-arm64/include -I/home/nvidia/triton/include -I/home/nvidia/triton/python/build/cmake.linux-aarch64-cpython-3.8/include -I/home/nvidia/triton/third_party -I/home/nvidia/triton/python/build/cmake.linux-aarch64-cpython-3.8/third_party /home/nvidia/triton/include/triton/Conversion/TritonToTritonGPU/Passes.td --write-if-changed -o include/triton/Conversion/TritonToTritonGPU/Passes.h.inc -d include/triton/Conversion/TritonToTritonGPU/Passes.h.inc.d
    /bin/sh: 1: /home/nvidia/.triton/llvm/llvm-4017f04e-ubuntu-arm64/bin/mlir-tblgen: Exec format error

mlir-tblgen is elf64-little. Is it still not supposed to work with arm?

mabubakarpurdue avatar Feb 22 '24 16:02 mabubakarpurdue

We are aware of the issue. @danikhan632 is working on a fix I believe

Jokeren avatar Feb 22 '24 17:02 Jokeren

> We are aware of the issue. @danikhan632 is working on a fix I believe

I'm going to have the workflow runner first build x64 binaries of just mlir-tblgen, clang-tblgen, and llvm-tblgen, then build the full arm64 toolchain, since it looks like those three are needed.
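
As a rough illustration of that two-stage idea, the sketch below assembles the CMake cache entries that LLVM's cross-compilation setup commonly uses to point a target build at host-native tblgen tools. It is not taken from the PR: `LLVM_TABLEGEN` and `CLANG_TABLEGEN` appear in LLVM's cross-compilation documentation, `MLIR_TABLEGEN` is assumed to be the analogous MLIR variable and should be verified against the LLVM revision in use, and the function name and directory are hypothetical.

```python
from pathlib import PurePosixPath
from typing import List


def cross_build_tblgen_flags(host_bin_dir: str) -> List[str]:
    """Build CMake flags that reuse host-native tblgen tools in an arm64 cross build.

    host_bin_dir is the bin/ directory of the first-stage (x64) build.
    """
    host_bin = PurePosixPath(host_bin_dir)
    # CMake variable name -> tool built in the first (host) stage.
    tools = {
        "LLVM_TABLEGEN": "llvm-tblgen",
        "CLANG_TABLEGEN": "clang-tblgen",
        "MLIR_TABLEGEN": "mlir-tblgen",  # assumption: verify for your LLVM revision
    }
    return [f"-D{var}={host_bin / exe}" for var, exe in tools.items()]
```

The returned list would be appended to the second-stage `cmake` invocation, so that the arm64 build runs the x64 generators on the build host while still producing arm64 binaries for the tarball.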

danikhan632 avatar Feb 22 '24 18:02 danikhan632

> We are aware of the issue. @danikhan632 is working on a fix I believe

Got it fully fixed: I built Triton with it and ran the binaries, the Python MLIR bindings, etc.

https://github.com/openai/triton/pull/3180

danikhan632 avatar Feb 23 '24 18:02 danikhan632

@danikhan632 Thanks for fixing this. The tarball in your PR does have the appropriate binaries but pip install ... still pulls the old elf64-little version. Can you have a look? Thanks

mabubakarpurdue avatar Feb 26 '24 01:02 mabubakarpurdue

I can confirm this problem

Jokeren avatar Feb 26 '24 01:02 Jokeren

> @danikhan632 Thanks for fixing this. The tarball in your PR does have the appropriate binaries but pip install ... still pulls the old elf64-little version. Can you have a look? Thanks

To use my binaries from the AWS bucket link, you would have to manually swap the URL at https://github.com/openai/triton/blob/main/python/setup.py#L153 for the AWS tarball. Unfortunately, this PR doesn't update the existing tarballs; it just ensures that the tarball is built correctly the next time the LLVM version is bumped. The only ways to fix that are to swap out the URL, wait for an LLVM bump, or, I suppose, have @Jokeren re-run the LLVM build.
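
One way to make such a swap less manual is an explicit override. This is a minimal sketch only: the default URL pattern is inferred from the tarball link at the top of this issue, while the function name and the `EXAMPLE_LLVM_TARBALL_URL` variable are hypothetical and do not exist in Triton's setup.py.

```python
import os

# Pattern inferred from the official tarball URL quoted in this issue.
DEFAULT_BASE_URL = "https://tritonlang.blob.core.windows.net/llvm-builds"


def llvm_tarball_url(rev: str, system_suffix: str, env=None) -> str:
    """Return the LLVM tarball URL, honoring an optional override.

    EXAMPLE_LLVM_TARBALL_URL is a hypothetical environment variable
    used for illustration only.
    """
    env = os.environ if env is None else env
    override = env.get("EXAMPLE_LLVM_TARBALL_URL")
    if override:
        return override
    return f"{DEFAULT_BASE_URL}/llvm-{rev}-{system_suffix}.tar.gz"
```

With no override set, `llvm_tarball_url("f22cde10", "ubuntu-arm64")` reproduces the official link; setting the variable would redirect the download to a custom-built tarball without editing the file.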

danikhan632 avatar Feb 26 '24 04:02 danikhan632

Got it. Thanks for the reply! I just reran the llvm build workflow

Jokeren avatar Feb 26 '24 15:02 Jokeren

Looks like the workflow runner stopped due to some odd error with the macOS x64 build; I don't think my PR had anything to do with that image.

danikhan632 avatar Feb 26 '24 19:02 danikhan632

I'm working on it

Jokeren avatar Feb 26 '24 19:02 Jokeren

@danikhan632 May I ask why we need an x86 target here? I ask because I encountered this error: undefined symbol: LLVMInitializeX86Target.

https://github.com/openai/triton/blob/3fc0b891448bba73fcd3e513d21c6dc8569bf183/.github/workflows/llvm-build.yml#L179

Jokeren avatar Feb 28 '24 02:02 Jokeren

X86 isn't really needed. That was more about trying to have the same build on X86 and Arm.

aaronsm avatar Feb 28 '24 02:02 aaronsm

It probably doesn't build for you because X86 Init is only included when not Arm64:

https://github.com/openai/triton/blob/3fc0b891448bba73fcd3e513d21c6dc8569bf183/CMakeLists.txt#L213C3-L213C9

If you change the else() to endif() (and remove the other endif()), then this should link on Arm with X86. Alternatively, remove X86 as a target.

aaronsm avatar Feb 28 '24 02:02 aaronsm

Yeah, I encountered that issue today and kind of suspected that. I pushed a change to get rid of it:

https://github.com/openai/triton/pull/3223

danikhan632 avatar Feb 28 '24 02:02 danikhan632

The next build problem is probably a missing "sudo apt-get update"

https://github.com/openai/triton/actions/runs/8074918680

    E: Failed to fetch mirror+file:/etc/apt/apt-mirrors.txt/pool/main/b/binutils/binutils-arm-linux-gnueabihf_2.34-6ubuntu1.8_amd64.deb 404 Not Found [IP: 40.81.13.82 80]
    E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?

aaronsm avatar Feb 28 '24 03:02 aaronsm

> The next build problem is probably a missing "sudo apt-get update"
>
> https://github.com/openai/triton/actions/runs/8074918680
>
> E: Failed to fetch mirror+file:/etc/apt/apt-mirrors.txt/pool/main/b/binutils/binutils-arm-linux-gnueabihf_2.34-6ubuntu1.8_amd64.deb 404 Not Found [IP: 40.81.13.82 80]
> E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?

I should be able to fix it, though I'm concerned about why it hasn't happened before and whether any of these URLs might cause issues later.

danikhan632 avatar Feb 28 '24 04:02 danikhan632

Yes, that's kind of weird.

Jokeren avatar Feb 28 '24 04:02 Jokeren

The new LLVM binaries are working on Ubuntu 22.04/Arm64 but the default target triple is x64:

/home/ubuntu/.triton/llvm/llvm-4017f04e-ubuntu-arm64/bin/llc: error: unable to get target for 'x86_64-unknown-linux-gnu', see --version and --triple.

We can set the default to Arm64 in llvm-build.yml:

-DLLVM_DEFAULT_TARGET_TRIPLE=aarch64-linux-gnu

https://github.com/openai/triton/blob/005085f3feec57993fef840f2e0e03f5f641dd50/.github/workflows/llvm-build.yml#L181
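
A build script could also derive the flag from the machine it targets rather than hard-coding it. The sketch below assumes only the two triples quoted in this thread; the mapping and function name are hypothetical, and `LLVM_DEFAULT_TARGET_TRIPLE` is the LLVM CMake option named above.

```python
# Machine name (as reported by platform.machine()) -> default target triple.
# Only the two triples mentioned in this thread are covered.
TRIPLES = {
    "aarch64": "aarch64-linux-gnu",
    "arm64": "aarch64-linux-gnu",
    "x86_64": "x86_64-unknown-linux-gnu",
}


def default_triple_flag(machine: str) -> str:
    """Return the CMake flag that sets LLVM's default target triple."""
    try:
        triple = TRIPLES[machine.lower()]
    except KeyError:
        raise ValueError(f"no default triple known for machine {machine!r}")
    return f"-DLLVM_DEFAULT_TARGET_TRIPLE={triple}"
```

For a native build, the argument could come from `platform.machine()`; for a cross build, it would be the target machine rather than the host.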

aaronsm avatar Feb 28 '24 21:02 aaronsm

This has a fix for the target triple: https://github.com/openai/triton/pull/3239

aaronsm avatar Feb 28 '24 21:02 aaronsm

@danikhan632 It works fine now!

Jokeren avatar Feb 28 '24 22:02 Jokeren