triton
Missing native Arm64 LLVM binaries on Linux
The prebuilt Arm64 LLVM binaries for Ubuntu that are used by setup.py are actually x64 binaries :)
https://tritonlang.blob.core.windows.net/llvm-builds/llvm-f22cde10-ubuntu-arm64.tar.gz
$ objdump -h mlir-tblgen
mlir-tblgen: file format elf64-x86-64
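The same check can be done without objdump by reading the ELF header directly. The following is a minimal sketch, not part of Triton's setup.py: the helper names `elf_machine` and `elf_machine_of_file` are hypothetical, and only the two `e_machine` values relevant to this issue (62 for x86-64, 183 for AArch64) are mapped.

```python
import struct

# e_machine values from the ELF specification (only the two relevant here).
ELF_MACHINES = {62: "x86-64", 183: "aarch64"}


def elf_machine(header: bytes) -> str:
    """Return the architecture named in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, right after e_type.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown ({machine})")


def elf_machine_of_file(path: str) -> str:
    """Hypothetical convenience wrapper: read the header from a file on disk."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))
```

Calling `elf_machine_of_file("mlir-tblgen")` on a binary from the tarball should then report which architecture it was actually built for, independent of which targets the local objdump understands.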
This PR https://github.com/openai/triton/pull/2003 contributed support for building Triton on Linux using prebuilt native Arm64 LLVM binaries.
A follow-up change https://github.com/openai/triton/commit/721897fcc4f942aa97d2e9ba3787a5e213758177 moved the binaries to windows.net, which appears to have broken the native support.
would you be able to send a PR to fix it?
@NathanielMcVicar was going to take a look at the Arm64 LLVM builds
Yeah, I have several ARM nodes and encountered the same issue before. Please ping me when you have a PR ready; I'm happy to take a look and try it out.
@Jokeren @ThomasRaoux Can OpenAI add an Arm64 Ubuntu VM that we can use to build native Arm binaries?
See this setup.py: installing the latest version of Triton on an arm64 machine worked with no issues after my changes. https://github.com/danikhan632/triton/blob/main/python/setup.py
I had to build and bundle the LLVM tarball on my own system: https://storage.googleapis.com/compiled-blob/llvm-c2301380-ubuntu-arm64.tar.gz
Thanks. I'll try it out soon
May I know what the following tar file is? Is it something you built?
url = "https://storage.googleapis.com/compiled-blob/llvm-c2301380-ubuntu-arm64.tar.gz"
Yes, I custom-built this tarball (llvm-c2301380-ubuntu-arm64.tar.gz). I compiled it on an Arm64 machine, so it should work properly.
➜ bin file mlir-opt
mlir-opt: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (GNU/Linux)
This link should not be merged into main; it is here as a temporary measure until the official one works. See this PR.
Building LLVM from source on Arm64 works, but that doesn't fix this issue, which is about the prebuilt LLVM Arm64 binaries provided by OpenAI.
Yeah, this is an issue in the workflow. I updated my PR to build NVPTX and AMDGPU when targeting arm64.
Hi - I am seeing a similar issue with main ToT when building through pip
cd /home/nvidia/triton/python/build/cmake.linux-aarch64-cpython-3.8 && /home/nvidia/.triton/llvm/llvm-4017f04e-ubuntu-arm64/bin/mlir-tblgen -gen-pass-decls --name TritonToTritonGPU -I /home/nvidia/triton/include/triton/Conversion/TritonToTritonGPU -I/home/nvidia/triton/include -I/home/nvidia/.triton/pybind11/pybind11-2.11.1/include -I/home/nvidia/triton/. -I/home/nvidia/.triton/llvm/llvm-4017f04e-ubuntu-arm64/include -I/home/nvidia/.triton/llvm/llvm-4017f04e-ubuntu-arm64/include -I/home/nvidia/triton/include -I/home/nvidia/triton/python/build/cmake.linux-aarch64-cpython-3.8/include -I/home/nvidia/triton/third_party -I/home/nvidia/triton/python/build/cmake.linux-aarch64-cpython-3.8/third_party /home/nvidia/triton/include/triton/Conversion/TritonToTritonGPU/Passes.td --write-if-changed -o include/triton/Conversion/TritonToTritonGPU/Passes.h.inc -d include/triton/Conversion/TritonToTritonGPU/Passes.h.inc.d
FAILED: include/triton/Conversion/TritonToTritonGPU/Passes.h.inc /home/nvidia/triton/python/build/cmake.linux-aarch64-cpython-3.8/include/triton/Conversion/TritonToTritonGPU/Passes.h.inc
cd /home/nvidia/triton/python/build/cmake.linux-aarch64-cpython-3.8 && /home/nvidia/.triton/llvm/llvm-4017f04e-ubuntu-arm64/bin/mlir-tblgen -gen-pass-decls --name TritonToTritonGPU -I /home/nvidia/triton/include/triton/Conversion/TritonToTritonGPU -I/home/nvidia/triton/include -I/home/nvidia/.triton/pybind11/pybind11-2.11.1/include -I/home/nvidia/triton/. -I/home/nvidia/.triton/llvm/llvm-4017f04e-ubuntu-arm64/include -I/home/nvidia/.triton/llvm/llvm-4017f04e-ubuntu-arm64/include -I/home/nvidia/triton/include -I/home/nvidia/triton/python/build/cmake.linux-aarch64-cpython-3.8/include -I/home/nvidia/triton/third_party -I/home/nvidia/triton/python/build/cmake.linux-aarch64-cpython-3.8/third_party /home/nvidia/triton/include/triton/Conversion/TritonToTritonGPU/Passes.td --write-if-changed -o include/triton/Conversion/TritonToTritonGPU/Passes.h.inc -d include/triton/Conversion/TritonToTritonGPU/Passes.h.inc.d
/bin/sh: 1: /home/nvidia/.triton/llvm/llvm-4017f04e-ubuntu-arm64/bin/mlir-tblgen: Exec format error
mlir-tblgen is elf64-little. Is it still not supposed to work with arm?
We are aware of the issue. @danikhan632 is working on a fix I believe
I'm going to have the workflow runner build x64 binaries first, just for mlir-tblgen, clang-tblgen, and llvm-tblgen, and then build the full arm64 one, since it looks like those three are needed.
Got it fully fixed: built Triton with it, ran the binaries, the Python MLIR bindings, etc.
https://github.com/openai/triton/pull/3180
@danikhan632 Thanks for fixing this. The tarball in your PR does have the appropriate binaries but pip install ... still pulls the old elf64-little version. Can you have a look? Thanks
I can confirm this problem
To use my binaries from the AWS bucket link, you would have to manually replace the URL at https://github.com/openai/triton/blob/main/python/setup.py#L153 with the AWS tarball URL. Unfortunately, this PR doesn't update the existing tarballs; it just ensures that the tarball will be built correctly the next time the LLVM version is bumped. The only ways to fix that are to swap out the URL, wait for an LLVM bump, or, I suppose, have @Jokeren re-run the LLVM build.
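The manual URL swap described above can be pictured with a small sketch. This is not the actual code at setup.py#L153: `llvm_tarball_url` and its `override` parameter are hypothetical, the base URL and the `llvm-<rev>-ubuntu-arm64.tar.gz` naming are taken from the links in this thread, and the `x64` suffix for x86-64 hosts is an assumption.

```python
import platform
from typing import Optional

# Base URL taken from the official tarball link quoted in this thread.
LLVM_BASE_URL = "https://tritonlang.blob.core.windows.net/llvm-builds"


def llvm_tarball_url(rev: str, machine: Optional[str] = None,
                     override: Optional[str] = None) -> str:
    """Pick the prebuilt LLVM tarball URL for a host (hypothetical helper)."""
    if override:
        # e.g. a self-built tarball, used until the official one is rebuilt.
        return override
    machine = machine or platform.machine()
    if machine in ("aarch64", "arm64"):
        arch = "arm64"
    elif machine in ("x86_64", "AMD64"):
        arch = "x64"  # assumed name; only the arm64 name appears in the thread
    else:
        raise RuntimeError(f"no prebuilt LLVM for {machine!r}")
    return f"{LLVM_BASE_URL}/llvm-{rev}-ubuntu-{arch}.tar.gz"
```

Passing the self-built tarball link as `override` corresponds to the "swap out the URL" workaround; leaving it unset corresponds to waiting for the official tarball to be rebuilt.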
Got it. Thanks for the reply! I just reran the llvm build workflow
Looks like the workflow runner stopped due to some odd error with the macOS x64 build. I don't think my PR had anything to do with that.
I'm working on it
@danikhan632 May I know why we need an X86 target here? I ask because I encountered this error: undefined symbol: LLVMInitializeX86Target.
https://github.com/openai/triton/blob/3fc0b891448bba73fcd3e513d21c6dc8569bf183/.github/workflows/llvm-build.yml#L179
X86 isn't really needed. That was more about trying to have the same build on X86 and Arm.
It probably doesn't build for you because X86 Init is only included when not Arm64:
https://github.com/openai/triton/blob/3fc0b891448bba73fcd3e513d21c6dc8569bf183/CMakeLists.txt#L213C3-L213C9
If you change the else() to endif() [and remove other endif()] then this should link on Arm with X86. Or remove X86 as a target.
Yeah, I encountered that issue today and kind of suspected that. I pushed a change to get rid of it:
https://github.com/openai/triton/pull/3223
The next build problem is probably a missing "sudo apt-get update"
https://github.com/openai/triton/actions/runs/8074918680
E: Failed to fetch mirror+file:/etc/apt/apt-mirrors.txt/pool/main/b/binutils/binutils-arm-linux-gnueabihf_2.34-6ubuntu1.8_amd64.deb 404 Not Found [IP: 40.81.13.82 80]
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
I should be able to fix it, though I'm concerned about why it hasn't happened before and whether any of these URLs might cause issues later.
Yes, that's kind of weird.
The new LLVM binaries are working on Ubuntu 22.04/Arm64 but the default target triple is x64:
/home/ubuntu/.triton/llvm/llvm-4017f04e-ubuntu-arm64/bin/llc: error: unable to get target for 'x86_64-unknown-linux-gnu', see --version and --triple.
We can set the default to Arm64 in llvm-build.yml:
-DLLVM_DEFAULT_TARGET_TRIPLE=aarch64-linux-gnu
https://github.com/openai/triton/blob/005085f3feec57993fef840f2e0e03f5f641dd50/.github/workflows/llvm-build.yml#L181
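One way to confirm which default triple a given LLVM build was configured with is to inspect the output of `llc --version`. The sketch below assumes that output contains a "Default target:" line, as in the LLVM releases I am aware of; the helper names `default_target` and `llc_default_target` are hypothetical.

```python
import re
import subprocess
from typing import Optional


def default_target(version_text: str) -> Optional[str]:
    """Extract the triple from a 'Default target:' line, if one is present."""
    match = re.search(r"^\s*Default target:\s*(\S+)", version_text, re.MULTILINE)
    return match.group(1) if match else None


def llc_default_target(llc_path: str) -> Optional[str]:
    """Run `<llc_path> --version` and return its default target triple."""
    out = subprocess.run([llc_path, "--version"], capture_output=True,
                         text=True, check=True).stdout
    return default_target(out)
```

On an Arm64 host, a result starting with `x86_64` from the bundled `llc` would indicate the misconfiguration described above, while a result starting with `aarch64` would indicate that the `-DLLVM_DEFAULT_TARGET_TRIPLE` setting took effect.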
This has a fix for the target triple: https://github.com/openai/triton/pull/3239
@danikhan632 It works fine now!