HIP
HIP copied to clipboard
Inclusion of cuda_wrappers headers needs to be fixed.
https://github.com/ROCm-Developer-Tools/HIP/blob/dabba8a3678f8cf57994feaa5dfe5591621fb6ce/include/hip/hcc_detail/hip_runtime.h#L517
#include <__clang_cuda_math_forward_declares.h>
#include <__clang_cuda_complex_builtins.h>
// Workaround for using libc++ with HIP-Clang.
// The following headers requires clang include path before standard C++ include path.
// However libc++ include path requires to be before clang include path.
// To workaround this, we pass -isystem with the parent directory of clang include
// path instead of the clang include path itself.
#include <include/cuda_wrappers/algorithm>
#include <include/cuda_wrappers/complex>
#include <include/cuda_wrappers/new>
This is bound to break in interesting ways (already does for us internally).
Those headers are intended to be used with CUDA and they assume very particular location in the clang's include paths.
The assertion Following headers requires clang include path before standard C++ include path
is false. It's the clang_wrappers
that must be included before the standard C++ library include path. The standard clang headers do not.
Here's what canonical CUDA include paths look like:
/usr/local/google/home/tra/work/llvm/build/debug/lib/clang/11.0.0/include/cuda_wrappers
/usr/local/cuda/include
/usr/local/google/home/tra/work/llvm/build/debug/bin/../include/c++/v1
/usr/local/include
/usr/local/google/home/tra/work/llvm/build/debug/lib/clang/11.0.0/include
/usr/include/x86_64-linux-gnu
/usr/include
Better way to address this is to teach clang that HIP compilation needs to add cuda_wrappers path in the right place. Here's where it's done for CUDA: https://github.com/llvm/llvm-project/blob/74e6a5b2a3b0e1992fb38c986b18f1aa7ca6fabd/clang/lib/Driver/ToolChains/Cuda.cpp#L239 HIP will need something similar. AFAICT, there's no reliable enough way to reliably control placement of cuda_wrappers in the search path from command line.
Directly including cuda_wrappers
headers (and only three of them) relies on clang's implementation details and may be broken at any point in time. Adding include path to the parent directory is also unreliable. E.g during distributed build cuda_wrappers subdirectory may be placed in a directory structure constructed via symlinks, so a simple cuda_wrappers/../..
may be pointing somewhere with no include/cuda_wrappers/
in it and user has no way to know where exactly compiler's headers are located ahead of time and the location may be different on different build nodes. The compiler on that machine does -- another point for teaching clang that HIP needs cuda_wrappers.
Another issue is that these wrapper headers are written with the assumption of being included from a CUDA compilation. They assume that that all CUDA-specific macros have already been defined by __clang_cuda_runtime_wrapper.h
which is force-included for all CUDA compilations. As things are now, once cuda_wrappers
is in the correct place in the search path, '#include __device__
are not defined.
The wrappers must work correctly if someone includes them from a HIP compilation, but before hip_runtime.h. E.g. rocm_hip/tests/src/kernel/hipPrintfKernel.cpp
https://github.com/ROCm-Developer-Tools/HIP/blob/dabba8a3678f8cf57994feaa5dfe5591621fb6ce/tests/src/kernel/hipPrintfKernel.cpp#L28 does that via test_common.h -> iostream -> [lots of libc++ headers] -> new.
Thanks for the suggestion. I will try to fix it.
I'm having an issue where hip-clang doesn't see cuda_wrappers at all, i can't find them on my sytem either. Where do these come from?
The wrappers are part of clang installation. They should be in the same directory where clang finds its other header files. https://github.com/llvm/llvm-project/tree/main/clang/lib/Headers
One way to find the path is to run clang -v -x c++ /dev/null -fsyntax-only
and check the list of include paths. The path to clang's will be towards the end of the list , after the C++ library headers, but before the /usr/include
interesting, thanks for the info @Artem-B I might be chasing the wrong issue
my installation of clang did not have the project active to get the cuda_wrappers included in the installation. I fixed that but...
It seems for whatever reason miopen is generating compilation commands that are somehow missing -isystem directories and also include extra ones.
There's an -isystem on the command line to compile naive_conv from pytorch that is including hip_runtime.h that can't find inlclude/cuda_wrappers/algorithm although that is in my llvm path. Additionally the compilation flags have an extra -isystem /usr/include that cause #include_next
Can you install latest ROCm release? It should have correct configuration for MIOpen, HIP and llvm.
@yxsamliu @Artem-B I'm working on gentoo ebuilds for 4.2.0. I got to the source of the problem and am dumping the info here incase anyone else runs into it as they are likely to also run across the wrapper include confusion.
MIOpen pulls flags for hip-clang from hip (or rocclr). It pulls these through flags defined in the cmake-targets of hip::device. In gentoo, the builds dont keep rocm libs / headers under a prefix of /usr/
- patch out the HSA include logic from hip-targets.cmake.in
- patch out the extra include infront of cuda_wrappers in hip_platform
*NOTE: the for cuda_wrappers headers installed with llvm-roc the openmp project targets must be set.
Hopefully this snippet helps someone else debugging issues that are similar.
I'm not at the bottom of this yet, i'm having more #include_next issues now. I'm kinda lost as to why an extra -isystem to /usr/include would make #include_next statements fail to find files that are present within /usr/include
See the details in the body of the bug report:
AFAICT, there's no reliable enough way to reliably control placement of cuda_wrappers in the search path from command line.
You may hack things into working, but I would not be surprised if your build would work slightly differently compared to the one which lets clang handle include search order.
E.g. if the wrappers happen to get included after some critical header because it's found earlier in the search path, you may end up with a different subset of the standard library available on GPU and that would be observable from C++.
The fact that you are struggling with the search path suggests that there's something odd going on with your system. Could you elaborate on why the special include paths are needed at all?
hsa headers for us go int /usr/include like everything else
This could be a problem as it might interfere with the inclusion order.
I would suggest using an approach used on some linux distributions for CUDA where the defaullt packages install CUDA headers in the same directory as the other packages. Basically, you create a fake monolithic rocm install directory populated with symlinks to the correct files/directories and then point clang there with --rocm-path
.
Did you see -internal-isystem ${llvm_prefix}/lib/clang/12.0.0/include/cuda_wrappers -internal-isystem ${rocm_prefix}/include -include __clang_hip_runtime_wrapper.h when you use -v ?
Normally clang detects ROCm installation path relative to itself and add these options, then hip_runtime.h should not trigger include_next since they should be disabled by https://github.com/ROCm-Developer-Tools/HIP/blob/rocm-4.2.x/include/hip/amd_detail/hip_runtime.h#L373
Hmm, i got it to compile and work but, it seems to be doing great at everything pytorch can throw at it but I'm worried given what you both said something is wrong with the system paths. Here are the changes I ended up rolling with.
There's config in llvm-roc to add include/nvidia_wrappers to path already that was commented out in the base ebuild I was working with. I undid that comment so now it does have an isystem of include/cuda_wrappers. When you're asking what the output of -v is what tool are do mean using -v on?
It seems like I should change these ebuilds to go with the "standard" rocm layout and leave things in /usr/rocm with sym links elsewhere.
-v is an option for clang.
For HIP to work properly clang needs to be able to detect HIP installation. If detected, clang -v will have output like:
Found HIP installation: /opt/rocm-4.2.0, version 4.2.21155-37cb3a34
clang expects the relative location of itself, HIP runtime and ROCm device library follow the ROCm installation directory structure, i.e.
/opt/rocm/llvm => compiler /opt/rocm/amdgcn => device library /opt/rocm/hip => HIP runtime
As long as the relative location is correct, clang should be able to detect ROCm installation.
here's an output of that command
sr/lib/llvm/roc/bin/clang -v
clang version 12.0.0
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm/roc/bin
Selected GCC installation: /usr/lib/gcc/x86_64-pc-linux-gnu/10.3.0
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64
Found HIP installation: /usr, version 3.5.0
the paths over here are /usr/lib/llvm/roc/bin/clang => llvm-roc compiler /usr/lib/hip/lib/bin/hip => HIP Runtime /usr/lib/amdgcn => device library
clang can detect HIP. That is good. However, the HIP runtime seems very old. The latest release is 4.2 Which branch of HIP are you using?
hipcc -v has trouble locating the right clang and rocm_agent_enumerator. Going to work on fixing that in the hip package but hipe does report after all the errors ,HIP version: 4.2.21301-. Gpu accelerated pytorch running the sample mnist cnn code ran over night. Thanks for all the help with figuring this stuff out. I'm going to work on changing things around to follow more along what is expected in with the rocm tools (separate rocm dir with sym links where needed)
This topic is not solved, right? Installing HIP in /usr/lib/x86_64-linux-gnu/libamdhip64.so
and /usr/include/hip
is still not supported in the upcoming ROCm 5.x? pinging @yxsamliu, @Artem-B, @fuag15, @cgmb.
Here on Debian testing (bookworm), with the debian official WIP packages installed in the above paths, got this (and compiling a compute kernel, clang fails to find math.h):
maxzor@ws:~$ clang -v -x c++ /dev/null -fsyntax-only
Debian clang version 13.0.1-+rc3-1~exp1+b1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/11
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/11
Candidate multilib: .;@m64
Selected multilib: .;@m64
Found HIP installation: /usr, version 4.4.22062
(in-process)
"/usr/lib/llvm-13/bin/clang" -cc1 -triple x86_64-pc-linux-gnu -fsyntax-only -disable-free -disable-llvm-verifier -discard-value-
names -main-file-name null -mrelocation-model static -mframe-pointer=all -fmath-errno -fno-rounding-math -mconstructor-
aliases -munwind-tables -target-cpu x86-64 -tune-cpu generic -debugger-tuning=gdb -v -fcoverage-compilation-dir=/home
/maxzor -resource-dir /usr/lib/llvm-13/lib/clang/13.0.1 -internal-isystem /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include
/c++/11 -internal-isystem /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11 -internal-isystem
/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/backward -internal-isystem /usr/lib/llvm-13/lib/clang/13.0.1
/include -internal-isystem /usr/local/include -internal-isystem /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../x86_64-linux-
gnu/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem
/usr/include -fdeprecated-macro -fdebug-compilation-dir=/home/maxzor -ferror-limit 19 -fgnuc-version=4.2.1 -fcxx-exceptions
-fexceptions -fcolor-diagnostics -faddrsig -D__GCC_HAVE_DWARF2_CFI_ASM=1 -x c++ /dev/null
clang -cc1 version 13.0.1 based upon LLVM 13.0.1 default target x86_64-pc-linux-gnu
ignoring nonexistent directory "/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../x86_64-linux-gnu/include"
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11
/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11
/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/backward
/usr/lib/llvm-13/lib/clang/13.0.1/include
/usr/local/include
/usr/include/x86_64-linux-gnu
/usr/include
End of search list.
@Maxzor, I don't think it was fixed in 5.0 but you'll be able to check for yourself soon. Unfortunately, I've been tied up with (among other things) reviewing similar changes for the math libraries. The need for backwards-compatibility makes them non-trivial.
@Artem-B, Sorry for the lack of response. Please try latest ROCm 6.0.2 (HIP 6.0.32831) to see if your issue still exists? If resolved, please close the ticket. Thanks.