[Issue]: HIPCC compiler doesn't respect -I include order

Open elsampsa opened this issue 1 year ago • 3 comments

Problem Description

ROCm ships with its own versions of Thrust, OpenMP, the LLVM compiler, etc.

These are installed under /opt/rocm, which is a symlink to /opt/rocm-version. Header files for the aforementioned libraries are in /opt/rocm/include.

The problem is that the system, be it a bare-bones Linux box, a conda environment or even a Docker image, can already have several of these libraries installed system-wide.

This should be no problem. It is very common to have different versions of libraries on the same system, and then you just tell the compiler and linker which ones to use.

For example, let's suppose I have an anaconda environment at

/home/sampsa/anaconda3/envs/torch_rocm/

Then at compile stage I should be able to do this:

-I/opt/rocm/include -I/home/sampsa/anaconda3/envs/torch_rocm/include

and it would first search for thrust, openmp, or whatever include files in /opt/rocm/include and then for the remaining vanilla/system include files in /home/sampsa/anaconda3/envs/torch_rocm/include.

However, when running the wrapper script that we are supposed to use for compilation

/opt/rocm/llvm/bin/clang++

and giving it -I arguments, it just randomizes the include order..!

My hunch is that the wrapper includes a lot of -I and -isystem switches, which then drive the actual clang compiler crazy and turn it into a random-include-order generator.

This is a real PITA. It makes compiling ROCm stuff (say pytorch) almost impossible, as you must be very careful with your include order: you want to use the ROCm-provided header files on the one hand and the system header files on the other.

Operating System

Ubuntu 22.04

CPU

(not relevant)

GPU

AMD Instinct MI300X

ROCm Version

ROCm 6.2.0

ROCm Component

HIP

Steps to Reproduce

Here is a script that demonstrates the problem. You can run it stand-alone. The only requirement is that you have ROCm installed:

#!/bin/bash
echo
echo CLANG CPP COMPILER
echo
touch paska.cpp
touch paska.hip
/opt/rocm/llvm/bin/clang++ -v \
    paska.cpp -o paska \
    -I/opt/rocm/include \
    -I/home/sampsa/anaconda3/envs/torch_rocm/include 2>&1 \
    | sed -n '/search starts here:/,/End of search list./p' | grep -v 'search starts here:\|End of search list.'
## --> include path search order:
## /opt/rocm/include
## /home/sampsa/anaconda3/envs/torch_rocm/include
## --> SEARCH PATH ORDER IS RESPECTED
echo
echo CLANG HIP COMPILER
echo
/opt/rocm/llvm/bin/clang++ -v \
    paska.hip -o paska \
    -I/opt/rocm/include \
    -I/home/sampsa/anaconda3/envs/torch_rocm/include 2>&1 \
    | sed -n '/search starts here:/,/End of search list./p' | grep -v 'search starts here:\|End of search list.'
## --> include path search order:
## /home/sampsa/anaconda3/envs/torch_rocm/include
## /opt/rocm-6.2.0/lib/llvm/lib/clang/18/include/cuda_wrappers
## ...
## /opt/rocm/include
## --> SEARCH PATH ORDER IS **NOT** RESPECTED
echo
echo CLANG HIP COMPILER WITHOUT -I/opt/rocm/include
echo
/opt/rocm/llvm/bin/clang++ -v \
    paska.hip -o paska \
    -I/home/sampsa/anaconda3/envs/torch_rocm/include 2>&1 \
    | sed -n '/search starts here:/,/End of search list./p' | grep -v 'search starts here:\|End of search list.'
## --> include path search order:
## /home/sampsa/anaconda3/envs/torch_rocm/include
## /opt/rocm-6.2.0/lib/llvm/lib/clang/18/include/cuda_wrappers
## ...
## /opt/rocm/include
## --> /opt/rocm/include is included by rocm's clang++ rocm wrapper, but..
## SEARCH PATH ORDER IS **NOT** RESPECTED
echo
echo CLANG HIP COMPILER WITH -I/opt/rocm-6.2.0
echo
/opt/rocm/llvm/bin/clang++ -v \
    paska.hip -o paska \
    -I/opt/rocm-6.2.0 \
    -I/home/sampsa/anaconda3/envs/torch_rocm/include 2>&1 \
    | sed -n '/search starts here:/,/End of search list./p' | grep -v 'search starts here:\|End of search list.'
## --> include path search order:
## /opt/rocm-6.2.0
## /home/sampsa/anaconda3/envs/torch_rocm/include
## --> SEARCH PATH ORDER IS (AGAIN) RESPECTED

In the last test we're using -I/opt/rocm-6.2.0 and it seems to work. However, when the compilation commands get very long and complicated (like when compiling pytorch with cmake), even -I/opt/rocm-6.2.0 (which might be repeated many times) ends up last in the search order again. (I can provide example commands if you're interested in the issue.)
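For inspecting long commands like those, the filtering pipeline repeated in the script above could be factored into a small helper. This is only a sketch: `extract_search_list` and `show_include_order` are hypothetical names, not part of ROCm or clang, and the marker strings are the ones the script already greps for.

```bash
#!/bin/bash
# Hypothetical helper: keep only the directories that "clang -v" prints between
# "search starts here:" and "End of search list.", stripping leading spaces.
extract_search_list() {
    sed -n '/search starts here:/,/End of search list\./p' \
        | grep -v -e 'search starts here:' -e 'End of search list\.' \
        | sed 's/^[[:space:]]*//'
}

# Hypothetical helper: run a compiler with -v plus the given arguments and
# print the include search order it reports.
show_include_order() {
    local compiler=$1
    shift
    "$compiler" -v "$@" 2>&1 | extract_search_list
}
```

It could then be called as `show_include_order /opt/rocm/llvm/bin/clang++ paska.hip -o paska -I/opt/rocm/include -I/home/sampsa/anaconda3/envs/torch_rocm/include`, or with the full argument list copied from a cmake build log.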

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

elsampsa (Sep 03 '24 14:09)

A small comment with additional context is in order:

The only way to actually compile ROCm-related software seems to be with docker.

That is because it is the only way to create an isolated and pristine system without any conflicting libraries with the ROCm installation that goes into /opt/rocm.

For example, my system already had /usr/include/thrust before I installed ROCm (probably installed by Ubuntu's default CUDA package, although these headers also come with the mkl-include python package).

But ROCm comes with /opt/rocm/include/thrust/.

Then, during a build (I tried building pytorch), the build system starts using /usr/include/thrust instead of the ROCm-bundled /opt/rocm/include/thrust/.

This is also applicable to conda environments, where the conflict comes from /home/sampsa/anaconda3/envs/torch_rocm/include/thrust (actually installed by the required mkl-include python package).
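To see which copies of a header directory are competing on a given machine, one could check each candidate include root in turn. A minimal sketch, where `find_header_copies` is a hypothetical helper (not a ROCm tool) and the directories are simply the ones mentioned in this thread:

```bash
#!/bin/bash
# Hypothetical helper: for a header directory name (e.g. "thrust"), print every
# given include root that contains a directory of that name.
find_header_copies() {
    local name=$1
    shift
    local dir
    for dir in "$@"; do
        if [ -d "$dir/$name" ]; then
            echo "$dir/$name"
        fi
    done
}
```

For example, `find_header_copies thrust /opt/rocm/include /usr/include /home/sampsa/anaconda3/envs/torch_rocm/include` lists each location that has its own thrust directory. More than one line of output means the include order decides which copy wins.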

This could easily be fixed with -I arguments to the hipcc compiler, if that were possible (as explained in the ticket).

I know that with nvidia I can compile stuff easily without docker. :)

elsampsa (Sep 04 '24 07:09)

Hi @elsampsa, an internal ticket has been created to investigate your issue. Thanks!

ppanchad-amd (Sep 27 '24 18:09)

Hi @elsampsa, there is a bit of hidden complexity inside Clang (the compiler itself, not a wrapper! Our wrapper is "hipcc") that causes some of your ROCm includes to not be respected.

My hunch is that the wrapper includes a lot of -I and -isystem switches, which then drive the actual clang compiler crazy and turn it into a random-include-order generator.

This is a good hunch! The source of the include overrides is inside the clang source code, where a number of ROCm includes (rocm/include, rocm/include/thrust, rocm/include/rocprim) are inserted via -idirafter when compiling a HIP program. The -idirafter flag overrides -I, similarly to how -isystem does. This means that you will not be able to get the rocm/include(s) to the front of the include search path manually. The compiler will only insert these overrides when compiling a HIP program, which it recognizes if the source file ends in .hip, or you pass -xhip to it (see my reply to your other issue about this.)

The good news is that there aren't that many overrides. The ones that are there exist for complicated reasons - see this commit and this other one as examples.

If you want to prioritize ROCm libraries and fill in the gaps with your own libraries, instead of promoting rocm/include by using -I/opt/rocm/include, which will be overridden by clang, you can demote your own includes using -idirafter/path/to/my/includes. These will appear after the ROCm includes in the include search path.

jamesxu2 (Oct 11 '24 13:10)
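Applied to the reproduction script from the issue, that suggestion might look like the sketch below. The compiler and include paths are the ones from the issue, `hip_flags` is just an illustrative variable name, and the resulting search order has not been verified here, so the block only prints whatever clang reports.

```bash
#!/bin/bash
# Paths taken from this issue; adjust them for your own system.
ROCM_CLANG=/opt/rocm/llvm/bin/clang++
MY_INCLUDES=/home/sampsa/anaconda3/envs/torch_rocm/include

# Demote our own headers with -idirafter rather than promoting ROCm's with -I.
hip_flags=(-v -idirafter "$MY_INCLUDES")

echo "command: $ROCM_CLANG ${hip_flags[*]} paska.hip -o paska"

# Only run the compiler when it is actually installed.
if [ -x "$ROCM_CLANG" ]; then
    touch paska.hip
    "$ROCM_CLANG" "${hip_flags[@]}" paska.hip -o paska 2>&1 \
        | sed -n '/search starts here:/,/End of search list\./p'
fi
```

If the explanation above holds, the conda include directory should then be listed after the ROCm directories in the printed search list.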

@elsampsa I'm closing this issue due to inactivity. You can reopen it or submit a new issue if you have any followups.

jamesxu2 (Oct 25 '24 15:10)