SSE2 Intrinsics cause compiler error
I've got some simple SSE2 code here.
#include <immintrin.h>
int main(){
// Uncomment the "#ifdef"/"#endif" pair to work around the issue
//#ifdef __HCC_CPU__
__m128i arb = _mm_undefined_si128();
__m128i zero = _mm_xor_si128(arb, arb);
return _mm_extract_epi64(zero, 0);
//#endif
}
I compile as:
hcc `hcc-config --cxxflags --ldflags` -msse2 example.cpp -o example
And it outputs the error:
example.cpp:4:19: error: always_inline function '_mm_undefined_si128' requires
target feature 'sse2', but would be inlined into function 'main' that is
compiled without support for 'sse2'
__m128i arb = _mm_undefined_si128();
^
example.cpp:5:20: error: always_inline function '_mm_xor_si128' requires target
feature 'sse2', but would be inlined into function 'main' that is compiled
without support for 'sse2'
__m128i zero = _mm_xor_si128(arb, arb);
^
example.cpp:7:12: error: '__builtin_ia32_vec_ext_v2di' needs target feature sse2
return _mm_extract_epi64(zero, 0);
^
/opt/rocm/hcc/lib/clang/8.0.0/include/smmintrin.h:1097:14: note: expanded from
macro '_mm_extract_epi64'
(long long)__builtin_ia32_vec_ext_v2di((__v2di)(__m128i)(X), (int)(N))
I can get it to compile by wrapping the code in #ifdef __HCC_CPU__, but I doubt that is the intended approach. My hcc version is as follows:
hcc --version
HCC clang version 8.0.0 (ssh://gerritgit/compute/ec/hcc-tot/clang 6ec3c61e09fbb60373eaf5a40021eb862363ba2c) (ssh://gerritgit/lightning/ec/llvm ab3b88ffc2ae50f55361a49aec89f6e95d9d0ec4) (based on HCC 1.3.18482-757fb49-6ec3c61-ab3b88f )
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/bin
This is expected behaviour. Those builtins are only available for x86 targets. HCC is a single-source CPU/GPU compiler, so it makes two passes over the source: one for the CPU (host) target and one for the GPU (device) target. In the second pass the target is amdgcn, where these builtins are not supported.
Thanks for getting back to me on this.
What I expected instead was for only [[hc]]-annotated functions to be compiled in the second pass. I would expect only a minority of the code to target the GPU (the part that is called from other [[hc]] functions).
When I use HCC intrinsics or inline assembly in an [[hc]] function, there is no compiler error, so I was hoping that x86 intrinsics could be used the same way.
I'm getting this error too when compiling SSE2 intrinsics. What is the solution when using hcc?