llvm
llvm copied to clipboard
[SYCL] Make invoke_simd convert its arguments to appropriate type.
-
Introduce an intermediate lambda in invoke_simd and call SIMD target from the lambda with given (SIMD) arguments coming from the lambda's formal parameters. This way compiler automatically performs necessary argument type conversion.
-
Introduce a new implicit constructor for simd to allow conversion of simd objects with _VecExt storage kind (used in invoke_simd extension).
-
Fix linkonce_odr linkage of entry points : change it to external linkage to avoid removal by the inliner and llvm-linker.
This new scheme complicates the SIMD target function pointer flow, so LowerInvokeSimd.cpp update is needed to accommodate the new flow.
E2E test is underway.
Signed-off-by: Konstantin S Bobrovsky [email protected]
Folks, I would appreciate if could you review the following parts:
@rolandschulz: sycl/include/std/experimental/simd.hpp sycl/include/sycl/ext/oneapi/experimental/invoke_simd.hpp sycl/test/invoke_simd/invoke_simd.cpp
@v-klochkov: llvm/lib/SYCLLowerIR/LowerInvokeSimd.cpp
@asudarsa: llvm/tools/sycl-post-link/ModuleSplitter.cpp llvm/tools/sycl-post-link/ModuleSplitter.h
The llvm/test/SYCLLowerIR/ESIMD/lower_invoke_simd.ll failure will be fixed by test update (hopefully).
They will only remove them if unused. Why do they incorrectly assume they are unused? Is it because you spit into SPMD and SIMD and the entry point into SIMD is unused inside the SIMD part? And therefore it gets removed after splitting?
No, it is not because of splitting. They (linker and inliner) correctly assume they are unused, because there can remain no calls to entry points within the module. In this case inliner turns this
invoke_simd(Helper,...)
Helper(...) {
call UserF
}
UserF(...) { code; } // <-- declared as entry point in the user code (SYCL_EXTERNAL)
into
invoke_simd(Helper,...)
Helper(...) {
code;
}
which is correct from the LLVMIR optimization standpoint, but not correct conceptually because UserF is declared as entry point and someone might want to link to it / call it later. Ideally, SYCL_EXTERNAL must not lead to linkonce* linkage, even for template instantiations (or linkonce* must be interpreted differently by inliner/linker, but this might be against its definition).
More practical example would be a template SYCL_EXTERNAL function.
I'm pretty sure the intention of SYCL_EXTERNAL (as defined by SYCL spec and therefore not in the context of invoke_simd) was never to give a function external linkage if it didn't have it according to the C++ spec. Only to guarantee that a function which has external linkage also has external linkage for device linking. An implicit instantiated template function doesn't have external linkage in C++. You need to explicit instantiate it. And if you do that the LLVM-IR linkage is weak not linkonce. Which guarantees it works.
Ideally, SYCL_EXTERNAL must not lead to linkonce* linkage, even for template instantiations Therefore I disagree with that.
For the case of invoke_simd why is it possible that the inliner gets rid of the call. My understanding is that we don't support inlining through invoke_simd (yet).
'm pretty sure the intention of SYCL_EXTERNAL (as defined by SYCL spec and therefore not in the context of invoke_simd) was never to give a function external linkage if it didn't have it according to the C++ spec. Only to guarantee that a function which has external linkage also has external linkage for device linking.
Hm. This excerpt from the spec + details in (5.10.1.) pretty much means SYCL_EXTERNAL = external linkage for me:
Any variable or function that is odr-used from a device function must be defined in the same translation unit as that use. However, a function may be defined in another translation unit if the implementation defines the SYCL_EXTERNAL macro as described in Section 5.10.1.
But WeakODR might be better in this case of implicit instantiation indeed. I changed the code.
We discussed the linkonce_odr/SYCL_EXTERNAL problem with @rolandschulz and others in more details. Key takeaways:
SYCL_EXTERNALdoes not affect function linkage as defined by C++ ruleslinkonce_odrcan beSYCL_EXTERNAL. But if no uses after optimizations - compiler is free to remove it.- To avoid removal,
weak_odrcan replacelinkonce_odrw/o affecting correctness.
In this patch I do the linkage replacement for both helper and target. For helper this is permanent change, for target - temporary. Just to protect the target from removal when ModuleDesc::cleanup() is called on split ESIMD module, because there can be other uses of the target in the SYCL part of the program. Once ESIMD and SYCL are linked back together (this is what invoke_simd requires), cleanup is run on the linked module.
Original Pseudo-IR illustrating helper and target:
...
__builtin_invoke_simd(helper, target,...);
...
ret helper(callable f, args...) {
f(args...);
}
Optimized to:
...
__builtin_invoke_simd(helper, ...);
...
ret helper(args...) {
target(args...);
}
but there can be still other uses of target. Otherwise it gets removed.
The only failure is InvokeSimd/invoke_simd_smoke.cpp, which is fixed by https://github.com/intel/llvm-test-suite/pull/1146
/verify with https://github.com/intel/llvm-test-suite/pull/1146
@asudarsa, @cperkinsintel - I resolved outstanding comments with @rolandschulz and @v-klochkov. Please take a look.
Only nitpicks. LGTM.
@asudarsa, thanks - let me address your NIT comments in a separate PR to reduce the risk of conflicts for this big PR.
Had to fix test failure (see 9dd341e), so I addressed @asudarsa's review comments as well (b4b6749). @rolandschulz, @v-klochkov, @asudarsa - please re-approve.
/verify with https://github.com/intel/llvm-test-suite/pull/1146
-
Expected failure:
- SYCL / Linux / L0 GEN9 LLVM Test Suite (pull_request_target):
- SYCL / Linux / OCL GEN9 LLVM Test Suite (pull_request_target):
- InvokeSimd/invoke_simd_smoke.cpp, which is fixed by https://github.com/intel/llvm-test-suite/pull/1146
-
Unrelated failures:
- SYCL / Linux / HIP AMDGPU LLVM Test Suite (pull_request_target) :
- SYCL / Linux / CUDA LLVM Test Suite (pull_request_target):
- SYCL :: DeviceLib/imf_simd_emulate_test.cpp
@cperkinsintel, please review/approve for @intel/llvm-reviewers-runtime
@cperkinsintel, please review/approve for @intel/llvm-reviewers-runtime
Actually, I have approval from @intel/llvm-reviewers-runtime - from @v-klochkov. Merging.