SYCLomatic
Code not migrated
Please see the migration result below (dpct version 17.0.0, codebase 426a0560b12914f001e8980152d9447a28ed7a4b).
Thanks.
input file: test.cuh
#include <cuda.h>
template <typename T>
inline __device__ T shfl(const T val, int srcLane, int width = 32) {
#if CUDA_VERSION >= 9000
return __shfl_sync(0xffffffff, val, srcLane, width);
#else
return __shfl(val, srcLane, width);
#endif
}
template <typename T>
inline __device__ T* shfl(T* const val, int srcLane, int width = 32) {
static_assert(sizeof(T*) == sizeof(long long), "pointer size");
long long v = (long long)val;
return (T*)shfl(v, srcLane, width);
}
output:
#include <sycl/sycl.hpp>
#include <dpct/dpct.hpp>
template <typename T>
inline T shfl(const T val, int srcLane, int width = 32) {
#if CUDA_VERSION >= 9000
return __shfl_sync(0xffffffff, val, srcLane, width);
#else
return __shfl(val, srcLane, width);
#endif
}
template <typename T>
inline T* shfl(T* const val, int srcLane, int width = 32) {
static_assert(sizeof(T*) == sizeof(long long), "pointer size");
long long v = (long long)val;
return (T*)shfl(v, srcLane, width);
}
I have added a CUDA example (https://github.com/zjin-lcf/HeCBench/tree/master/warpselect-cuda) for reference. I then manually changed the DPCT-generated code for the shuffle functions and for the copy constructor of the Tensor class. However, the following SYCL compiler errors remain:
static assertion failed due to requirement 'is_device_copyable<const faiss::gpu::Tensor<float, 2, true, int, faiss::gpu::traits::DefaultPtrTraits>, void>::value || detail::IsDeprecatedDeviceCopyable<const faiss::gpu::Tensor<float, 2, true, int, faiss::gpu::traits::DefaultPtrTraits>, void>::value': The specified type is not device copyable
static_assert(is_device_copyable<FieldT>::value ||
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Tensor class is defined in Tensor.cuh (Tensor.dp.hpp) and Tensor-inl.cuh (Tensor.in.dp.hpp). My understanding is that a Tensor object is not device copyable in SYCL, although it can be copied in the CUDA program. Thank you for any suggestions.
@jinz2014
#1 The first reproducer, test.cuh, has been reproduced. The root cause is that the template function shfl is never instantiated, so the calls to __shfl_sync/__shfl remain UnresolvedLookupExpr nodes in the AST and are therefore not processed by SYCLomatic.
The workaround (WA) is to add a caller function that instantiates the template function shfl.
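A minimal host-only C++ sketch of that workaround, under stated assumptions: `shuffle_impl` is a hypothetical stand-in for `__shfl_sync`/`__shfl` so the file compiles without CUDA, and `instantiate_shfl` is an illustrative caller name. It shows that the call inside the template is only resolved once the template is used with concrete types; in test.cuh the caller would have to be a `__device__` or `__global__` function.

```cpp
#include <cassert>

// Hypothetical host stand-ins for the warp shuffle. In test.cuh the template
// calls __shfl_sync / __shfl, which exist only in CUDA device code.
inline int shuffle_impl(int v, int /*srcLane*/, int /*width*/) { return v; }
inline long long shuffle_impl(long long v, int /*srcLane*/, int /*width*/) { return v; }
inline float shuffle_impl(float v, int /*srcLane*/, int /*width*/) { return v; }

template <typename T>
inline T shfl(const T val, int srcLane, int width = 32) {
  // Dependent call: in the uninstantiated template it is only an unresolved
  // lookup; an overload is chosen when shfl<T> is instantiated.
  return shuffle_impl(val, srcLane, width);
}

template <typename T>
inline T* shfl(T* const val, int srcLane, int width = 32) {
  static_assert(sizeof(T*) == sizeof(long long), "pointer size");
  long long v = (long long)val;
  return (T*)shfl(v, srcLane, width);
}

// Workaround sketch (hypothetical caller): use shfl with every type the code
// needs so the templates are instantiated and the inner calls are resolved.
// In the CUDA file this would be device code.
inline int instantiate_shfl(int* p) {
  int a = shfl(3, 0);       // instantiates shfl<int>
  float b = shfl(2.5f, 0);  // instantiates shfl<float>
  int* q = shfl(p, 0);      // pointer overload, which uses shfl<long long>
  return a + static_cast<int>(b) + (q == p ? 1 : 0);
}
```

For example, `instantiate_shfl(nullptr)` returns 6 (3 + 2 + 1), since the host stand-ins simply return their input.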
#2 For the second reproducer (https://github.com/zjin-lcf/HeCBench/tree/master/warpselect-cuda), the code at https://github.com/zjin-lcf/HeCBench/blob/8514c8b89a547ceb7b55eb54888dc5325c6078d8/warpselect-cuda/Tensor.cuh#L95C5-L95C11, https://github.com/zjin-lcf/HeCBench/blob/8514c8b89a547ceb7b55eb54888dc5325c6078d8/warpselect-cuda/Tensor.cuh#L107, and https://github.com/zjin-lcf/HeCBench/blob/8514c8b89a547ceb7b55eb54888dc5325c6078d8/warpselect-cuda/Tensor.cuh#L103 breaks the C++ trivially copyable requirements (https://en.cppreference.com/w/cpp/types/is_trivially_copyable), and the SYCL specification does not allow such a type to be passed as a kernel argument.
The workaround for you is to change the original CUDA code and then migrate it again.
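The trivially copyable rule can be illustrated with a minimal host-only C++ sketch. `TensorBad`, `TensorGood`, `make_view`, and `bytewise_copy` are hypothetical names, not part of faiss or SYCL, and the structs are far simpler than the real Tensor class. A user-provided copy constructor is enough to make a type non-trivially copyable, while leaving the copy operations implicit (or declaring them `= default`) keeps it trivially copyable, which SYCL 2020 treats as device copyable.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical, simplified 2-D tensor view (NOT the real faiss::gpu::Tensor).
struct TensorBad {
  float* data;
  int size[2];
  int stride[2];
  // User-provided copy constructor: even though it only copies the members,
  // it makes the type non-trivially copyable.
  TensorBad(const TensorBad& t)
      : data(t.data), size{t.size[0], t.size[1]}, stride{t.stride[0], t.stride[1]} {}
};

// Same members, but the copy/move operations are left implicit, so the
// compiler-generated ones are trivial.
struct TensorGood {
  float* data;
  int size[2];
  int stride[2];
};

static_assert(!std::is_trivially_copyable<TensorBad>::value,
              "user-provided copy constructor -> not trivially copyable");
static_assert(std::is_trivially_copyable<TensorGood>::value,
              "implicit copy operations -> trivially copyable");

// Builds a contiguous row-major view (hypothetical helper).
inline TensorGood make_view(float* data, int rows, int cols) {
  return TensorGood{data, {rows, cols}, {cols, 1}};
}

// A trivially copyable object may be copied byte by byte, which is the kind
// of copy a SYCL implementation may use for device-copyable kernel arguments.
inline TensorGood bytewise_copy(const TensorGood& t) {
  TensorGood out;
  std::memcpy(&out, &t, sizeof(TensorGood));
  return out;
}
```

The byte-wise copy of `make_view(nullptr, 2, 3)` keeps sizes {2, 3} and strides {3, 1}, because nothing beyond the raw members needs to be copied.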
Have you solved it? I have the same problem.
@lwqq00 I reopened the issue.
@lwqq00 I would try to change the CUDA code.
@tomflinda
Users may or may not change the CUDA code. Could you please suggest the changes needed in the SYCL program to work around the trivially copyable issue? I assume that the developers of Intel Extension for PyTorch have encountered the same issue before. Thanks.
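One possible SYCL-side approach, sketched in host-only C++ under assumptions: keep the original class on the host and give the kernel a small trivially copyable view holding only the pointer and sizes it needs. `HostTensor`, `TensorView`, `to_view`, and `sum_view` are hypothetical names; in a SYCL program the kernel lambda would capture the `TensorView` by value instead of the Tensor object. Alternatively, SYCL 2020 lets an application specialize `sycl::is_device_copyable` (the trait named in the error message) for a type whose copy is equivalent to a byte-wise copy; that variant is not shown here.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical host-side class standing in for the migrated Tensor: its
// user-provided copy constructor makes it non-trivially copyable.
class HostTensor {
 public:
  HostTensor(float* data, int rows, int cols)
      : data_(data), rows_(rows), cols_(cols) {}
  HostTensor(const HostTensor& t)
      : data_(t.data_), rows_(t.rows_), cols_(t.cols_) {}

  float* data() const { return data_; }
  int rows() const { return rows_; }
  int cols() const { return cols_; }

 private:
  float* data_;
  int rows_;
  int cols_;
};

// Hypothetical plain view: trivially copyable members and implicit copy
// operations, so it can be captured by value in a kernel lambda.
struct TensorView {
  float* data;
  int rows;
  int cols;
  float& operator()(int r, int c) const { return data[r * cols + c]; }
};

static_assert(!std::is_trivially_copyable<HostTensor>::value,
              "HostTensor is not trivially copyable");
static_assert(std::is_trivially_copyable<TensorView>::value,
              "TensorView is trivially copyable");

// Built on the host before the kernel is submitted.
inline TensorView to_view(const HostTensor& t) {
  return TensorView{t.data(), t.rows(), t.cols()};
}

// Stand-in for a kernel body: it only touches the view, never HostTensor.
inline float sum_view(TensorView v) {
  float s = 0.0f;
  for (int r = 0; r < v.rows; ++r)
    for (int c = 0; c < v.cols; ++c) s += v(r, c);
  return s;
}
```

For a 2 x 3 buffer holding 1 to 6, `sum_view(to_view(t))` returns 21. The design keeps the class unchanged on the host and limits the kernel interface to plain data.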