Bryce Allen
SYCL has a 2k parameter size limit. CUDA typically has 4k, although it may depend on device. Complex RHS expressions can exceed this limit pretty easily, particularly on SYCL, and...
Adding a debug print to the gtensor_storage copy ctor shows some surprises. In particular, assigning a temporary to an auto variable can result in a copy.
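As a rough illustration of the technique (not gtensor code), the sketch below instruments a copy constructor and shows one generic C++ way that an `auto` declaration can copy: when the initializer is a reference returned by an accessor on a temporary, `auto` deduces a value type. `DemoStorage`, `DemoHolder` and `count_copies_demo` are hypothetical names, and the cases seen in gtensor may have a different cause.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-ins, not gtensor types.
static int g_copy_count = 0;

struct DemoStorage {
  DemoStorage() = default;
  DemoStorage(const DemoStorage&) {
    ++g_copy_count;  // debug instrumentation: count and report each copy
    std::printf("DemoStorage copy ctor called\n");
  }
};

struct DemoHolder {
  DemoStorage storage_;
  // Accessor returns a reference to the member, not a value.
  const DemoStorage& storage() const { return storage_; }
};

// Returns how many copies the declarations inside trigger.
inline int count_copies_demo() {
  g_copy_count = 0;
  DemoHolder h;
  auto by_value = DemoHolder{}.storage();  // auto drops the reference: one copy
  const auto& by_ref = h.storage();        // binds to the member: no copy
  (void)by_value;
  (void)by_ref;
  return g_copy_count;
}
```

Declaring the variable as `const auto&` or `auto&&` avoids the copy only when the referenced object outlives the variable, which is why the reference case above uses the named `h` rather than a temporary.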
Suggested by @td-mpcdf. My understanding is that this would just be an offset, so if lbound=9, the range [10,20] would actually be [1,11] on the physical array. Basically lbound is...
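A minimal sketch of the offset interpretation described above, assuming lbound is simply subtracted from each logical index. `physical_index` is a hypothetical helper for illustration, not part of gtensor.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: map a logical index to its position in the
// physical array, assuming lbound acts as a plain offset.
constexpr std::ptrdiff_t physical_index(std::ptrdiff_t logical,
                                        std::ptrdiff_t lbound) {
  return logical - lbound;
}
```

With lbound=9, `physical_index(10, 9)` is 1 and `physical_index(20, 9)` is 11, matching the [10,20] to [1,11] mapping in the description.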
Currently the main `CMakeLists.txt` file lists libraries needed for blas and fft for both HIP and SYCL. This is just a quick hack to get things working until upstream cmake...
This can be implemented via thrust::transform for CUDA/HIP and via std::transform for Intel SYCL.
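A host-only sketch of the std::transform side, assuming a thin wrapper is all that is needed. `demo_transform` and `demo_scale` are hypothetical names; a CUDA/HIP build would presumably forward to thrust::transform instead, which is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical wrapper: forwards to std::transform on the host.
template <typename InputIt, typename OutputIt, typename UnaryOp>
OutputIt demo_transform(InputIt first, InputIt last, OutputIt out, UnaryOp op) {
  return std::transform(first, last, out, op);
}

// Example use: multiply every element by a constant factor.
inline std::vector<double> demo_scale(const std::vector<double>& in,
                                      double factor) {
  std::vector<double> out(in.size());
  demo_transform(in.begin(), in.end(), out.begin(),
                 [factor](double x) { return factor * x; });
  return out;
}
```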
If the source has a const data type, no signature will match and compile will fail, e.g. ``` error: no instance of overloaded function "gt::copy" matches the argument list argument...
Follow convention of xtensor, i.e. add another template param for `gt::layout_type::row_major`, `gt::layout_type::column_major`. For backward compatibility we can make the default column_major, but since it's a C++ lib, maybe we should...
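To make the layout parameter concrete, here is a sketch of a 2-d index calculation selected by a template parameter that defaults to column major. `demo_layout` and `linear_index` are hypothetical and only illustrate the idea; they are not the proposed gtensor API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout tag, mirroring the row_major / column_major choice.
enum class demo_layout { row_major, column_major };

// Linear offset of element (i, j) in an n_rows x n_cols array.
// Defaults to column major, as suggested for backward compatibility.
template <demo_layout L = demo_layout::column_major>
constexpr std::size_t linear_index(std::size_t i, std::size_t j,
                                   std::size_t n_rows, std::size_t n_cols) {
  return L == demo_layout::column_major ? i + j * n_rows   // first index fastest
                                        : i * n_cols + j;  // last index fastest
}
```

For element (1, 2) of a 3 x 4 array, the default column-major form gives offset 7, while `linear_index<demo_layout::row_major>` gives 6.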
When profiling and debugging, tracking down which version of a gtensor kernel is the one of interest can be challenging. For SYCL, kernel names are passed as template parameters with...
The reference types are currently defined to be value_type.
Since NDEBUG affects other code, we should have a define that is specific to gtensor for disabling asserts. This is important especially for HIP and SYCL, which don't allow assert...
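One possible shape for such a define, as a sketch only: an assert macro that can be switched off independently of NDEBUG. The names `DEMO_GT_DISABLE_ASSERT`, `DEMO_GT_ASSERT` and `checked_at` are hypothetical and are not existing gtensor macros or functions.

```cpp
#include <cassert>

// Hypothetical library-specific switch: defining DEMO_GT_DISABLE_ASSERT
// removes these checks without touching NDEBUG for the rest of the build.
#ifdef DEMO_GT_DISABLE_ASSERT
#define DEMO_GT_ASSERT(cond) ((void)0)
#else
#define DEMO_GT_ASSERT(cond) assert(cond)
#endif

// Example use: bounds-checked element access.
inline int checked_at(const int* data, int size, int i) {
  DEMO_GT_ASSERT(i >= 0 && i < size);
  return data[i];
}
```

In this sketch the checks still compile out when NDEBUG is defined, because the enabled branch forwards to the standard `assert`; the extra define only adds a way to disable them separately.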