cutlass
CUDA Templates for Linear Algebra Subroutines
As group convolution is an important operator in CV models (e.g., in ResNeXt: https://arxiv.org/pdf/1611.05431.pdf), is there any plan to support it in a future release? Thanks a lot!
Hi, is there any support for DP Tensor Cores in cutlass, either available or foreseen? Thanks in advance.
I'm trying to convert the data type of a tensor on the GPU; I think this should be faster than on the CPU. Also, I'll need to do it several times between...
Hi! I am learning 'tall' matmul and find it **hard to find the code** describing how sliced-K reduces the values.... I think each warp will calculate 32*64 values (each...
**Describe the bug** After using `add_subdirectory` on CUTLASS in a CMake project, I'd expect CUTLASS's transitive target(s) to be exported for use in other projects. I have tried the following...
I modified the epilogue function in Example 17 from LinearCombinationRelu to LinearCombinationSilu, like this: using EpilogueOp = cutlass::epilogue::thread::LinearCombinationSilu< ElementOutput, // Data type of output matrix. 128 / cutlass::sizeof_bits<ElementOutput>::value, // The...
Hi, NVIDIA team! Thank you for your awesome work! I am a newbie to this area and I have been trying to learn this framework for months and still made...
Hi! I am learning cutlass, and I see something like this (from the official post): ```C++ /// CUTLASS SGEMM example __global__ void gemm_kernel( float *C, float const *A, float...
Hi! I'm pretty new to CUTLASS (and CUDA, to be honest). I have a two-fold question: 1) I'm trying to apply dynamic parallelism with the launch of cutlass::gemm::device::Gemm under the hood....
Hi! I am learning cutlass, and I read this post: [CUTLASS: Fast Linear Algebra in CUDA C++ | NVIDIA Technical Blog](https://developer.nvidia.com/blog/cutlass-linear-algebra-cuda/). But I cannot find the official “dispatch_policies.h”; I only find...