cutlass issues

[FEA] Will group conv be supported in a future CUTLASS release?

6

As group convolution is an important operator in CV models (e.g. in ResNeXt: https://arxiv.org/pdf/1611.05431.pdf), is there any plan to support it in a future release? Thanks a lot!

ginowu

feature request
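Grouped convolution splits the input and output channels into `groups` independent sets (ResNeXt's "cardinality"), so each filter only sees `C_in / groups` input channels. A naive host-side reference such as the sketch below can serve as a correctness oracle for any future kernel. The function name `grouped_conv2d`, the NCHW layout, batch size 1, stride 1 and the absence of padding are simplifying assumptions for illustration; this is not a CUTLASS API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive reference grouped 2-D convolution (hypothetical helper, not CUTLASS).
// Assumes batch size 1, NCHW layout, stride 1, no padding.
//   input : c_in  x h x w
//   filter: c_out x (c_in / groups) x r x s
//   output: c_out x (h - r + 1) x (w - s + 1)
std::vector<float> grouped_conv2d(const std::vector<float>& input, int c_in, int h, int w,
                                  const std::vector<float>& filter, int c_out, int r, int s,
                                  int groups) {
  assert(c_in % groups == 0 && c_out % groups == 0);
  const int cin_per_group = c_in / groups;
  const int cout_per_group = c_out / groups;
  const int p = h - r + 1;  // output height
  const int q = w - s + 1;  // output width
  std::vector<float> output(static_cast<std::size_t>(c_out) * p * q, 0.0f);

  for (int k = 0; k < c_out; ++k) {
    const int g = k / cout_per_group;  // group that output channel k belongs to
    for (int y = 0; y < p; ++y) {
      for (int x = 0; x < q; ++x) {
        float acc = 0.0f;
        // Only the input channels of group g contribute to output channel k.
        for (int c = 0; c < cin_per_group; ++c) {
          const int ci = g * cin_per_group + c;  // absolute input channel index
          for (int i = 0; i < r; ++i) {
            for (int j = 0; j < s; ++j) {
              acc += input[(ci * h + (y + i)) * w + (x + j)] *
                     filter[((k * cin_per_group + c) * r + i) * s + j];
            }
          }
        }
        output[(k * p + y) * q + x] = acc;
      }
    }
  }
  return output;
}
```

With `groups == 1` this reduces to an ordinary convolution; with `groups == C_in == C_out` it becomes a depthwise convolution.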

[QST] A100 double-precision Tensor Cores?

4

Hi, is there any support for DP (double-precision) Tensor Cores in CUTLASS, either available now or planned? Thanks in advance

touisteur

question

inactive-30d

[QST] Datatype conversion for tensor on GPU

3

I'm trying to convert the data type of a tensor on the GPU; I think this should be faster than doing it on the CPU. Also, I'll need to do it several times between...

nolyn

question

inactive-30d

[QST] How does slice-K reduce the values?

8

Hi! I am learning 'tall' matmul and am finding it **hard to find the code** describing how slice-K reduces the values... I think each warp will calculate 32*64 values (each...

Arsmart123

question
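Conceptually, sliced-K splits the K extent of a threadblock tile among several warps that all own the same output tile; each warp accumulates a partial sum over its own K slice, and the partial accumulators are added together afterward. The host-side C++ sketch below only emulates that arithmetic; `sliced_k_gemm` and its parameters are hypothetical and this is not CUTLASS code. In CUTLASS 2.x the cross-warp reduction appears to be handled by the threadblock epilogue (look for the `PartitionsK` template parameter), but verify this against the source of the version you are using.

```cpp
#include <cassert>
#include <vector>

// Host-side emulation of sliced-K for one M x N output tile (hypothetical helper).
// A is M x K (row-major), B is K x N (row-major).
// K is split into `slices` contiguous chunks; slice w (think: one warp)
// accumulates only over its chunk, producing a partial M x N tile.
// The partial tiles are then summed element-wise to form C.
std::vector<float> sliced_k_gemm(const std::vector<float>& A, const std::vector<float>& B,
                                 int M, int N, int K, int slices) {
  assert(K % slices == 0);
  const int k_per_slice = K / slices;

  // partials[w] is the M x N accumulator owned by slice w.
  std::vector<std::vector<float>> partials(slices, std::vector<float>(M * N, 0.0f));
  for (int w = 0; w < slices; ++w) {
    const int k_begin = w * k_per_slice;
    for (int m = 0; m < M; ++m)
      for (int n = 0; n < N; ++n)
        for (int k = k_begin; k < k_begin + k_per_slice; ++k)
          partials[w][m * N + n] += A[m * K + k] * B[k * N + n];
  }

  // Reduction step: sum the per-slice partial accumulators.
  std::vector<float> C(M * N, 0.0f);
  for (int w = 0; w < slices; ++w)
    for (int i = 0; i < M * N; ++i) C[i] += partials[w][i];
  return C;
}
```

The result is identical to an unsliced GEMM (up to floating-point summation order); slicing only changes which warp computes which partial products.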

[BUG] CMake transitive target doesn't appear to work

4

**Describe the bug** After using `add_subdirectory` on CUTLASS in a CMake project, I'd expect CUTLASS's transitive target(s) to be exported for use in other projects. I have tried the following...

cliffburdick

bug

[FEA] LinearCombinationSilu epilogue

14

I modified the epilogue function in Example 17 from `LinearCombinationRelu` to `LinearCombinationSilu`, like this: `using EpilogueOp = cutlass::epilogue::thread::LinearCombinationSilu< ElementOutput, // Data type of output matrix. 128 / cutlass::sizeof_bits<ElementOutput>::value, // The...`

pianogGG

feature request

inactive-30d
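For reference, a scalar sketch of what a SiLU epilogue would compute, assuming it follows the same pattern as `LinearCombinationRelu` (apply the activation to `alpha * accumulator + beta * source`). The function names below are hypothetical; this is not the CUTLASS implementation, which operates on fragments rather than single values.

```cpp
#include <cassert>
#include <cmath>

// SiLU (also called swish): silu(x) = x * sigmoid(x) = x / (1 + exp(-x)).
inline float silu(float x) { return x / (1.0f + std::exp(-x)); }

// Scalar reference for a hypothetical "linear combination + SiLU" epilogue:
//   D = silu(alpha * accumulator + beta * source)
// i.e. the LinearCombinationRelu form with ReLU swapped for SiLU.
inline float linear_combination_silu(float accumulator, float source, float alpha, float beta) {
  return silu(alpha * accumulator + beta * source);
}
```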

[DOC] Request for a detailed description of `xxxThreadMap` and `xxxTileIterator`

2

Hi, NVIDIA team! Thank you for your awesome work! I am a newbie to this area and I have been trying to learn this framework for months and still made...

zhanggefan

documentation

inactive-30d

[DOC] Where is CUTLASS's detailed GEMM kernel?

3

Hi! I am learning CUTLASS, and I see something like this (from the official post): ```C++ /// CUTLASS SGEMM example __global__ void gemm_kernel( float *C, float const *A, float...

Arsmart123

question

inactive-30d

[QST] Running CUTLASS kernels from device global functions

3

Hi! I'm pretty new to CUTLASS (and CUDA, to be honest). I have a two-fold question: 1) I'm trying to apply dynamic parallelism, launching `cutlass::gemm::device::Gemm` under the hood....

cydoroga

question

[QST] How many threads and blocks does CUTLASS use? (When C is tall, as in the official post)

4

Hi! I am learning CUTLASS, and I read this post: [CUTLASS: Fast Linear Algebra in CUDA C++ | NVIDIA Technical Blog](https://developer.nvidia.com/blog/cutlass-linear-algebra-cuda/) But I cannot find the official “dispatch_policies.h”; I can only find...

Arsmart123

question

inactive-30d
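As a rough sketch of how launch dimensions are usually derived in CUTLASS 2.x-style GEMMs (not the scheme in the 2017 blog post, whose `dispatch_policies.h` may not exist in current releases): one threadblock per threadblock tile of C, and one warp per warp tile inside that threadblock, where a warp-count greater than 1 along K corresponds to sliced-K. The `TileShape` struct and helper functions are hypothetical stand-ins for `cutlass::gemm::GemmShape`; split-K and threadblock swizzling, which can change the grid's shape or size, are ignored.

```cpp
#include <cassert>

// Hypothetical stand-in for a GEMM tile shape (M x N x K); not a CUTLASS type.
struct TileShape { int m, n, k; };

inline int ceil_div(int a, int b) { return (a + b - 1) / b; }

// Threadblocks for an M x N output (no split-K, no swizzle):
// one threadblock per threadblock tile of C, rounding partial tiles up.
inline int num_threadblocks(int M, int N, TileShape tb) {
  return ceil_div(M, tb.m) * ceil_div(N, tb.n);
}

// Threads per threadblock = number of warps * 32. The warp count is the
// threadblock tile divided by the warp tile along M, N and K; a K ratio
// greater than 1 means the K dimension is sliced across warps (sliced-K).
inline int threads_per_block(TileShape tb, TileShape warp) {
  const int warps = (tb.m / warp.m) * (tb.n / warp.n) * (tb.k / warp.k);
  return warps * 32;
}
```

For a tall C (large M, small N) the grid is dominated by the M direction: with a hypothetical 128x128x32 threadblock tile, M = 10000 and N = 64 give 79 threadblocks in a single column.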
