cutlass

CUDA Templates for Linear Algebra Subroutines

Results 608 cutlass issues

Hi! I have a batched matrix multiply problem with no fixed stride between batches. A minimal example is the following (all matrices are RowMajor): I want to calculate $O...

question
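
One common way to express a batched multiply without a fixed batch stride is to give every batch entry its own set of pointers. The following is only a CPU reference sketch of that idea, assuming float data and identical M, N, K for every entry; `BatchEntry` and `batched_gemm_ptr_array` are hypothetical names for illustration, not CUTLASS APIs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-batch descriptor: each entry carries its own pointers,
// so the matrices may live anywhere in memory (no fixed batch stride).
struct BatchEntry {
    const float* a;  // M x K, row-major
    const float* b;  // K x N, row-major
    float* c;        // M x N, row-major (output)
};

// Reference implementation: C_i = A_i * B_i for every batch entry i.
// Assumes all entries share the same M, N, K.
void batched_gemm_ptr_array(const std::vector<BatchEntry>& batch,
                            std::size_t M, std::size_t N, std::size_t K) {
    for (const BatchEntry& e : batch) {
        for (std::size_t i = 0; i < M; ++i) {
            for (std::size_t j = 0; j < N; ++j) {
                float acc = 0.0f;
                for (std::size_t k = 0; k < K; ++k) {
                    acc += e.a[i * K + k] * e.b[k * N + j];
                }
                e.c[i * N + j] = acc;
            }
        }
    }
}
```

A GPU version would pass device arrays of these pointers to the kernel; the sketch above is meant only as a correctness reference for such a kernel.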

**Describe the bug** Fused GEMM example gives the wrong result for some values of `problemSize1.K`. **Steps/Code to reproduce bug** Set the following problem sizes in `examples/13_two_tensor_op_fusion/fused_two_gemms_f16_sm80_shmem.cu` ```c++ cutlass::gemm::GemmCoord gemm_f16_sm80_problem_size_0(128*640, 48,...

bug
? - Needs Triage

I ran the TF32 GEMM example, and setting different stage counts (1 or 4) gives different accuracy. Why?

question
inactive-30d

**Describe the bug** CUTLASS and EGL header files conflict: if you include the EGL header file (#include ) before including the CUTLASS header file, a compilation error will occur, which can be...

bug
? - Needs Triage
inactive-30d

Are `b1 x b1` GEMMs all implemented with XOR, which requires `uint1_t x uint1_t`? What if `A=uint1_t` and `B=int1_t`? (e.g. A is a ReLU output, B is a weight) Thanks...

question
inactive-30d

**Is your feature request related to a problem? Please describe.** When using the -conv-fprop of cutlass to perform the conv operation, it is found that in the entire kernel, the...

feature request
inactive-30d

Hi! I have written code for slicedK in GEMM, but it seems very slow. I tried to understand CUTLASS's slicedK but could not, so I am posting my code here...

question

Fixed bugs and updated the verification logic.
* Removed verification for `Max`, making the verification logic more consistent: we don't check `Sum`, so we won't check `Max` either.
* Fixed the correctness...

Add residual support for the shmem staging iterator used in back-to-back GEMM fusion. This allows support of a problem_size_0_n that is not a multiple of 32. @danthe3rd, would you please give it...

I want to implement a BN layer as an epilogue with CUTLASS, which requires both division and addition. Is there a way to implement something like...

question
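
For inference, where the batch-norm statistics are fixed, the division can be folded away ahead of time: `gamma * (x - mean) / sqrt(var + eps) + beta` equals `scale * x + shift` with `scale = gamma / sqrt(var + eps)` and `shift = beta - mean * scale`. The sketch below shows that folding on the CPU, assuming per-column (per-channel) parameters; `FoldedBN`, `fold_batch_norm`, and `apply_scale_shift` are hypothetical names, not CUTLASS epilogue APIs.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the folded per-channel parameters.
struct FoldedBN {
    std::vector<float> scale;
    std::vector<float> shift;
};

// Precompute scale and shift so the epilogue needs only a multiply and an add.
FoldedBN fold_batch_norm(const std::vector<float>& gamma,
                         const std::vector<float>& beta,
                         const std::vector<float>& mean,
                         const std::vector<float>& var,
                         float eps) {
    FoldedBN f;
    f.scale.resize(gamma.size());
    f.shift.resize(gamma.size());
    for (std::size_t c = 0; c < gamma.size(); ++c) {
        f.scale[c] = gamma[c] / std::sqrt(var[c] + eps);
        f.shift[c] = beta[c] - mean[c] * f.scale[c];
    }
    return f;
}

// Epilogue-style step on a row-major accumulator tile: one channel per column.
void apply_scale_shift(std::vector<float>& acc, std::size_t rows,
                       std::size_t cols, const FoldedBN& f) {
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            acc[i * cols + j] = f.scale[j] * acc[i * cols + j] + f.shift[j];
        }
    }
}
```

After folding, the per-element work has the same multiply-add shape as a linear-combination epilogue with a per-channel scale and bias vector. This does not cover training-time batch norm, where the statistics depend on the GEMM output itself.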