cutlass issues

[QST] GemmArray with overlapping output chunks.

5

Hi! I have a Batched Matrix Multiply problem with no fixed stride between batches. The minimalist example is the following (all the matrices are RowMajor): I want to calculate $O...

cydoroga

question

[BUG] Fused GEMM example gives wrong result with some shapes

7

**Describe the bug** Fused GEMM example gives the wrong result for some values of `problemSize1.K`. **Steps/Code to reproduce bug** Set the following problem sizes in `examples/13_two_tensor_op_fusion/fused_two_gemms_f16_sm80_shmem.cu`: `cutlass::gemm::GemmCoord gemm_f16_sm80_problem_size_0(128*640, 48,`...

danthe3rd

bug

? - Needs Triage

[QST] set different stages has different accuracy

8

I ran the TF32 GEMM example, and setting different stage counts (1 of 4) gives different accuracy. Why?

yuxgis

question

inactive-30d

[BUG] CUTLASS conflict with EGL header file

1

**Describe the bug** CUTLASS conflicts with the EGL header file: if you include the EGL header file (#include) before including a CUTLASS header file, a compilation error occurs, which can be...

zongfeijing

bug

? - Needs Triage

inactive-30d

[QST] Any support or examples of uint1_t x int1_t GEMM?

5

Are `b1 x b1` GEMMs all implemented with XOR, which requires `uint1_t x uint1_t`? What if `A=uint1_t` and `B=int1_t` (e.g. A is a ReLU output, B is a weight)? Thanks...

Akimoto-Cris

question

inactive-30d

[RFE] Optimize the conv-fprop operator

12

**Is your feature request related to a problem? Please describe.** When using the -conv-fprop of cutlass to perform the conv operation, it is found that in the entire kernel, the...

lixiaolx

feature request

inactive-30d

[QST] How to use slicedK in GEMM?

3

Hi! I have written code for slicedK in GEMM, but it seems very slow. I tried to understand CUTLASS's slicedK, but could not, so I am posting my code here...

Arsmart123

question

support for alignment != 8 and adding a new BMM example

11

Fixed bugs and updated the verification logic.
* Removed verification for `Max`, making the verification logic more consistent: we don't check `Sum`, so we won't check `Max` either.
* Fixed the correctness...

yzhaiustc

b2b gemm residual

1

Add residual support for the shmem staging iterator used in back-to-back GEMM fusion. This allows support of problem_size_0_n that is not a multiple of 32. @danthe3rd, would you please give it...

hwu36

[QST] BatchNorm with cutlass

34

I want to implement a BN layer as an epilogue with CUTLASS, which requires both division and addition operations. I want to know whether there is a way to implement something like...

Exusial

question
