cutlass issues

[QST] Are there plans to add specialisations for Sm90?

7

**What is your question?** I recently tried to change the type tags on the [DGEMM examples](https://github.com/NVIDIA/cutlass/blob/main/examples/45_dual_gemm/dual_gemm.cu) to `cutlass::arch::Sm90`, which caused a load of compile errors. This is primarily because there's...

joerowell

help wanted

question

inactive-30d

inactive-90d

Add couple configs into generator.py for mixed input MM

8

I'm adding (PR [here](https://github.com/pytorch/pytorch/pull/119986)) CUTLASS kernels as an auto-tune option for PyTorch compiler, and it would be nice to have these additional configurations available. This is not urgent, and more...

alexsamardzic

inactive-30d

[QST] Question on customize epilogue reduction

5

**What is your question?** Hello, I found that many epilogues are element-wise. I wondered if it could be customized to sum up a `2*2` tile instead of an element-wise operation....

zejia-lin

question

inactive-30d

[BUG] Stride is ignored for dst tensor of a Conv2dFprop

40

I have implemented a basic sample code to convolve a 2D image with a row filter. It works, but when the dst image has some stride, it seems ignored by...

chacha21

bug

[QST] Is s8 * s8 = {s32, s8} supported in cuTe?

2

Is s8 * s8 = {s32, s8} supported in cuTe?

MingZwhy

question

? - Needs Triage

[QST] Sparse GEMM runs much worse than Dense GEMM in some cases

17

I am benchmarking sparse and dense GEMMs through the cutlass profiler. I am seeing that sparse GEMMs run **slower** than dense GEMMs in the same scenario. For example, compare the...

jimwu6

question

inactive-30d

inactive-90d

Adding Cublas as a provider for cutlass profiler

Cutlass profiler has a great set of flags to perform shmoos across different matrix shapes and sizes. While benchmarking GEMMs using the cutlass profiler, one can use Cublas as a...

ashish007git

Add int4b_t/uint4b_t support for mixed dtypes GEMM

19

@manishucsd @rhenry-nv

alexsamardzic

inactive-30d

Add support for dynamic offsets to DefaultEpilogue

13

Dynamic offsets in `DefaultEpilogue` allow moving pointer arithmetic to the device and shifting the `C` and `D` pointers based on offsets stored in device memory. Depends on https://github.com/NVIDIA/cutlass/pull/1273

ezhulenev

inactive-30d

inactive-90d

Make runtime assert more clear on CUDA

2

As it stands, when a runtime assert is called on CUDA platforms, your program just explodes with no stack trace and no mention of the error that was encountered. I just...

sophiawisdom

inactive-30d

inactive-90d
