cutlass issues

[doc] Update flags for CUDA 13

1

`-I/usr/local/cuda/include/cccl` in CUDA 13 (related #2543); remove duplicated `--cuda-gpu-arch`; `_LIBCUDACXX_STD_VER` is deprecated and not used by the project.

chengscott

inactive-30d

Latex Printing Support

1

Ports over some of the LaTeX printing functionality from C++ and adds an example.

depaulmillz

inactive-30d

[BUG] CuTe DSL doesn't support with_shape?

3

### Which component has the problem? CuTe DSL ### Bug Report **Describe the bug** **Steps/Code to reproduce bug** ``` import cutlass.cute as cute @cute.jit def test(): layoutA = cute.make_layout((4, 4),...

Edenzzzz

bug

? - Needs Triage

inactive-30d

CuTe DSL

[QST] Legal threadblock, warp, mma shape

2

**What is your question?** I am trying to use CUTLASS on the Ampere architecture to multiply two rectangular matrices, MxK and KxN, where M and N are small (say 16) and...

mbernaschi

question

? - Needs Triage

inactive-30d

[BUG] Dynamic versioning fails; wheel built as 0.0.0 for CuTe DSL

1

### Which component has the problem? CuTe DSL ### Bug Report Building nvidia-cutlass-dsl with dynamic versioning always produces a wheel with version 0.0.0 due to missing VERSION.EDITABLE. Suggest using setuptools-scm...

vshawrh

bug

? - Needs Triage

CuTe DSL

Add dual-GEMM examples for SM90 (Hopper) and SM120 (Blackwell)

3

Summary ------- Implements dual-GEMM examples for SM90 (Hopper) and SM120 (Blackwell) using CUTLASS 3.x. The dual-GEMM operation implemented is: ``` D0 = epilogue0(X @ B0, C0) D1 = epilogue1(X @...

Inodayy

inactive-30d

add cute.union

2

v0i0

[QST] How to configure blockM and blockN in GemmUniversal?

1

I need to fix blockM and blockN to ensure batch invariance. Can the CUTLASS GEMM interface control this? `using GemmKernel = cutlass::gemm::kernel::GemmUniversal< cute::Shape, CollectiveMainloop, CollectiveEpilogue>;` Thanks in advance!

4grass

question

? - Needs Triage

inactive-30d

[BUG] cutlass.cute.nvgpu.common.OpError: OpError: expects arch to be one of ['sm_100a', 'sm_100f'], but got sm_121a

2

### Which component has the problem? CuTe DSL ### Bug Report **Describe the bug** with nvidia-cutlass and nvidia-cutlass-dsl 4.2.0.0 ``` python cutlass/examples/python/CuTeDSL/blackwell/tutorial_gemm/fp16_gemm_1.py nvidia_cutlass_dsl/python_packages/cutlass/cute/nvgpu/tcgen05/mma.py", line 153, in __post_init__ raise OpError( cutlass.cute.nvgpu.common.OpError:...

whatdhack

bug

? - Needs Triage

CuTe DSL

[BUG] for loop and while loop have different behavior

2

### Which component has the problem? CuTe DSL ### Bug Report **Steps/Code to reproduce bug** ``` import torch import cutlass import cutlass.cute as cute from cutlass.cute.runtime import from_dlpack @cute.jit def...

HarryWu99

bug

? - Needs Triage

inactive-30d

CuTe DSL
