cutlass issues

Tiny fix tutorial text

6

From my naive understanding, the second arrow (output "?") is correct.

[BUG] CuteDSL example hits IMA for large tensors due to strides in Int32

6

Tensors in CuTe DSL use Int32 strides by default, which causes an illegal memory access (IMA) for large tensors. Is there a way to force strides to be Int64? **Steps/Code to reproduce bug**...

tridao

bug

? - Needs Triage
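The failure mode in this issue can be reproduced with plain arithmetic. The sketch below assumes CuTe DSL computes an element offset as the dot product of coordinate and stride in signed 32-bit integers, and emulates that for the last element of a row-major 65536 × 65536 tensor. The helpers `wrap_int32` and `linear_offset` are hypothetical illustrations, not CuTe DSL APIs.

```python
# Hypothetical illustration (not CuTe DSL code): emulate signed 32-bit vs
# 64-bit offset arithmetic for offset = sum(coord[i] * stride[i]).

def wrap_int32(x: int) -> int:
    """Reduce x to a signed 32-bit value using two's-complement wraparound."""
    return (x + 2**31) % 2**32 - 2**31

def linear_offset(coord, stride, bits=64):
    """Element offset sum(c * s); with bits=32 every step wraps like int32."""
    acc = 0
    for c, s in zip(coord, stride):
        acc += c * s
        if bits == 32:
            acc = wrap_int32(acc)
    return acc

shape = (65536, 65536)   # assumed example size: 2**32 elements
stride = (65536, 1)      # row-major strides
last = (shape[0] - 1, shape[1] - 1)

print(linear_offset(last, stride, bits=64))  # 4294967295
print(linear_offset(last, stride, bits=32))  # -1
```

Any offset of 2**31 or more wraps to a different (here negative) value in 32-bit arithmetic, so a kernel indexing that element would touch memory outside the allocation, which is consistent with the reported IMA.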

[FEA] [CuTeDSL] `print_latex` in `CuTeDSL`

12

**Is your feature request related to a problem? Please describe.** It would be nice to have a utility function in `CuTeDSL` like `print_latex` in the `C++` API. **Describe the solution you'd like**...

simveit

feature request

inactive-30d

CuTe DSL
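The C++ `print_latex` emits LaTeX source that draws a layout as a grid of indices. As a rough sketch of what a Python counterpart could produce for a simple 2-D layout, the hypothetical helper below (not part of CuTe DSL) renders a `(rows, cols)` shape with a `(row_stride, col_stride)` stride as a plain LaTeX `tabular` whose cell `(i, j)` holds the linear offset `i*row_stride + j*col_stride`. It ignores nested (hierarchical) shapes.

```python
def layout_to_latex(shape, stride):
    """Render a flat 2-D layout as a LaTeX tabular of linear offsets.

    Hypothetical helper for illustration only; cell (i, j) holds
    i * stride[0] + j * stride[1].
    """
    rows, cols = shape
    lines = [r"\begin{tabular}{|" + "c|" * cols + "}", r"\hline"]
    for i in range(rows):
        cells = [str(i * stride[0] + j * stride[1]) for j in range(cols)]
        lines.append(" & ".join(cells) + r" \\ \hline")
    lines.append(r"\end{tabular}")
    return "\n".join(lines)

# Column-major 2x2 layout: shape (2, 2), stride (1, 2).
print(layout_to_latex((2, 2), (1, 2)))
```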

[QST] [CuTeDSL] Nsight Compute Profiler Link to Source Code

8

**What is your question?** When profiling CUDA/CUTLASS, the profiler can provide line-by-line profiling for user code, in addition to PTX and SASS. Triton can also do this, likely because its...

HanGuo97

feature request

CuTe DSL

Mixed Precision Grouped Gemm with zero points and GPT-Q semantics closes #2261

8

Hello! This MR provides two things: 1) zero points for the default mode, and 2) GPT-Q [semantics](https://pytorch.org/blog/accelerating-triton/). Closes #2261

ankutalev
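The difference between the two zero-point conventions in this MR can be sketched on a few quantized values. This assumes "default mode" means scale first, then add an already-scaled zero point (`w = q * s + z`), and GPT-Q semantics means subtract the integer zero point, then scale (`w = (q - z) * s`); the MR's exact definitions may differ, and the function names are hypothetical.

```python
def dequant_default(q, scale, zero):
    # Assumed "default mode": scale first, then add an already-scaled zero point.
    return [x * scale + zero for x in q]

def dequant_gptq(q, scale, zero):
    # Assumed GPT-Q semantics: subtract the integer zero point, then scale.
    return [(x - zero) * scale for x in q]

q = [0, 7, 15]          # example unsigned 4-bit codes
scale, zero_q = 0.5, 8  # example per-group scale and integer zero point

print(dequant_gptq(q, scale, zero_q))              # [-4.0, -0.5, 3.5]
print(dequant_default(q, scale, -zero_q * scale))  # [-4.0, -0.5, 3.5]
```

Under these assumptions the two modes agree when the default-mode zero point is precomputed as `-z * s`. Supporting GPT-Q semantics directly lets checkpoints that store the integer `z` be used without that conversion.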

Missing PYPI releases

At least 4.1 and `3.9.0.0` are missing from PyPI. As cuda-python 12.6.2 is required for CUDA 12.6 and that has API deprecations (`import cuda.bindings.cuda` instead of `import cuda.cuda`) nvidia-cutlass 4.1...

Flamefire

inactive-30d

Remove x permission of CMakeLists.txt

Rtoax
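Clearing the execute bit that this PR removes can be sketched with the standard library. The helper name `remove_exec_bits` is hypothetical; it clears only the user, group, and other execute bits and keeps the rest of the mode.

```python
import os
import stat

def remove_exec_bits(path):
    """Clear the user/group/other execute bits on `path`, keeping other mode bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~(stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))

# Example: remove_exec_bits("CMakeLists.txt") turns mode 0o755 into 0o644.
```

With `core.fileMode` enabled, Git then reports the change as `old mode 100755` / `new mode 100644`. Alternatively, `git update-index --chmod=-x CMakeLists.txt` changes only the mode recorded in the index.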

[QST] The strange bank conflict in the CuTeDSL python gemm demo.

1

## code
https://github.com/NVIDIA/cutlass/blob/main/examples/python/CuTeDSL/ampere/tensorop_gemm.py
## env
```
pip list | grep -i cutlass
nvidia-cutlass-dsl 4.3.0
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Aug_20_01:57:39_PM_PDT_2025...
```

LRlr239

question

? - Needs Triage
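When investigating a shared-memory bank conflict like the one reported here, it can help to compute the conflict degree of a warp's access pattern by hand. The sketch below assumes 32 banks of 4 bytes each and one 32-bit load per lane; lanes reading the same 4-byte word are served by a broadcast and do not conflict. It does not model the tensorop_gemm demo's actual shared-memory layout, and the helper names are hypothetical.

```python
from collections import defaultdict

NUM_BANKS = 32  # assumption: 32 shared-memory banks
BANK_BYTES = 4  # assumption: 4-byte bank width

def conflict_degree(byte_addrs):
    """Max number of distinct 4-byte words any one bank must serve for one warp access."""
    words_per_bank = defaultdict(set)
    for addr in byte_addrs:
        word = addr // BANK_BYTES
        words_per_bank[word % NUM_BANKS].add(word)  # same word -> broadcast, counted once
    return max(len(words) for words in words_per_bank.values())

def column_access(ld, elem_bytes=4, col=0, lanes=32):
    """Byte addresses when lane t reads element (t, col) of a row-major tile with leading dimension ld."""
    return [(t * ld + col) * elem_bytes for t in range(lanes)]

print(conflict_degree(column_access(ld=32)))  # 32: every lane hits bank 0
print(conflict_degree(column_access(ld=33)))  # 1: padding spreads lanes across banks
```

Padding is only one remedy; swizzled layouts avoid the same conflicts without wasting shared memory.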
