CUDA Templates for Linear Algebra Subroutines

Results 608 cutlass issues

From my naive understanding, the second arrow (output "?") is correct.

Tensors in CuTe DSL use int32 strides by default. This causes illegal memory accesses (IMA) for large tensors. Is there a way to force strides to be int64? **Steps/Code to reproduce bug**...

bug
? - Needs Triage
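
The int32-stride report above can be illustrated with plain arithmetic. The sketch below is not CuTe DSL code: the shape, the helper names (`wrap_int32`, `row_major_strides`, `linear_offset`), and the two's-complement wraparound model are all assumptions made for illustration. It only shows why a linear offset computed in 32 bits can go wrong once a tensor has more than 2^31 - 1 elements.

```python
INT32_MAX = 2**31 - 1

def wrap_int32(value: int) -> int:
    """Reinterpret an integer as a two's-complement signed 32-bit value."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value

def row_major_strides(shape):
    """Compute compact row-major strides (in elements) for a shape."""
    strides = []
    running = 1
    for extent in reversed(shape):
        strides.append(running)
        running *= extent
    return tuple(reversed(strides))

def linear_offset(coord, strides):
    """Inner product of a coordinate with the strides."""
    return sum(c * s for c, s in zip(coord, strides))

# Hypothetical tensor shape, chosen so the element count exceeds INT32_MAX.
shape = (65536, 40000)
strides = row_major_strides(shape)

# Offset of the last element, computed exactly and then as if held in int32.
last_coord = tuple(extent - 1 for extent in shape)
exact_offset = linear_offset(last_coord, strides)
wrapped_offset = wrap_int32(exact_offset)

print("strides:", strides)
print("exact offset:", exact_offset)
print("fits in int32:", exact_offset <= INT32_MAX)
print("offset after 32-bit wraparound:", wrapped_offset)
```

Under this model the wrapped offset is negative, so an access through it would land outside the allocation, which is consistent with the illegal memory access described in the report. Whether CuTe DSL wraps in exactly this way is not confirmed by the source.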

**Is your feature request related to a problem? Please describe.** It would be nice to have a utility function in `CuTeDSL` like `print_latex` in the `C++` API. **Describe the solution you'd like**...

feature request
inactive-30d
CuTe DSL

**What is your question?** When profiling CUDA/CUTLASS, the profiler can provide line-by-line profiling for user code, in addition to PTX and SASS. Triton can also do this, likely because its...

feature request
CuTe DSL

Hello! This MR provides two things: 1) Zero points for default mode 2) GPT-Q [semantics](https://pytorch.org/blog/accelerating-triton/) Closes #2261

At least 4.1 and `3.9.0.0` are missing from PyPI. Because cuda-python 12.6.2 is required for CUDA 12.6 and it has API deprecations (`import cuda.bindings.cuda` instead of `import cuda.cuda`), nvidia-cutlass 4.1...

inactive-30d

## code https://github.com/NVIDIA/cutlass/blob/main/examples/python/CuTeDSL/ampere/tensorop_gemm.py ## env ``` pip list | grep -i cutlass nvidia-cutlass-dsl 4.3.0 nvcc -V nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2025 NVIDIA Corporation Built on Wed_Aug_20_01:57:39_PM_PDT_2025...

question
? - Needs Triage