
CUDA Templates for Linear Algebra Subroutines

Results 608 cutlass issues

**What is your question?** Hello, I am testing the AOT feature using CuTeDSL with TVM-FFI. Does AOT compilation support cross-compilation for a different compute capability? For example, for the `examples/python/CuTeDSL/cute/tvm_ffi/aot_export.py`...

question
? - Needs Triage

### Which component has the problem? CuTe DSL ### Bug Report Bug Report Summary CUTLASS 4.2+ added SM120 and SM121 kernel support for Blackwell GeForce (RTX 50-series) and DGX Spark...

bug
? - Needs Triage
CuTe DSL

I wrote a simple CuTe DSL program to cast an fp32 tensor to a bf16 tensor: ``` import argparse import math import torch import triton from typing import...

question
? - Needs Triage

**What is your question?** `cute.copy` always fully unrolls its inner load/store. In some cases, though, this unrolling causes serious register spilling. So I wonder how to...

question
? - Needs Triage

When I compile CuTe DSL from source and run `import cutlass`, I get the error "No module named 'cutlass._mlir'". I'd like to know what operations need to be performed on the...

question
? - Needs Triage

### Which component requires the feature? CuTe DSL ### Feature Request Hi, `pip install nvidia-cutlass-dsl` fails on Windows: the latest release, 4.1.0 (https://pypi.org/project/nvidia-cutlass-dsl/#files), only supports manylinux, so I am requesting Windows support...

feature request
? - Needs Triage
CuTe DSL

### Which component has the problem? CUTLASS C++ ### Bug Report **Describe the bug** [ 10%] Building CUDA object tools/library/CMakeFiles/cutlass_library_gemm_sm100_bf16_gemm_grouped_e2m1_objs.dir/generated/gemm/100/bf16_gemm_grouped_e2m1/cutlass3x_sm100_bstensorop_gemm_grouped_ue8m0xe2m1_ue8m0xe2m1_f32_bf16_bf16_256x64x256_0x0x1_0_tnt_align32_o_vs32_2sm_epi_tma.cu.o cd /workspace/cutlass/build/tools/library && /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler --options-file CMakeFiles/cutlass_library_gemm_sm100_bf16_gemm_grouped_e2m1_objs.dir/includes_CUDA.rsp -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG...

bug
? - Needs Triage
inactive-30d
CUTLASS C++

I’m looking at Example 23 and have a question about kGemmSplitKParallel mode that I’d like cleared up: in this mode, the example explicitly allocates a block of...

question
? - Needs Triage
inactive-30d