CUDA Templates for Linear Algebra Subroutines

Results 608 cutlass issues

Clang built from source: https://clang.llvm.org/get_started.html ``` ../llvm-project/build/bin/clang -v clang version 18.0.0git (https://github.com/llvm/llvm-project.git a855b2c894444419c3689aff6fd0381fdeb02491) ``` main.cpp ``` #include <iostream> #include "cutlass/epilogue/collective/collective_builder.hpp" int main() { cutlass::half_t x = 2.25_hf; std::cout

bug
inactive-30d
clang

auto gA = local_tile(mA, blk_shape, blk_coord, Step{}); // (BLK_M,BLK_K,k) I am studying this line in the example code: https://github.com/NVIDIA/cutlass/blob/main/examples/cute/tutorial/sgemm_nt_1.cu How do we get this? By the way, I printed it out, size...

question
inactive-30d
inactive-90d
CuTe
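
For the `local_tile` question above, a plain C++ sketch (not CuTe itself) may help show where a shape like (BLK_M,BLK_K,k) can come from. It assumes the block shape has three modes (BLK_M, BLK_N, BLK_K), that the step argument keeps the M and K modes and drops N, and that K divides evenly by BLK_K. `TileShapeA` and `tile_shape_a` are hypothetical names used only for illustration.

```cpp
#include <cstddef>

// Hypothetical model of the shape of gA: (BLK_M, BLK_K, k).
struct TileShapeA {
    std::size_t blk_m;    // rows in one tile of A
    std::size_t blk_k;    // columns in one tile of A
    std::size_t k_tiles;  // number of tiles along K (the trailing "k" mode)
};

// Assumption: the projection keeps the M and K modes of the block shape
// and ignores the N mode, because A is an M x K matrix.
// Assumption: K is a multiple of blk_k.
constexpr TileShapeA tile_shape_a(std::size_t K,
                                  std::size_t blk_m,
                                  std::size_t /*blk_n*/,
                                  std::size_t blk_k) {
    return TileShapeA{blk_m, blk_k, K / blk_k};
}
```

Under these assumptions, K = 4096 with a (128, 128, 8) block shape gives a 128 x 8 tile and 512 tiles along K.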

``` auto tC = make_layout(make_shape(Int{}, Int{})); auto tCsA = local_partition(sA, tC, threadIdx.x, Step{}); ``` But I get (_8,_8) as tCsA's shape. Why? I am studying this code: https://github.com/NVIDIA/cutlass/blob/main/examples/cute/tutorial/sgemm_nt_1.cu

question
? - Needs Triage
inactive-30d
inactive-90d
CuTe

**What is your question?** Hello, thanks for your project. CUTLASS version: 2.10; device: RTX 3090. I want to implement W4A4 conv quantization in tensorrt_llm with CUTLASS. Follow the example...

question
inactive-30d
inactive-90d

**What is your question?** ``` Array access Users access a Tensor's elements in one of three ways: operator(), taking as many integral arguments as the number of modes, corresponding to...

question
? - Needs Triage
inactive-30d

**What is your question?** Hi! I see the swizzle.hpp file, but I cannot figure out how to use it. For example, for the sgemm_nt.cu code you provided, could you show me how to...

question
? - Needs Triage
inactive-30d
inactive-90d

**Describe the bug** Using DefaultCopy on A100 implicitly generates an unexpected LDGSTS. Users are not aware of the need to commit and wait. **Steps/Code to reproduce bug** ``` using GmemTiledCopy...

bug
inactive-30d
inactive-90d

I think [cpp11.cu](https://github.com/NVIDIA/cutlass/blob/6e60b9b17c5e6734488dbb7401b5c55ccb37feba/test/unit/core/cpp11.cu#L76) should be comparing against (from https://gcc.gnu.org/onlinedocs/cpp/Standard-Predefined-Macros.html) `201103L`. Although I vaguely remember that with a newer compiler, it can be difficult to test old standard compatibility. So maybe...

inactive-30d
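
As a minimal sketch of the kind of check discussed in the issue above: the C++11 standard defines `__cplusplus` as `201103L`, and later standards use larger values, so a "C++11 or newer" test compares against that constant. The helper name `is_cpp11_or_newer` is hypothetical and not taken from cpp11.cu.

```cpp
// Hypothetical helper: true when the given __cplusplus value
// corresponds to C++11 (201103L) or a later standard.
constexpr bool is_cpp11_or_newer(long value) {
    return value >= 201103L;
}

// Compile-time check against the compiler's own macro.
// Note: MSVC reports 199711L unless /Zc:__cplusplus is passed,
// so this assertion can fail there even in a newer language mode.
static_assert(is_cpp11_or_newer(__cplusplus),
              "this translation unit requires C++11 or newer");
```

Comparing with `>=` accepts newer standards as well; testing for exactly C++11 would need `==` and a compiler invoked in C++11 mode.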

**What is your question?** Hi, thanks for the great work! Recently, I have been exploring the performance improvements from all of the optimizations in CUTLASS. I want to profile all of...

question
inactive-30d
inactive-90d

**What is your question?** I am trying to use `cutlass::conv::device::Convolution` with a fixed ThreadblockShape, WarpShape, and InstructionShape. It fails with an internal error, which is actually "too many resources requested". It may...

question
inactive-30d
inactive-90d