cutlass

CUDA Templates for Linear Algebra Subroutines

Results 608 cutlass issues

**What is your question?** Hi there, thank you for the work on CUTLASS 3.0/CuTe. The "layout algebra" in 3.0 is much more elegant and easier to use than iterators. I guess...

question
inactive-30d
CuTe

Could anybody explain "Layout compatibility"? Examples would be nice.

documentation
? - Needs Triage
inactive-30d
CuTe

How can general conv fwd/dgrad/wgrad be implemented with CuTe? Could you give examples based on Hopper CuTe?

feature request
help wanted
inactive-30d
inactive-90d
CuTe

I have gone through the documentation and the available APIs, but I couldn't find explicit information on whether CuTe supports sparse tensor operations or not. Does CuTe currently support sparse...

feature request
question
inactive-30d
CuTe

Hello CUTLASS experts, I'm confused by the implementation of Semaphore. Its `fetch` looks like this: ```C++ if (wait_thread) { #if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700 asm volatile ("ld.global.acquire.gpu.b32 %0, [%1];\n"...

question
? - Needs Triage
inactive-30d
inactive-90d

**Is your feature request related to a problem? Please describe.** As a user of CUTLASS, I would like to build a shared object library, `libA.so`, that internally uses CUTLASS function...

feature request
inactive-30d
inactive-90d

https://github.com/NVIDIA/cutlass/blob/5c447dd84f8ae0e1d48ff9a2eae26ce8c4958101/include/cutlass/gemm/warp/default_mma_tensor_op.h#L121 https://github.com/NVIDIA/cutlass/blob/5c447dd84f8ae0e1d48ff9a2eae26ce8c4958101/include/cutlass/gemm/warp/default_mma_tensor_op_sm80.h#L43 `default_mma_tensor_op.h` includes `default_mma_tensor_op_sm80.h`, while the latter also includes the former. Is this a problem?

inactive-30d

### Discussed in https://github.com/NVIDIA/cutlass/discussions/1504 Originally posted by **wzhcz8902** April 28, 2024 https://github.com/NVIDIA/cutlass/blob/5c447dd84f8ae0e1d48ff9a2eae26ce8c4958101/include/cutlass/gemm/warp/mma_tensor_op.h#L140-L168 As a newbie to CUTLASS, I think this struct is targeted at Tensor Cores, not CUDA cores as...

inactive-30d

support Layout::kFactor with 8 loading data from shared memory: |0 | 16 | 32 | 48 | 64 | 80 | 96 | 112| |-- | -- | -- |...

inactive-30d

**What is your question?** I am trying to understand how `right_inverse` works in CuTe. For example, https://github.com/NVIDIA/cutlass/blob/main/test/python/pycute/test_right_inverse.py#L88 The given input is `Layout((2,4,6),(4,1,8))`, I just couldn't figure out why...

question