cutlass issues

[BUG] Cutlass 3x gemms can't be compiled with clang

6

Clang built from source: https://clang.llvm.org/get_started.html

```
../llvm-project/build/bin/clang -v
clang version 18.0.0git (https://github.com/llvm/llvm-project.git a855b2c894444419c3689aff6fd0381fdeb02491)
```

main.cpp

```
#include <iostream>
#include "cutlass/epilogue/collective/collective_builder.hpp"

int main() {
  cutlass::half_t x = 2.25_hf;
  std::cout ...
```

ezhulenev

bug

inactive-30d

clang

[QST] How does local_tile work? Could you provide a detailed explanation?

4

auto gA = local_tile(mA, blk_shape, blk_coord, Step<_1, X, _1>{}); // (BLK_M,BLK_K,k) I am studying this line from the example code: https://github.com/NVIDIA/cutlass/blob/main/examples/cute/tutorial/sgemm_nt_1.cu How do we get this result? By the way, when I print it out, size...

ziyuhuang123

question

inactive-30d

inactive-90d

CuTe
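As a rough aid for the question above, the tiling arithmetic can be modeled in plain C++. This is a sketch, not CuTe code: `TileView` and `tile_A` are hypothetical names, and it assumes A is an (M,K) column-major matrix whose leading dimension equals M. The idea it illustrates is that the step selects only the M and K modes of the block shape, so the result has shape (BLK_M,BLK_K,k), where k counts the tiles along K.

```cpp
#include <cassert>
#include <cstddef>

// Plain C++ model of tiling an (M,K) column-major matrix (hypothetical, not CuTe).
struct TileView {
  std::size_t blk_m;        // rows in one tile (BLK_M)
  std::size_t blk_k;        // columns in one tile (BLK_K)
  std::size_t num_k_tiles;  // trailing mode k: number of tiles along K
  std::size_t base;         // offset of the first element of this block row
  std::size_t ld;           // leading dimension (stride between columns)

  // Offset of element (i, j) inside tile number kt along K.
  std::size_t offset(std::size_t i, std::size_t j, std::size_t kt) const {
    return base + i + (kt * blk_k + j) * ld;
  }
};

// The block coordinate along M is fixed; every tile along K is kept.
inline TileView tile_A(std::size_t M, std::size_t K,
                       std::size_t blk_m, std::size_t blk_k,
                       std::size_t m_coord) {
  return TileView{blk_m, blk_k, K / blk_k, m_coord * blk_m, M};
}
```

For example, with M = 256, K = 64, BLK_M = 128, and BLK_K = 8, block row 1 yields a (128,8,8) view whose first element sits at offset 128.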

[QST] How does local_partition work?

5

```
auto tC = make_layout(make_shape(Int<16>{}, Int<16>{}));
auto tCsA = local_partition(sA, tC, threadIdx.x, Step<_1, X>{});
```

But I get (_8,_8) as the shape of tCsA. Why? I am studying this code: https://github.com/NVIDIA/cutlass/blob/main/examples/cute/tutorial/sgemm_nt_1.cu

ziyuhuang123

question

? - Needs Triage

inactive-30d

inactive-90d

CuTe
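One way to see where (_8,_8) could come from is the plain C++ model below. It is a sketch under assumptions, not CuTe code: it assumes sA has shape (128,8) and a 16x16 column-major thread layout as in the tutorial, and the names `partition_rows` and `partition_A_coord` are hypothetical. With only the first mode of the thread layout selected, 16 threads split the 128 rows (128 / 16 = 8 each) and the K mode is left whole (8), which gives 8 x 8 elements per thread.

```cpp
#include <cassert>

// Plain C++ model of partitioning a (BLK_M, BLK_K) tile over threads (hypothetical, not CuTe).
constexpr int THR_M = 16;  // threads along mode 0 of the assumed 16x16 thread layout

struct Coord {
  int m;  // row in the shared-memory tile
  int k;  // column in the shared-memory tile
};

// Rows owned by each thread when only mode 0 of the thread layout is used.
constexpr int partition_rows(int blk_m) { return blk_m / THR_M; }

// Tile coordinate of element (i, j) in the partition owned by thread tid.
// Only tid % THR_M matters because the second thread mode is skipped.
inline Coord partition_A_coord(int tid, int i, int j) {
  int thr_m = tid % THR_M;
  return Coord{thr_m + THR_M * i, j};
}
```

In this model a thread's rows are interleaved with stride 16 rather than contiguous, so thread 1 owns rows 1, 17, 33, and so on.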

[QST] How to use cutlass in tensorrt_llm plugin?

15

**What is your question?** Hello, thanks for your project. cutlass version: 2.10, device: RTX 3090. I want to implement W4A4 conv quantization in tensorrt_llm using cutlass. Follow the example...

yuanjiechen

question

inactive-30d

inactive-90d

[QST] What is operator? How do we use operator? (To access tensor elements)

5

**What is your question?** ``` Array access Users access a Tensor's elements in one of three ways: operator(), taking as many integral arguments as the number of modes, corresponding to...

ziyuhuang123

question

? - Needs Triage

inactive-30d

[QST] How to use swizzle to avoid bank conflict?

5

**What is your question?** Hi! I see the swizzle.hpp file, but I cannot work out how to use it. For the sgemm_nt.cu code you provided, could you show me how to...

ziyuhuang123

question

? - Needs Triage

inactive-30d

inactive-90d
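For readers with the same question, the core of an XOR swizzle can be sketched in plain C++. This is a simplified model, not the swizzle.hpp implementation: the function name `swizzle` and the parameters B (number of mask bits), M (low bits left untouched), and S (shift, assumed non-negative) are labeled here by assumption and should be checked against the header. The idea is to XOR some higher bits of an offset (the row) into lower bits (the column), so that elements in one column of different rows land in different columns.

```cpp
#include <cassert>
#include <cstdint>

// Plain C++ model of an XOR swizzle (hypothetical helper, S >= 0 assumed).
// Takes B bits starting at bit position M + S and XORs them into the
// B bits starting at bit position M.
constexpr std::uint32_t swizzle(std::uint32_t offset, int B, int M, int S) {
  return offset ^ ((offset & (((1u << B) - 1u) << (M + S))) >> S);
}

// Column of an offset in an 8-column row-major tile.
constexpr std::uint32_t column_of(std::uint32_t offset) { return offset % 8u; }
```

With B = 3, M = 0, S = 3 on an 8x8 tile, the eight elements of column 0 (offsets 0, 8, 16, ...) map to eight distinct columns, which is the property that spreads accesses across banks in this simplified picture.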

[BUG] Implicitly generate unexpected LDGSTS instructions for A100

3

**Describe the bug** Using DefaultCopy on A100 implicitly generates unexpected LDGSTS instructions. Users are not aware that they need to commit and wait. **Steps/Code to reproduce bug** ``` using GmemTiledCopy...

cctry

bug

inactive-30d

inactive-90d

[QST] cpp11.cu compiles with c++14

5

I think [cpp11.cu](https://github.com/NVIDIA/cutlass/blob/6e60b9b17c5e6734488dbb7401b5c55ccb37feba/test/unit/core/cpp11.cu#L76) should be comparing against (from https://gcc.gnu.org/onlinedocs/cpp/Standard-Predefined-Macros.html) `201103L`. Although I vaguely remember that with a newer compiler, it can be difficult to test old standard compatibility. So maybe...

chsigg

inactive-30d
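To make the distinction in this question concrete, here is a small sketch of the two possible comparisons against the `__cplusplus` value. The helper names are hypothetical and this is not the code in cpp11.cu. `201103L` is the value for C++11 and `201402L` is the value for C++14, so an "at least" check passes under C++14 while an equality check does not.

```cpp
#include <cassert>

// Standard values of the __cplusplus macro.
constexpr long kCpp11 = 201103L;
constexpr long kCpp14 = 201402L;

// True only when the reported standard is exactly C++11.
constexpr bool is_exactly_cpp11(long v) { return v == kCpp11; }

// True for C++11 and every later standard.
constexpr bool is_cpp11_or_newer(long v) { return v >= kCpp11; }

static_assert(is_cpp11_or_newer(kCpp14), "C++14 satisfies an 'at least C++11' check");
static_assert(!is_exactly_cpp11(kCpp14), "C++14 fails an 'exactly C++11' check");
```

A test meant to prove C++11 compatibility would therefore need the equality form, together with a build that actually passes a C++11 standard flag to the compiler.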

[QST] How to profile the CUTLASS with all of the optimization techniques?

6

**What is your question?** Hi, thanks for the great work! Recently, I have been exploring the performance improvements from all of the optimizations in CUTLASS. I want to profile all of...

ybai62868

question

inactive-30d

inactive-90d

[QST] How to avoid too many resources requested

10

**What is your question?** I am trying to use `cutlass::conv::device::Convolution` with a fixed ThreadblockShape, WarpShape and InstructionShape. It fails with an internal error, which is actually "too many resources requested". It may...

YSF-A

question

inactive-30d

inactive-90d
