ziyuhuang123

Results 61 issues of ziyuhuang123

**What is your question?** Here the prologue is not only used for initialization; it also calls cute::gemm. Why? What is the benefit and meaning of this? Is it for overlap? If so, how?

question
? - Needs Triage
inactive-30d

**What is your question?** ![image](https://github.com/user-attachments/assets/98eab07b-1903-425e-9439-5178169c52e4) As shown here, I see many uses of PipelineState but cannot find its definition. I do find some in other files, like: ``` using PipelineState = cutlass::PipelineState;...

question
? - Needs Triage

**Describe the bug** I tried to use cuda-gdb, so I added the debug flags -g -G to example48, and then it failed, even before running cuda-gdb. The cuda-gdb team reports the same bug here:...

bug
? - Needs Triage

**What is your question?** I tried example48, and I found that in the producer, the epilogue is not used at all. I am puzzled about what the function of the producer epilogue is....

question
? - Needs Triage

**What is your question?** I printed a tensor and got: ``` smem_ptr[16b](0x7fe900000c00) o Sw o _0 o (((_64,_256),_2)):(((_1,_64),_16384)) ``` What are these 0 and o in the printed output? Where is the...

question
? - Needs Triage

**What is your question?** For example, if I have a variable tensor_3d, how can I find out its type? Something like type(tensor_3d)?
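One generic way to inspect a type in plain C++ is to let the compiler spell it out through a function-signature macro. The sketch below is host-side only and is not a CUTLASS or CuTe API; `type_name` and `type_name_of` are hypothetical helper names, and the exact string returned depends on the compiler.

```cpp
#include <string>

// Hypothetical helper: returns a compiler-generated string that contains
// the spelled-out name of T. The surrounding text is compiler-specific.
template <class T>
std::string type_name() {
#if defined(__clang__) || defined(__GNUC__)
    return __PRETTY_FUNCTION__;   // GCC / Clang
#elif defined(_MSC_VER)
    return __FUNCSIG__;           // MSVC
#else
    return "unknown";             // no known signature macro available
#endif
}

// Hypothetical convenience overload: deduces T from a variable.
template <class T>
std::string type_name_of(const T&) {
    return type_name<T>();
}
```

Calling `type_name_of(tensor_3d)` on the host and printing the result would show the full deduced type embedded in the function signature string.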

question
? - Needs Triage

**What is your question?** ``` auto tensor_2d = make_tensor(tensor_3d.data(), make_shape(64, 256)); printf("tensor_2d\n"); print(tensor_2d); printf("\n"); print(tRS_sD); printf("\n"); print(bSG_sD); printf("\n"); print(gD_epi); printf("\n"); ``` ``` tensor_3d smem_ptr[16b](0x7f8f00000c00) o Sw o _0 o (((_64,_256),_2)):(((_1,_64),_16384))...

question
? - Needs Triage

I know the kernel uses persistent blocks, but the number of blocks seems to be slightly higher than the number of SMs. Where is it defined?

question
? - Needs Triage

Could you please explain how the persistent tile scheduler in CUTLASS works? Does it mean that a single CTA continuously processes multiple blocks, or is the work of different kernels...
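For orientation, the general idea of a persistent kernel (one CTA looping over several output tiles instead of one CTA per tile) can be sketched on the host as below. This is only a simplified round-robin model under my own assumptions; `assign_tiles` is a hypothetical name, and the real CUTLASS scheduler may order or partition tiles differently.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: for each CTA, list the output-tile indices it would
// process under a static round-robin persistent schedule.
std::vector<std::vector<int>> assign_tiles(int num_tiles, int num_ctas) {
    assert(num_ctas > 0 && num_tiles >= 0);
    std::vector<std::vector<int>> work(num_ctas);
    for (int cta = 0; cta < num_ctas; ++cta) {
        // Each CTA starts at its own index and strides by the grid size,
        // so it stays resident and keeps picking up tiles until none remain.
        for (int tile = cta; tile < num_tiles; tile += num_ctas) {
            work[cta].push_back(tile);
        }
    }
    return work;
}
```

With 10 tiles and 4 CTAs, this model gives CTA 0 the tiles 0, 4, 8 and CTA 3 the tiles 3, 7, which illustrates a single CTA processing multiple tiles within one kernel launch.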

question
? - Needs Triage

setmaxnreg is a new feature introduced with Hopper. I noticed it in CUTLASS: https://github.com/NVIDIA/cutlass/blob/eee0cab26c8eedea447eb3b58b3498eeba2294da/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_cooperative.hpp#L446 According to that code, the consumer register count is 232 and the producer register count is 40. Different warps can use different...
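A quick arithmetic check on those two numbers may help. The sketch below assumes figures that are not stated in the issue: 128 threads per warp group, 65536 32-bit registers per SM, and a CTA made of 1 producer plus 2 consumer warp groups. The function names are hypothetical.

```cpp
// Assumptions (not from the issue): warp-group size and SM register file size.
constexpr int kThreadsPerWarpGroup = 128;
constexpr int kRegistersPerSM = 64 * 1024;

// Total registers held by one CTA when producer and consumer warp groups
// use different per-thread register counts.
constexpr int cta_registers(int producer_regs, int consumer_regs,
                            int producer_groups, int consumer_groups) {
    return kThreadsPerWarpGroup *
           (producer_regs * producer_groups + consumer_regs * consumer_groups);
}

// Per-thread count that a uniform allocation with the same total would give.
constexpr int uniform_regs_per_thread(int total_regs, int warp_groups) {
    return total_regs / (kThreadsPerWarpGroup * warp_groups);
}

// 128 * (40 + 2 * 232) = 64512 registers, which fits in the assumed 65536.
static_assert(cta_registers(40, 232, 1, 2) == 64512, "total register budget");
static_assert(cta_registers(40, 232, 1, 2) <= kRegistersPerSM, "fits in one SM");
// The same total spread evenly over 3 warp groups is 168 registers per thread.
static_assert(uniform_regs_per_thread(64512, 3) == 168, "uniform equivalent");
```

Under these assumptions, 40 + 232 + 232 equals 3 x 168, so the split looks like a redistribution of a fixed per-CTA budget from the producer to the consumers rather than an increase in total register use.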

question
? - Needs Triage