ziyuhuang123
**What is your question?** Here the prologue does more than initialization; it also calls cute::gemm. Why? What is the benefit and meaning of this? Is it for overlap? If so, how?
**What is your question?** (image) As shown here, I see many uses of PipelineState but cannot find its definition. I do find some in other files, like: ``` using PipelineState = cutlass::PipelineState;...
**Describe the bug** I tried to use cuda-gdb and added the debug flags -g -G to example48, and then it failed, even before running cuda-gdb. The cuda-gdb team reports the same bug here:...
**What is your question?** I tried example48 and found that in the producer, the epilogue is not used at all. I am puzzled about what the producer-epilogue is for...
**What is your question?** I printed a tensor and got: ``` smem_ptr[16b](0x7fe900000c00) o Sw o _0 o (((_64,_256),_2)):(((_1,_64),_16384)) ``` What are these 0 and o in the printed output? Where is the...
**What is your question?** For example, if I have a variable tensor_3d, how can I find out its type? Something like type(tensor_3d)?
**What is your question?** ``` auto tensor_2d = make_tensor(tensor_3d.data(), make_shape(64, 256)); printf("tensor_2d\n"); print(tensor_2d); printf("\n"); print(tRS_sD); printf("\n"); print(bSG_sD); printf("\n"); print(gD_epi); printf("\n"); ``` ``` tensor_3d smem_ptr[16b](0x7f8f00000c00) o Sw o _0 o (((_64,_256),_2)):(((_1,_64),_16384))...
I know we have persistent blocks, but the number of blocks seems to be slightly higher than the number of SMs. Where is it defined?
Could you please explain how the persistent tile scheduler in CUTLASS works? Does it mean that a single CTA continuously processes multiple blocks, or is the work of different kernels...
setmaxnreg is a new feature introduced with Hopper. I noticed this in CUTLASS: https://github.com/NVIDIA/cutlass/blob/eee0cab26c8eedea447eb3b58b3498eeba2294da/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_cooperative.hpp#L446 From the above, the consumer register count is 232 and the producer register count is 40. Different warps can use different...
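
To make the 232 / 40 figures quoted above concrete, here is a small arithmetic sketch of the register budget. Only the two per-thread counts come from the question. The rest are assumptions for illustration: 128 threads per warp group, one producer and two consumer warp groups per CTA, 65536 32-bit registers per SM, and per-thread counts rounded down to a multiple of 8. The helper names are hypothetical and are not part of CUTLASS.

```cpp
#include <cassert>

// Assumed figures (not stated in the question).
constexpr int kThreadsPerWarpGroup = 128;    // 4 warps of 32 threads
constexpr int kProducerWarpGroups  = 1;      // assumed for the cooperative kernel
constexpr int kConsumerWarpGroups  = 2;      // assumed for the cooperative kernel
constexpr int kRegistersPerSM      = 65536;  // assumed 32-bit registers per SM

// Per-thread register counts quoted in the question.
constexpr int kProducerRegs = 40;
constexpr int kConsumerRegs = 232;

// Total registers one CTA needs for the given per-thread counts.
constexpr int registers_used(int producer_regs, int consumer_regs) {
  return kThreadsPerWarpGroup *
         (kProducerWarpGroups * producer_regs +
          kConsumerWarpGroups * consumer_regs);
}

// Per-thread count if every thread received the same share,
// rounded down to a multiple of 8 (assumed granularity).
constexpr int uniform_regs_per_thread() {
  int threads = kThreadsPerWarpGroup *
                (kProducerWarpGroups + kConsumerWarpGroups);
  int per_thread = kRegistersPerSM / threads;
  return per_thread - per_thread % 8;
}

// The uneven split fits in the register file...
static_assert(registers_used(kProducerRegs, kConsumerRegs) <= kRegistersPerSM,
              "uneven split must fit in the register file");
// ...and uses the same total as a uniform split would.
static_assert(registers_used(kProducerRegs, kConsumerRegs) ==
                  registers_used(uniform_regs_per_thread(),
                                 uniform_regs_per_thread()),
              "uneven split reuses the uniform budget");

// Runtime version of the same checks, for use from a test or main().
inline void check_register_budget() {
  assert(registers_used(kProducerRegs, kConsumerRegs) <= kRegistersPerSM);
}
```

Under these assumptions the split does not need more registers than a uniform allocation. It moves registers from the producer warp group, which mostly issues loads, to the consumer warp groups that hold the accumulators.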