
[QST] In the TMA warp-specialized (tma_wasp) producer's 128 threads, are Warp1 and Warp3 idle?

Open ziyuhuang123 opened this issue 6 months ago • 1 comments

In sm90_gemm_tma_warpspecialized_cooperative, the producer warp roles are defined as:

    enum class ProducerWarpRole {
      Mainloop = 0,
      Warp1 = 1,
      Epilogue = 2,
      Warp3 = 3
    };

I can find uses of Mainloop and Epilogue, but none of Warp1 and Warp3. Are those two warps idle?
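To make the question concrete, here is a minimal sketch of how a thread in the 128-thread producer warp group could be mapped to one of these roles. It assumes 32 threads per warp, 4 warps per warp group, and that the role is simply the warp's index within its warp group. The helpers `producer_role_of` and `has_work` are hypothetical names for illustration, not CUTLASS functions.

```cpp
#include <cassert>

// Assumed constants (not copied from CUTLASS): 32 threads per warp,
// 4 warps (128 threads) per warp group.
constexpr int kThreadsPerWarp = 32;
constexpr int kWarpsPerWarpGroup = 4;

enum class ProducerWarpRole { Mainloop = 0, Warp1 = 1, Epilogue = 2, Warp3 = 3 };

// Hypothetical helper: role of a thread, taken as the index of its warp
// within its warp group.
constexpr ProducerWarpRole producer_role_of(int thread_idx) {
  return static_cast<ProducerWarpRole>(
      (thread_idx / kThreadsPerWarp) % kWarpsPerWarpGroup);
}

// Hypothetical helper: true only for the two roles the question found in use.
constexpr bool has_work(ProducerWarpRole role) {
  return role == ProducerWarpRole::Mainloop ||
         role == ProducerWarpRole::Epilogue;
}

// Quick runtime check of the mapping for one thread from each producer warp.
inline void check_producer_roles() {
  assert(has_work(producer_role_of(0)));    // warp 0: Mainloop
  assert(!has_work(producer_role_of(32)));  // warp 1: no role-specific work
  assert(has_work(producer_role_of(64)));   // warp 2: Epilogue
  assert(!has_work(producer_role_of(96)));  // warp 3: no role-specific work
}
```

Under these assumptions, threads 32 to 63 and 96 to 127 fall into Warp1 and Warp3, which is the pair of warps being asked about.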

By the way, I noticed that with 320 threads (that is, without Warp1 and Warp3) the occupancy is 10, and with 384 threads (the original CUTLASS configuration) the occupancy is 12. Is that the reason the two warps are kept?
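The two occupancy figures are consistent with simple warp counting. The sketch below assumes that the reported "occupancy" is the number of active warps per SM and that exactly one thread block is resident per SM; both are assumptions on my side, and the helper names are hypothetical.

```cpp
#include <cassert>

constexpr int kWarpSize = 32;  // threads per warp

// Warps contributed by one thread block of the given size (rounded up).
constexpr int warps_per_block(int threads_per_block) {
  return (threads_per_block + kWarpSize - 1) / kWarpSize;
}

// Assumption: blocks_per_sm thread blocks are resident on one SM
// (one block per SM by default).
constexpr int active_warps_per_sm(int threads_per_block, int blocks_per_sm = 1) {
  return warps_per_block(threads_per_block) * blocks_per_sm;
}

// 320 threads = 10 warps, 384 threads = 12 warps.
static_assert(active_warps_per_sm(320) == 10, "320 threads should give 10 warps");
static_assert(active_warps_per_sm(384) == 12, "384 threads should give 12 warps");
```

If that reading is right, the difference of 2 is just the two extra warps in the launch, so by itself it does not show whether those warps do any useful work.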

ziyuhuang123 · Aug 02 '24 09:08