blueoyster6
Thanks! That makes sense. Additionally, what's the concept of a cohort raster? And what is cohort CTA rasterization? See these [lines](https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/threadblock/threadblock_swizzle_streamk.h#L549-L582) in streamK.
Got it. Two more questions:
1. In these [lines](https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/threadblock/threadblock_swizzle_streamk.h#L276-L288), how did they choose the factors for the iter, base, and peer costs?
2. In this [line](https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/threadblock/threadblock_swizzle_streamk.h#L522), what do the epilogue accumulator fragments denote? How is it...