[QST] How does the persistent tile scheduler in CUTLASS work?
Could you please explain how the persistent tile scheduler in CUTLASS works? Does it mean that a single CTA continuously processes multiple blocks, or is the work of different kernels assigned to a single CTA?
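For reference, the sketch below is the model I have in mind for the first reading, written as host-side C++ rather than a real kernel. It assumes a grid-stride loop: a fixed number of persistent CTAs (`cta_id` stands in for `blockIdx.x`, `num_ctas` for `gridDim.x`) each keep picking up output tiles until none are left, in a simple row-major order. `Tile` and `tiles_for_cta` are hypothetical names for illustration only, not CUTLASS's API, and the actual scheduler may order or distribute tiles differently.

```cpp
#include <cassert>
#include <vector>

// Hypothetical illustration only, not CUTLASS's API.
// One output tile of the GEMM, identified by its (m, n) tile coordinates.
struct Tile {
  int m;
  int n;
};

// Returns the sequence of output tiles that one persistent CTA would process.
// cta_id plays the role of blockIdx.x; num_ctas plays the role of gridDim.x,
// which a persistent kernel typically sizes to about the number of SMs.
std::vector<Tile> tiles_for_cta(int cta_id, int num_ctas,
                                int tiles_m, int tiles_n) {
  assert(num_ctas > 0 && cta_id >= 0 && tiles_m >= 0 && tiles_n >= 0);
  std::vector<Tile> work;
  const int total_tiles = tiles_m * tiles_n;
  // Grid-stride loop: the same CTA keeps taking new tiles until all are done.
  for (int linear = cta_id; linear < total_tiles; linear += num_ctas) {
    // Assumed row-major rasterization: consecutive tiles walk along N first.
    work.push_back(Tile{linear / tiles_n, linear % tiles_n});
  }
  return work;
}
```

For example, with a 4×4 grid of output tiles and 6 persistent CTAs, CTA 0 would handle linear tiles 0, 6 and 12, and CTA 5 would handle tiles 5 and 11, so all 16 tiles are covered by only 6 CTAs.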