
[QST] How are MMA_M, MMA_N, MMA_K computed in CuTe?

Open ziyuhuang123 opened this issue 10 months ago • 2 comments

I know make_tiled_mma creates a tiled MMA, and that along M, N, K we then get the MMA_M, MMA_N, MMA_K dimensions. So inside cute::gemm, do we loop over MMA_M, MMA_N, MMA_K one by one?

Mar 31 '24 09:03 ziyuhuang123

Yes. You start with an MMA atom, use it to build a tiled MMA, and then use the tiled MMA to partition the compute tile. For example, if you start with the 16x8x16 MMA atom and set the layout of atoms to (3,4,2), that is equivalent to creating a bigger MMA of size 48x32x32 that uses 32x3x4x2 = 768 threads running in parallel. Finally, if the compute tile has size 96x96x64, you get the extra dimensions (MMA_M, MMA_N, MMA_K) = (96/48, 96/32, 64/32) = (2, 3, 2), and those iterations run sequentially.
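To make the arithmetic in that example concrete, here is a minimal sketch in plain C++. It does not use CuTe at all: `Shape3` and every helper name below are hypothetical, made up for illustration, and the loop order is an arbitrary choice, not a statement about how cute::gemm orders its loops. It also assumes one warp (32 threads) per atom, as in the example above.

```cpp
#include <cassert>

// Hypothetical helper type for this sketch; this is NOT a CuTe type.
struct Shape3 { int m, n, k; };

// Shape covered by one application of the tiled MMA:
// the atom shape scaled by the atom layout in each mode.
inline Shape3 tiled_mma_shape(Shape3 atom, Shape3 atom_layout) {
    return {atom.m * atom_layout.m, atom.n * atom_layout.n, atom.k * atom_layout.k};
}

// Threads used by the tiled MMA, assuming each atom is executed by one warp.
inline int thread_count(Shape3 atom_layout, int threads_per_atom = 32) {
    return threads_per_atom * atom_layout.m * atom_layout.n * atom_layout.k;
}

// The "rest" modes (MMA_M, MMA_N, MMA_K): how many times the tiled MMA
// must be applied along each mode to cover the whole compute tile.
inline Shape3 rest_modes(Shape3 tile, Shape3 tiled) {
    // This sketch only handles tiles that divide evenly.
    assert(tile.m % tiled.m == 0 && tile.n % tiled.n == 0 && tile.k % tiled.k == 0);
    return {tile.m / tiled.m, tile.n / tiled.n, tile.k / tiled.k};
}

// Count the sequential steps by walking the three rest modes one by one.
// The k-outermost order here is an assumption made for illustration only.
inline int sequential_steps(Shape3 rest) {
    int steps = 0;
    for (int k = 0; k < rest.k; ++k)
        for (int m = 0; m < rest.m; ++m)
            for (int n = 0; n < rest.n; ++n)
                ++steps;  // one tiled-MMA application at (m, n, k)
    return steps;
}
```

With the numbers from the answer, `tiled_mma_shape({16, 8, 16}, {3, 4, 2})` gives 48x32x32, `thread_count({3, 4, 2})` gives 768, and `rest_modes({96, 96, 64}, {48, 32, 32})` gives (2, 3, 2), so the tiled MMA is applied 2 x 3 x 2 = 12 times in sequence.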

Apr 01 '24 04:04 YichengDWu

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

May 19 '24 23:05 github-actions[bot]
