
[FEA] Can I use copy to store register value into shared memory?

Open ziyuhuang123 opened this issue 2 years ago • 10 comments

I have a register tensor tCrC and want to store its values into shared memory. Can the copy function do that? Thanks!

ziyuhuang123 avatar Dec 05 '23 02:12 ziyuhuang123

I mean, using CuTe.

ziyuhuang123 avatar Dec 05 '23 02:12 ziyuhuang123

If you've created your tCrC using some partitioner, then that same partitioner should be applied to an smem tensor. For example, we usually see this pattern:

Tensor gC = ...                                          // (BLK_M,BLK_N)
Tensor sC = make_tensor(make_smem_ptr(ptr), shape(gC));  // (BLK_M,BLK_N)

auto thr_mma = tiled_mma.get_slice(thread_idx);

Tensor tCgC = thr_mma.partition_C(gC);                   // (MMA,MMA_M,MMA_N)
Tensor tCsC = thr_mma.partition_C(sC);                   // (MMA,MMA_M,MMA_N)
Tensor tCrC = thr_mma.make_fragment_C(tCgC);             // (MMA,MMA_M,MMA_N)

...

copy(tCrC, tCgC);
// or
copy(tCrC, tCsC);
copy(tCsC, tCgC);
// or
copy(tCrC, tCsC);
__syncthreads();  // All threads' smem writes must land before thread 0 reads the whole tile
if (threadIdx.x == 0) {
  copy(sC, gC);   // Copy the whole tile
}
// or re-partition sC and gC, etc

This should, of course, also work with any shared memory layout for sC, so long as its shape is still (compatible with) BLK_M x BLK_N.

ccecka avatar Dec 05 '23 02:12 ccecka
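
The point about shared memory layouts can be illustrated with a host-only sketch. This is plain C++, not CuTe code, and every name in it is hypothetical. It assumes a 16x8 tile, 32 threads with 4 register values each, and a per-thread ownership pattern borrowed from the accumulator layout that PTX documents for m16n8 MMA instructions (the thread does not say which MMA atom is used). Each emulated thread stores its register values at logical (m, n) coordinates, and the layout alone decides the address:

```cpp
// Host-only emulation, NOT CuTe code. All names below are hypothetical.
constexpr int kM = 16;       // stands in for BLK_M
constexpr int kN = 8;        // stands in for BLK_N
constexpr int kThreads = 32;
constexpr int kValues = 4;   // values each thread holds in registers

// Assumed ownership pattern (stand-in for thr_mma.partition_C): the
// accumulator layout PTX documents for m16n8 MMA instructions.
constexpr int owner_m(int t, int v) { return t / 4 + 8 * (v / 2); }
constexpr int owner_n(int t, int v) { return 2 * (t % 4) + v % 2; }

// Two shared memory layouts with the same logical 16x8 shape.
constexpr int col_major(int m, int n) { return m + kM * n; }
constexpr int padded_row_major(int m, int n) { return m * (kN + 1) + n; }

// Logical value of C at (m, n), used to fill the emulated registers.
constexpr int c_value(int m, int n) { return 100 * m + n + 1; }

// Emulates copy(tCrC, tCsC) for every thread, then checks that the tile
// read back through the same layout matches the logical C.
template <class Layout>
constexpr bool register_to_smem_round_trip(Layout layout) {
  int smem[kM * (kN + 1)] = {};  // large enough for the padded layout
  for (int t = 0; t < kThreads; ++t) {
    for (int v = 0; v < kValues; ++v) {
      int m = owner_m(t, v);
      int n = owner_n(t, v);
      int reg = c_value(m, n);   // the thread's register value
      smem[layout(m, n)] = reg;  // the store targets a logical coordinate
    }
  }
  for (int m = 0; m < kM; ++m) {
    for (int n = 0; n < kN; ++n) {
      if (smem[layout(m, n)] != c_value(m, n)) return false;
    }
  }
  return true;
}

// Checked at compile time (C++14 or later).
static_assert(register_to_smem_round_trip(col_major), "column-major sC");
static_assert(register_to_smem_round_trip(padded_row_major), "padded sC");
```

Both checks pass because the partition is defined on coordinates of the tile, not on addresses, which is why swapping the layout of sC leaves the copy call unchanged.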

Thank you very much for your reply!!!!

I noticed you are using "auto thr_mma = tiled_mma.get_slice(thread_idx);". How does that differ from "auto tAgA = local_partition(gA, tA, threadIdx.x); // (THR_M,THR_K,k)"?

ziyuhuang123 avatar Dec 05 '23 09:12 ziyuhuang123

The first is constructing an MMA partitioner from a TiledMMA (which is usually used to create tCxY partitioned tensors) and the second is partitioning with respect to the thread layout tA.

ccecka avatar Dec 05 '23 09:12 ccecka

Hmm, so the output is the same, right? I mean, the outputs are all just of "tensor" type, so no matter how I get a "tensor", I can call "copy(tCrC, tCsC);" and CuTe will find a way to do the copy?

Thank you!!!

ziyuhuang123 avatar Dec 05 '23 11:12 ziyuhuang123

No, the two are not interchangeable: they apply distinct partitioning patterns, so the same thread owns different elements under each.

I suggest you review the existing documentation and wait for our updated documentation+examples coming soon.

ccecka avatar Dec 05 '23 11:12 ccecka
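
A host-only sketch may make the "distinct partitioning patterns" concrete. It is plain C++, not CuTe code, and all names are hypothetical. It assumes a 16x8 tile and 32 threads, a 16x2 column-major thread layout standing in for tA (under local_partition, a thread takes its own position in every repetition of the thread layout across the tile), and, for the MMA side, the accumulator layout that PTX documents for m16n8 MMA instructions. The thread specifies neither tA nor the MMA atom, so both are assumptions.

```cpp
// Host-only emulation, NOT CuTe code. All names below are hypothetical.
struct Coord { int m; int n; };

constexpr int kM = 16;
constexpr int kN = 8;
constexpr int kThreads = 32;
constexpr int kValues = 4;  // elements each thread ends up owning

// local_partition-style ownership for an assumed 16x2 column-major thread
// layout: thread t sits at (t % 16, t / 16) and takes that position in
// each of the four 16x2 tiles that cover the 16x8 tensor.
constexpr Coord thread_layout_owner(int t, int v) {
  return Coord{t % 16, t / 16 + 2 * v};
}

// MMA-style ownership (assumed m16n8 accumulator layout): thread t owns
// two adjacent columns in row t / 4 and the same columns in row t / 4 + 8.
constexpr Coord mma_owner(int t, int v) {
  return Coord{t / 4 + 8 * (v / 2), 2 * (t % 4) + v % 2};
}

// True if the 32 threads together touch every tile element exactly once.
template <class Owner>
constexpr bool covers_tile_once(Owner owner) {
  int hits[kM * kN] = {};
  for (int t = 0; t < kThreads; ++t) {
    for (int v = 0; v < kValues; ++v) {
      Coord c = owner(t, v);
      hits[c.m + kM * c.n] += 1;
    }
  }
  for (int i = 0; i < kM * kN; ++i) {
    if (hits[i] != 1) return false;
  }
  return true;
}

// Both patterns are valid partitions of the tile (checked at compile time)...
static_assert(covers_tile_once(thread_layout_owner), "thread-layout partition");
static_assert(covers_tile_once(mma_owner), "MMA partition");

// ...but the same thread and value index name different elements of it.
static_assert(thread_layout_owner(5, 1).m == 5 &&
              thread_layout_owner(5, 1).n == 2, "thread 5, value 1");
static_assert(mma_owner(5, 1).m == 1 && mma_owner(5, 1).n == 3,
              "thread 5, value 1");
```

Under these assumptions, a register fragment shaped by one partitioner and a destination tensor shaped by the other can have matching sizes, so an element-wise copy would run, yet it would place values at the wrong coordinates. Using one partitioner for the register, shared, and global views avoids that mismatch.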

Wow! That's cool! Actually, my supervisor is pushing me to write the relevant code, haha... so I can't wait too long. He is a nice guy, though, and I am also interested in CuTe myself.

Eagerly waiting for your update!

ziyuhuang123 avatar Dec 05 '23 11:12 ziyuhuang123

Currently I only see one code example... Do you know of any more code written with CuTe?

ziyuhuang123 avatar Dec 05 '23 11:12 ziyuhuang123

Oh, I noticed most GEMM code is written with the previous CUTLASS... Maybe I should use the older version? Because that doc is more...

ziyuhuang123 avatar Dec 05 '23 11:12 ziyuhuang123

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] avatar Jan 05 '24 16:01 github-actions[bot]

Closing due to inactivity.

mnicely avatar Feb 22 '24 15:02 mnicely