ziyuhuang123

Results 17 comments of ziyuhuang123

Like the example here, what does the output mean? ``` // Tile a tensor according to the flat shape of a layout that provides the coordinate of the target index....

So when I call print(A.shape) I still get a value, but it is different from a normal tensor definition. So why is it still a "tensor" object here? Confusing!

It seems that Step<_1, X> means: for the first dimension, divide as normal; for the second dimension, do not divide.
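To make that reading concrete, here is a minimal sketch in plain C++ (not CuTe itself) of how I understand the projection: the step first selects which modes of the thread layout are used, and only those modes divide the data. The helper name `per_thread_shape` and the bool mask standing in for `_1` / `X` are hypothetical and exist only for illustration; it also assumes every extent divides evenly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain C++ model of the idea only; this is NOT the CuTe API.
// keep[i] == true  stands in for _1 (use mode i of the thread layout),
// keep[i] == false stands in for X  (skip mode i of the thread layout).
// Hypothetical helper: returns the per-thread shape after partitioning.
std::vector<int> per_thread_shape(const std::vector<int>& data_shape,
                                  const std::vector<int>& thr_shape,
                                  const std::vector<bool>& keep) {
    assert(thr_shape.size() == keep.size());

    // Step 1: project the thread layout, keeping only the selected modes.
    std::vector<int> projected;
    for (std::size_t i = 0; i < thr_shape.size(); ++i) {
        if (keep[i]) projected.push_back(thr_shape[i]);
    }

    // Step 2: divide the leading data modes by the projected modes.
    // Data modes with no matching projected mode stay undivided.
    std::vector<int> result = data_shape;
    for (std::size_t i = 0; i < projected.size() && i < result.size(); ++i) {
        assert(result[i] % projected[i] == 0);  // assumes even division
        result[i] /= projected[i];
    }
    return result;
}
```

For example, with a (64, 8) tile and a (16, 32) thread layout, the mask {true, false} gives (4, 8): the first mode is split across 16 threads and the second mode is left whole.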

Also, how exactly does local_partition divide the data? Do we get bank conflicts (yes, we will), and how can we avoid them?

Emmm, thank you, I have read all three blogs you mentioned, but you are discussing CUDA cores. I am learning Tensor Cores, so I am reading CUTLASS.

> Off topic: Just came across this issue (as a github-mancer). Based on your recent questions I assume you want to write gemm from ground up. And not to be...

Thank you very much for your reply! I noticed you are using "auto thr_mma = tiled_mma.get_slice(thread_idx);" So how does it differ from: "auto tAgA = local_partition(gA, tA, threadIdx.x); // (THR_M,THR_K,k)"...