ziyuhuang123
``` auto tC = make_layout(make_shape(Int{}, Int{})); auto tCsA = local_partition(sA, tC, threadIdx.x, Step{}); ``` But I get (_8,_8) as tCsA's shape. Why? I am studying this code: https://github.com/NVIDIA/cutlass/blob/main/examples/cute/tutorial/sgemm_nt_1.cu
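One way to see where a shape like (_8,_8) can come from is to do the partitioning arithmetic by hand. The sketch below is plain C++, not CuTe. It assumes the sizes used in the tutorial (a 128x8 shared-memory tile `sA`, a 16x16 thread layout `tC`, and a projection that keeps only the first mode of `tC`), none of which are visible in the snippet above. `partitioned_extent` is a hypothetical helper, not a library function.

```cpp
#include <cassert>

// Hypothetical helper (not a CuTe function): extent of one tensor mode as seen
// by a single thread after partitioning.
//   tile_extent   : size of the mode in the tile being partitioned
//   thread_extent : number of threads laid out along that mode
//   projected     : true if the projection keeps this mode of the thread layout;
//                   false if the mode is skipped and each thread keeps it whole
inline int partitioned_extent(int tile_extent, int thread_extent, bool projected) {
    if (!projected) return tile_extent;
    assert(thread_extent > 0 && tile_extent % thread_extent == 0);
    return tile_extent / thread_extent;
}
```

With the assumed sizes, `partitioned_extent(128, 16, true)` gives 8 for the M mode and `partitioned_extent(8, 16, false)` gives 8 for the K mode. Under those assumptions the thread layout divides only M and leaves K whole, which would explain (_8,_8).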
**What is your question?** ``` Array access Users access a Tensor's elements in one of three ways: operator(), taking as many integral arguments as the number of modes, corresponding to...
**What is your question?** Hi! I have seen the swizzle.hpp file, but I am not sure how to use it. For example, for the sgemm_nt.cu code you provided, could you show me how to...
I know make_tiled_mma will create a mma_tile, and then along M, N, K we will get MMA_M, MMA_N, MMA_K dimensions. So inside cute::gemm, we will loop across MMA_M, MMA_N, MMA_K...
https://github.com/reed-lau/cute-gemm/blob/25952de314daf740ec637a3bb7bb145605fd5edb/gemm-simple.cu#L84-L86 I have read [0t_mma_atom](https://github.com/NVIDIA/cutlass/blob/c4e3e122e266644c61b4af33d0cc09f4c391a64b/media/docs/cute/0t_mma_atom.md). I know that an MMA computes a fixed-size block; here we use "SM80_16x8x8_F16F16F16F16_TN", so it is 16-8-16 (MNK). ``` make_tiled_mma(mma_atom{}, make_layout(Shape{}), make_layout(Shape{})) ``` Above will let...
I am learning this example: https://github.com/NVIDIA/cutlass/blob/c4e3e122e266644c61b4af33d0cc09f4c391a64b/examples/cute/tutorial/sgemm_1.cu#L209-L211 **What is your question?** ``` cp_async_fence(); // Label the end of (potential) cp.async instructions cp_async_wait(); // Sync on all (potential) cp.async instructions __syncthreads(); //...
**What is your question?** ``` // ((_3,2),(2,_5,_2)):((4,1),(_2,13,100)) Tensor A = make_tensor(ptr, make_shape (make_shape (Int{},2), make_shape ( 2,Int{},Int{})), make_stride(make_stride( 4,1), make_stride(Int{}, 13, 100))); // ((2,_5,_2)):((_2,13,100)) Tensor B = A(2,_); // ((_3,_2)):((4,1))...
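To make the layout notation above concrete, here is a minimal plain C++ sketch (not CuTe code) of how a coordinate maps to a memory offset for the layout `((3,2),(2,5,2)):((4,1),(2,13,100))`. It assumes the usual CuTe convention that a 1-D index into a mode is unpacked with the leftmost sub-mode varying fastest. The names `mode_offset` and `layout_offset` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Offset contributed by one mode: unpack a 1-D index into per-sub-mode
// coordinates (leftmost sub-mode fastest), then take the inner product
// of those coordinates with the strides.
template <std::size_t N>
int mode_offset(int idx, const std::array<int, N>& shape,
                const std::array<int, N>& stride) {
    assert(idx >= 0);
    int offset = 0;
    for (std::size_t i = 0; i < N; ++i) {
        offset += (idx % shape[i]) * stride[i];
        idx /= shape[i];
    }
    return offset;
}

// Layout from the snippet: ((3,2),(2,5,2)):((4,1),(2,13,100))
const std::array<int, 2> kRowShape{{3, 2}};
const std::array<int, 2> kRowStride{{4, 1}};
const std::array<int, 3> kColShape{{2, 5, 2}};
const std::array<int, 3> kColStride{{2, 13, 100}};

// Offset of element (row, col), each given as a 1-D index into its mode.
inline int layout_offset(int row, int col) {
    return mode_offset(row, kRowShape, kRowStride) +
           mode_offset(col, kColShape, kColStride);
}
```

Under this convention, `layout_offset(2, 0)` is 8, so a slice such as `A(2,_)` would keep the second mode's shape and strides and start 8 elements past the original pointer.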
**What is your question?** // Get the appropriate blocks for this thread block auto cta_coord = make_coord(blockIdx.x, blockIdx.y, _); // (m,n,k) Tensor gA = local_tile(mA, cta_tiler, cta_coord, Step{}); // (BLK_M,BLK_K,k)...
Hi! I am learning SGEMM and found that dispatch_policies.h has "Custom" and "CustomBack". I am not sure what these mean. Thank you!
Hi! I find your repo very interesting and gave it a star without hesitation! I have also been learning about the L2 cache recently, so I wonder where it uses "immediate eviction"...