lijingticy22 comments

Results: 5 comments of lijingticy22

[BUG] Illegal instruction on H100 TMA

Two things should be fixed: 1. The tile_to_shape order should be (1,0), meaning the 2nd mode, K, is the contiguous dimension. 2. tma_partition needs to use cute.group_modes(sA, 0, 2) for both sA and mA,...

[BUG] Illegal instruction on H100 TMA

A third thing needs to change for you to get past mbarrier_wait: change "if tidx == 0:" to "if tidx < 32:". Internally, in the cute.copy implementation for tma_copy, we would have...

[BUG] Illegal instruction on H100 TMA

> Question: are there any docs on these things?

Sorry, we do not yet have docs for `tma_partition`; we will work on them in upcoming releases. For your question, the smem...

[BUG] Illegal instruction on H100 TMA

> It seems like sA_layout = cute.tile_to_shape(sw128_k_atom, (M, K), (1, 0)) and
> sA_wrong = cute.tile_to_shape(sw128_k_atom, (M, K), (0, 1))

This is because the contiguous dimension K in your case is exactly...

[QST] [CuTeDSL] Unexpected behavior with async copy on ampere

Looking at the local_partition function definition [here](https://github.com/NVIDIA/cutlass/blob/main/include/cute/tensor_impl.hpp#L1073), you will find that the index is used to produce a coordinate into the tile via "tile.get_flat_coord(index)". In your case the tile is the layout (1,1):(0,0), which means you can...
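To make the index-to-coordinate step concrete, here is a small plain-Python sketch of how a linear index could map to a coordinate inside a tile shape. It is a simplified model, not CuTe's implementation: the helper `idx2crd_colmajor` is a hypothetical name, it assumes the leftmost mode varies fastest, and it looks only at the tile's shape, not its strides.

```python
def idx2crd_colmajor(idx, shape):
    """Map a linear index to a coordinate, leftmost mode fastest.

    Hypothetical helper for illustration only; it models the shape-based
    part of an index-to-coordinate mapping and ignores strides.
    """
    coord = []
    for extent in shape:
        coord.append(idx % extent)  # position within this mode
        idx //= extent              # carry the remainder to the next mode
    return tuple(coord)


# A tile of shape (1, 1) has a single coordinate, so every index lands on it.
coords = [idx2crd_colmajor(tidx, (1, 1)) for tidx in range(4)]

# For contrast, a (2, 2) tile gives each of the four indices its own coordinate.
contrast = [idx2crd_colmajor(tidx, (2, 2)) for tidx in range(4)]

print(coords)
print(contrast)
```

Under these assumptions, a tile with shape (1,1) cannot tell thread indices apart, whereas a larger tile shape assigns each index a distinct coordinate.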