Lenan22

Results 3 issues of Lenan22

Print the information for each optim_pass; this seems to make debugging easier.

constexpr int a_sh_rd_delta_o = 2 * ((threads / 32) / (thread_n_blocks / 4)); 1. Does the 32 here refer to a warp? 2. What does 4 here mean? 3. What...
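To make the arithmetic in the quoted expression concrete, here is a minimal sketch that evaluates it for a few parameter values. The helper name `a_sh_rd_delta_o_for` and the values `threads = 256` and `thread_n_blocks = 16, 8, 4` are assumptions chosen for illustration and do not come from the issue. On NVIDIA GPUs a warp has 32 threads, so `threads / 32` is presumably the number of warps per thread block; the sketch does not try to answer what the `4` stands for.

```cpp
// Hypothetical helper (not part of the original kernel): evaluates the
// expression from the issue for given parameters, using the same integer
// division as the original constexpr.
constexpr int a_sh_rd_delta_o_for(int threads, int thread_n_blocks) {
    return 2 * ((threads / 32) / (thread_n_blocks / 4));
}

// Assumed example values, checked at compile time.
// 256 / 32 = 8 warps; 16 / 4 = 4; 8 / 4 = 2; 2 * 2 = 4.
static_assert(a_sh_rd_delta_o_for(256, 16) == 4, "threads=256, thread_n_blocks=16");
// 8 / 4 = 2; 8 / 2 = 4; 2 * 4 = 8.
static_assert(a_sh_rd_delta_o_for(256, 8) == 8, "threads=256, thread_n_blocks=8");
// 4 / 4 = 1; 8 / 1 = 8; 2 * 8 = 16.
static_assert(a_sh_rd_delta_o_for(256, 4) == 16, "threads=256, thread_n_blocks=4");
```

With a fixed thread count, the value doubles each time `thread_n_blocks` is halved, which shows how the two divisions interact.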

    int slice_col_par = (iters * blockIdx.x) / k_tiles;
    int slice_col = slice_col_par;
    // int slice_iters; // number of threadblock tiles in the current slice
    int slice_count = 0; //...