CUDA-GEMM-Optimization
The read at this line is not safe.
https://github.com/leimao/CUDA-GEMM-Optimization/blob/72e2d189421a9ed359aa2500886164199ebf36bb/src/06_2d_block_tiling_2d_warp_tiling_2d_thread_tiling_matrix_transpose_vectorized_memory_access.cu#L358
This read and the computation that uses it should be inside the `if` bounds check, so that we never read out-of-bounds values from the matrix, even if the matrix is padded to 32 bytes.
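To illustrate the ordering I mean, here is a minimal host-side C++ sketch, not the repository's kernel code. The function name `guarded_epilogue`, the constant `kVecWidth`, and the parameter layout are hypothetical, and I am assuming a 4-float vector access and a leading dimension padded to a multiple of that width. The point is only that the bounds check comes before the read, the update, and the write, rather than before the write alone.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical vector width: 4 floats per access, standing in for a
// 16-byte vectorized load/store in the kernel.
constexpr std::size_t kVecWidth = 4;

// Host-side stand-in for one thread's epilogue step:
//   C = alpha * acc + beta * C
// applied to kVecWidth consecutive elements starting at (row, col).
// Returns true if the elements were touched, false if the guard skipped them.
bool guarded_epilogue(float* C, std::size_t m, std::size_t n, std::size_t ldc,
                      std::size_t row, std::size_t col,
                      const float* acc, float alpha, float beta)
{
    // Assumption: rows are padded so that a full vector starting at an
    // aligned column below n still fits inside the row.
    assert(ldc % kVecWidth == 0 && col % kVecWidth == 0 && n <= ldc);

    // The guard comes first: nothing below dereferences C unless
    // (row, col) lies inside the m x n matrix.
    if (row >= m || col >= n) {
        return false;
    }

    float* c_vec = C + row * ldc + col;
    for (std::size_t i = 0; i < kVecWidth; ++i) {
        // Read, compute, and write all happen under the same guard.
        c_vec[i] = alpha * acc[i] + beta * c_vec[i];
    }
    return true;
}
```

With this ordering, a thread whose tile position falls past the last row or column returns before touching `C` at all. Padding each row keeps in-range vectors inside their row, but it does not protect a read at a row index at or beyond `m`.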