how-to-optimize-gemm

about ldg32_nc_0

Open • YijiaZhao opened this issue 2 years ago • 3 comments

https://github.com/tpoisonooo/how-to-optimize-gemm/blob/master/cuda/MMult_cuda_12.cu, lines 20 and 21: I'm a beginner in CUDA and PTX, and I want to know what these two PTX lines are used for: "{.reg .pred p;\n" "mov.b32 %0, 0;\n". Is this useless code?

YijiaZhao, May 13 '22 13:05

Yes, ".reg .pred p;" is useless here. It was originally used as a predicate guard, to handle conditional execution.
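To show what the predicate would be for, here is a minimal sketch of a guarded 32-bit non-coherent load, assuming a float destination and a caller-supplied bool guard. The names ldg32_nc_guarded, guarded_copy and run_guarded_copy are hypothetical and are not the repository's code. The predicate p is set from the guard; when it is false, the ld is skipped and the register keeps the 0 written by mov.b32.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper, not taken from MMult_cuda_12.cu: a 32-bit
// non-coherent global load that only executes when `guard` is true.
__device__ __forceinline__ float ldg32_nc_guarded(const float* ptr, bool guard) {
    float reg;
    asm volatile(
        "{.reg .pred p;\n"                   // predicate register, scoped to this block
        " setp.ne.b32 p, %2, 0;\n"           // p = (guard != 0)
        " mov.b32 %0, 0;\n"                  // default value if the load is skipped
        " @p ld.global.nc.f32 %0, [%1];}\n"  // load runs only when p is true
        : "=f"(reg)
        : "l"(ptr), "r"(static_cast<int>(guard)));
    return reg;
}

// Threads with i >= n_valid skip the load and therefore write 0.
__global__ void guarded_copy(const float* in, float* out, int n, int n_valid) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = ldg32_nc_guarded(in + i, i < n_valid);
    }
}

// Host-side round trip (error checking omitted for brevity).
std::vector<float> run_guarded_copy(const std::vector<float>& host_in, int n_valid) {
    const int n = static_cast<int>(host_in.size());
    const size_t bytes = n * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, host_in.data(), bytes, cudaMemcpyHostToDevice);
    const int threads = 128;
    guarded_copy<<<(n + threads - 1) / threads, threads>>>(d_in, d_out, n, n_valid);
    std::vector<float> host_out(n);
    cudaMemcpy(host_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return host_out;
}
```

For example, run_guarded_copy({1.0f, 2.0f, 3.0f, 4.0f}, 2) should return {1, 2, 0, 0}: the last two threads never execute the load, so the zero from mov.b32 is what they see. This is the case where both lines do real work.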

"mov.b32 %0, 0" is used to clear the output register (set it to 0). If you do not like it, just remove it.
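Since p is unused, the load in ldg32_nc_0 is presumably unpredicated, so it always overwrites %0 and a preceding mov.b32 %0, 0 has no observable effect. Below is a minimal sketch of the load with both lines removed, assuming a float destination; ldg32_nc_plain, plain_copy and run_plain_copy are hypothetical names, and the result is compared against the built-in __ldg intrinsic.

```cuda
#include <cuda_runtime.h>
#include <utility>
#include <vector>

// Hypothetical helper: unpredicated 32-bit non-coherent load.
// The ld always writes %0, so no .pred register and no mov.b32 are needed.
__device__ __forceinline__ float ldg32_nc_plain(const float* ptr) {
    float reg;
    asm volatile("ld.global.nc.f32 %0, [%1];\n" : "=f"(reg) : "l"(ptr));
    return reg;
}

// Writes the inline-PTX result to out_asm and the __ldg result to out_ldg.
__global__ void plain_copy(const float* in, float* out_asm, float* out_ldg, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out_asm[i] = ldg32_nc_plain(in + i);
        out_ldg[i] = __ldg(in + i);  // built-in read-only data cache load
    }
}

// Returns {inline-PTX results, __ldg results}; error checking omitted.
std::pair<std::vector<float>, std::vector<float>> run_plain_copy(const std::vector<float>& host_in) {
    const int n = static_cast<int>(host_in.size());
    const size_t bytes = n * sizeof(float);
    float* d_in = nullptr;
    float* d_asm = nullptr;
    float* d_ldg = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_asm, bytes);
    cudaMalloc(&d_ldg, bytes);
    cudaMemcpy(d_in, host_in.data(), bytes, cudaMemcpyHostToDevice);
    const int threads = 128;
    plain_copy<<<(n + threads - 1) / threads, threads>>>(d_in, d_asm, d_ldg, n);
    std::vector<float> via_asm(n);
    std::vector<float> via_ldg(n);
    cudaMemcpy(via_asm.data(), d_asm, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(via_ldg.data(), d_ldg, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_asm);
    cudaFree(d_ldg);
    return {via_asm, via_ldg};
}
```

Both outputs should match the input exactly. Note that ld.global.nc (and __ldg) reads through the non-coherent, read-only data path, so it is only safe when the buffer is not written during the kernel; that holds here because in is never written.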

tpoisonooo, May 17 '22 02:05

PTX ISA reference: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html

tpoisonooo, May 17 '22 02:05

Thank you.

YijiaZhao, May 18 '22 01:05