how-to-optimize-gemm
about ldg32_nc_0
https://github.com/tpoisonooo/how-to-optimize-gemm/blob/master/cuda/MMult_cuda_12.cu, lines 20 and 21: I'm a beginner with CUDA and PTX, and I'd like to know what these two PTX lines are for: "{.reg .pred p;\n" and "mov.b32 %0, 0;\n". Is this useless code?
For .reg .pred p;
Yes, it is unused here. It declares a predicate register p, which the code originally used as a predicate guard to handle conditional execution.
mov.b32 %0, 0;
This zeroes the output register before the load. If you do not like it, just remove it.
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html
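To make the predicate-guard idea concrete, here is a minimal sketch of what a guarded load could look like next to the simplified one. It is not the repository's code: the names ldg32_nc_plain, ldg32_nc_guarded, guard_demo and run_guard_demo are hypothetical, and the guard parameter is an assumption about how such a predicate would typically be fed in.

```cuda
#include <cuda_runtime.h>

// Simplified load: no predicate register and no zeroing, just the load.
__device__ __forceinline__ float ldg32_nc_plain(const float *ptr) {
  float reg;
  asm volatile("ld.global.nc.f32 %0, [%1];\n" : "=f"(reg) : "l"(ptr));
  return reg;
}

// Hypothetical guarded variant (an assumption, not the repository's code):
// p is set from `guard`, the output register is zeroed first, and the load
// only executes when p is true. Here both extra PTX lines do real work.
__device__ __forceinline__ float ldg32_nc_guarded(const float *ptr, bool guard) {
  float reg;
  asm volatile(
      "{\n"
      "  .reg .pred p;\n"                  // declare predicate register p
      "  setp.ne.b32 p, %2, 0;\n"          // p = (guard != 0)
      "  mov.b32 %0, 0;\n"                 // default value when the load is skipped
      "  @p ld.global.nc.f32 %0, [%1];\n"  // load only if p is true
      "}\n"
      : "=f"(reg)
      : "l"(ptr), "r"((int)guard));
  return reg;
}

// Each thread loads one element both ways; the guard is false for i >= n_valid.
__global__ void guard_demo(const float *in, float *out, int n, int n_valid) {
  int i = threadIdx.x;
  if (i >= n) return;
  out[i] = ldg32_nc_guarded(in + i, i < n_valid);
  out[n + i] = ldg32_nc_plain(in + i);
}

// Host helper: returns true if guarded lanes hold the input value or zero as
// expected, and the plain load always returns the input value.
bool run_guard_demo() {
  const int n = 8, n_valid = 5;
  float h_in[n], h_out[2 * n];
  for (int i = 0; i < n; ++i) h_in[i] = float(i + 1);

  float *d_in = nullptr, *d_out = nullptr;
  if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return false;
  if (cudaMalloc(&d_out, 2 * n * sizeof(float)) != cudaSuccess) {
    cudaFree(d_in);
    return false;
  }
  cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
  guard_demo<<<1, n>>>(d_in, d_out, n, n_valid);
  cudaError_t err = cudaMemcpy(h_out, d_out, 2 * n * sizeof(float),
                               cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  if (err != cudaSuccess) return false;

  for (int i = 0; i < n; ++i) {
    float expected = (i < n_valid) ? h_in[i] : 0.0f;
    if (h_out[i] != expected) return false;   // guarded: value or zero
    if (h_out[n + i] != h_in[i]) return false; // plain: always the value
  }
  return true;
}
```

In the guarded variant the zeroing gives skipped lanes a defined value, which is why a mov like this tends to appear together with a predicate. Once the guard is dropped, neither line is needed and the plain form is enough.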
thank you