Mr Knight

Results 3 issues of Mr Knight

I have another question about MMult_cuda_12.cu. Honestly, I don't know how to overlap the shared-memory-to-register loads with the computation. Is it the asm (PTX) that makes them run in parallel? The instructions are...

question

https://github.com/tpoisonooo/how-to-optimize-gemm/blob/master/cuda/MMult_cuda_12.cu, lines 20 and 21: I'm a beginner in CUDA and PTX, and I want to know what these two PTX lines are for: "{.reg .pred p;\n" "mov.b32 %0, 0;\n" Is this useless code?

Hello, in the yolov6s.pt provided in the git repository, some of the weights have a channel whose values are all 0 (for example, backbone.stem.rbr_dense.conv.weight): mm.backbone.stem.rbr_dense.conv.weight Parameter containing: tensor([[[[-0.0000e+00, 0.0000e+00, -0.0000e+00], [ 0.0000e+00, 0.0000e+00, -0.0000e+00], [ 0.0000e+00, 0.0000e+00, 0.0000e+00]], [[-0.0000e+00, -0.0000e+00, -0.0000e+00], [ 0.0000e+00, 0.0000e+00, 0.0000e+00], [ 0.0000e+00, -0.0000e+00, -0.0000e+00]], [[ 0.0000e+00, 0.0000e+00,...