How_to_optimize_in_GPU
Performance doubles by only changing one line of code
Hi xiandong,
Thanks for providing this amazing tutorial! I have recently been working on reduce0, and I found that I can double the performance of the reduce_v0_baseline.cu kernel simply by changing blockDim.x to THREAD_PER_BLOCK in the for loop.
before
profile result:
after
profile result:
I guess this is because of loop unrolling? It's quite interesting that such a simple change makes such a big difference.
Yes. If the loop bound is blockDim.x, the compiler cannot determine its value at compile time, so the loop will not be unrolled.
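For readers without the screenshots, here is a minimal sketch of the two loop variants being compared. It assumes the baseline kernel uses the usual interleaved-addressing shared-memory reduction and that THREAD_PER_BLOCK is 256; the kernel names and the `run_reduce` host helper are hypothetical and only serve to make the comparison self-contained.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define THREAD_PER_BLOCK 256  // assumed block size; must match the launch configuration

// Variant A ("before"): the loop bound is blockDim.x, a value known only at run time,
// so the compiler cannot fully unroll the loop.
__global__ void reduce_runtime_bound(const float *d_in, float *d_out) {
    __shared__ float sdata[THREAD_PER_BLOCK];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    sdata[tid] = d_in[i];
    __syncthreads();
    for (unsigned int s = 1; s < blockDim.x; s *= 2) {
        if (tid % (2 * s) == 0) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) d_out[blockIdx.x] = sdata[0];
}

// Variant B ("after"): the loop bound is the compile-time constant THREAD_PER_BLOCK,
// so the trip count is known and the compiler is free to unroll the loop.
__global__ void reduce_constant_bound(const float *d_in, float *d_out) {
    __shared__ float sdata[THREAD_PER_BLOCK];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    sdata[tid] = d_in[i];
    __syncthreads();
    for (unsigned int s = 1; s < THREAD_PER_BLOCK; s *= 2) {
        if (tid % (2 * s) == 0) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) d_out[blockIdx.x] = sdata[0];
}

// Hypothetical host helper: fills the input with ones, launches one variant,
// and returns the sum of the per-block partial results.
float run_reduce(bool use_constant_bound, int num_blocks) {
    const int n = num_blocks * THREAD_PER_BLOCK;
    std::vector<float> h_in(n, 1.0f), h_out(num_blocks, 0.0f);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, num_blocks * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    if (use_constant_bound) {
        reduce_constant_bound<<<num_blocks, THREAD_PER_BLOCK>>>(d_in, d_out);
    } else {
        reduce_runtime_bound<<<num_blocks, THREAD_PER_BLOCK>>>(d_in, d_out);
    }
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h_out.data(), d_out, num_blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    float total = 0.0f;
    for (float v : h_out) total += v;
    return total;
}
```

Both variants compute the same result; only the loop bound differs. With a constant bound, each unrolled iteration also sees a constant `s`, so the compiler can likely replace `tid % (2 * s)` with a cheap bit mask, which may account for part of the speedup alongside the unrolling itself. Profiling both kernels on your own GPU is the only way to confirm how large the gap is there.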