How_to_optimize_in_GPU
Performance doubles by only changing one line of code
Hi xiandong,
Thanks for providing this amazing tutorial! I have recently been working on reduce0, and I found that I can double the performance of the reduce_v0_baseline.cu kernel simply by changing blockDim.x to THREAD_PER_BLOCK in the for loop condition.
before
profile result:
after
profile result:
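My guess is that this is because of loop unrolling: THREAD_PER_BLOCK is a compile-time constant, so the compiler knows the trip count, whereas blockDim.x is only known at run time. It's quite interesting that such a simple change makes such a big difference.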
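For readers without the screenshots, here is a minimal sketch of the two loop variants. It assumes the kernel is the usual interleaved-addressing shared-memory reduction and that THREAD_PER_BLOCK is a macro equal to the launch block size (256 here). The kernel names reduce_runtime_bound and reduce_constant_bound and the host harness are hypothetical, written only for this illustration. It checks that both versions give the same sums; it does not measure performance.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define THREAD_PER_BLOCK 256  // assumption: must equal the launch block size

// "before": the loop bound is blockDim.x, which is only known at run time.
__global__ void reduce_runtime_bound(const float *d_in, float *d_out) {
    __shared__ float sdata[THREAD_PER_BLOCK];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    sdata[tid] = d_in[i];
    __syncthreads();
    for (unsigned int s = 1; s < blockDim.x; s *= 2) {
        if (tid % (2 * s) == 0) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) d_out[blockIdx.x] = sdata[0];
}

// "after": the only change is the loop bound, now a compile-time constant.
// The compiler can then unroll the loop, and each unrolled step sees a
// constant s, so tid % (2 * s) may reduce to a bit mask.
__global__ void reduce_constant_bound(const float *d_in, float *d_out) {
    __shared__ float sdata[THREAD_PER_BLOCK];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    sdata[tid] = d_in[i];
    __syncthreads();
    for (unsigned int s = 1; s < THREAD_PER_BLOCK; s *= 2) {
        if (tid % (2 * s) == 0) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) d_out[blockIdx.x] = sdata[0];
}

int main() {
    const int num_blocks = 1024;
    const int n = num_blocks * THREAD_PER_BLOCK;

    // All ones, so every block sum should be exactly THREAD_PER_BLOCK.
    float *h_in = (float *)malloc(n * sizeof(float));
    float *h_out = (float *)malloc(num_blocks * sizeof(float));
    for (int k = 0; k < n; ++k) h_in[k] = 1.0f;

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, num_blocks * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    int errors = 0;
    for (int variant = 0; variant < 2; ++variant) {
        cudaMemset(d_out, 0, num_blocks * sizeof(float));
        if (variant == 0)
            reduce_runtime_bound<<<num_blocks, THREAD_PER_BLOCK>>>(d_in, d_out);
        else
            reduce_constant_bound<<<num_blocks, THREAD_PER_BLOCK>>>(d_in, d_out);
        // This blocking copy also waits for the kernel to finish.
        cudaMemcpy(h_out, d_out, num_blocks * sizeof(float), cudaMemcpyDeviceToHost);
        for (int b = 0; b < num_blocks; ++b)
            if (h_out[b] != (float)THREAD_PER_BLOCK) ++errors;
    }
    printf("mismatched block sums: %d\n", errors);

    cudaFree(d_in);
    cudaFree(d_out);
    free(h_in);
    free(h_out);
    return errors != 0;
}
```

One way to test the unrolling guess is to compile this file with nvcc and compare the SASS of the two kernels with cuobjdump -sass. If the guess is right, the constant-bound version should show straight-line code with no loop branch. Note that the constant-bound version is only correct when the kernel is launched with exactly THREAD_PER_BLOCK threads per block.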