How_to_optimize_in_GPU

Performance doubles by only changing one line of code

Open · xiefan46 opened this issue 8 months ago • 1 comment

Hi xiandong,

Thanks for providing this amazing tutorial! Recently I have been working on reduce0, and I found that I can double the performance of the reduce_v0_baseline.cu kernel by simply changing blockDim.x to THREAD_PER_BLOCK in the for loop condition.

before

Image

profile result:

Image

after

Image

profile result:

Image

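I guess this is because of loop unrolling? THREAD_PER_BLOCK is a compile-time constant, so the compiler knows the loop's trip count, whereas blockDim.x is only known at run time. It's quite interesting that such a simple change makes such a big difference.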
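Since the screenshots are not reproduced here, the following is a minimal sketch of the two variants as described in the text, not a copy of the repository's file. It assumes that the kernel is a typical shared-memory interleaved reduction, that THREAD_PER_BLOCK is 256, and that the block size at launch equals THREAD_PER_BLOCK. The kernel names and the `run_reduce` helper are hypothetical. The only difference between the two kernels is the loop bound.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define THREAD_PER_BLOCK 256  // assumed value; must equal the launch block size

// "before": the loop bound is blockDim.x, which is only known at run time.
__global__ void reduce_runtime_bound(const float *d_in, float *d_out) {
    __shared__ float sdata[THREAD_PER_BLOCK];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    sdata[tid] = d_in[i];
    __syncthreads();
    for (unsigned int s = 1; s < blockDim.x; s *= 2) {
        if (tid % (2 * s) == 0) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) d_out[blockIdx.x] = sdata[0];
}

// "after": the loop bound is a compile-time constant, so the trip count is known.
__global__ void reduce_constant_bound(const float *d_in, float *d_out) {
    __shared__ float sdata[THREAD_PER_BLOCK];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    sdata[tid] = d_in[i];
    __syncthreads();
    for (unsigned int s = 1; s < THREAD_PER_BLOCK; s *= 2) {
        if (tid % (2 * s) == 0) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) d_out[blockIdx.x] = sdata[0];
}

// Hypothetical host helper: reduces an array of ones and returns the grand total.
// Error checking is omitted to keep the sketch short.
float run_reduce(bool use_constant_bound, int num_blocks) {
    const int n = num_blocks * THREAD_PER_BLOCK;
    std::vector<float> h_in(n, 1.0f), h_out(num_blocks, 0.0f);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, num_blocks * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    if (use_constant_bound)
        reduce_constant_bound<<<num_blocks, THREAD_PER_BLOCK>>>(d_in, d_out);
    else
        reduce_runtime_bound<<<num_blocks, THREAD_PER_BLOCK>>>(d_in, d_out);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h_out.data(), d_out, num_blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    float total = 0.0f;
    for (float v : h_out) total += v;  // sum the per-block partial results on the host
    return total;
}

int main() {
    const int num_blocks = 1024;
    const float expected = static_cast<float>(num_blocks) * THREAD_PER_BLOCK;
    assert(run_reduce(false, num_blocks) == expected);
    assert(run_reduce(true, num_blocks) == expected);
    printf("both variants produce the same sum\n");
    return 0;
}
```

With a constant bound the compiler is free to unroll the loop fully. Each unrolled iteration then sees a constant `s`, so `tid % (2 * s)` could plausibly be reduced to a cheap bit mask instead of a general integer modulo. Whether this happens, and how large the speedup is, depends on the compiler version and the target architecture, so inspecting the generated SASS or PTX would be the way to confirm it.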

xiefan46 · Feb 20 '25 13:02