How_to_optimize_in_GPU

This is a series on GPU optimization that explains in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including elementwise, reduce, sg...

Results 7 How_to_optimize_in_GPU issues

- Error observed: `./How_to_optimize_in_GPU/reduce# make` `nvcc -o bin/reduce_v0 reduce_v0_baseline.cu` `/usr/bin/ld: cannot open output file bin/reduce_v0: No such file or directory` `collect2: error: ld returned 1...`

In Sgemv_v0, shouldn't "we set each block to 256 threads" be 128 threads?

Which modifications are necessary for reduce_v7 to handle an arbitrary input size N? I changed it to `400*500` and got a wrong answer.

Hello, may I ask a question: why must N be 32*1024*1024? When I reduced it to 1024, I hit a deadlock.

When K = 64, 128, etc., the results are different. For example: `size_t M = 8000; size_t K = 64; size_t N = 8000;` ![image](https://github.com/Liu-xiandong/How_to_optimize_in_GPU/assets/53090559/0b94fedb-8063-4059-83ae-d5af326264aa)

Hi xiandong, thanks for providing this amazing tutorial! I have recently been working on reduce0, and I found that I can double the performance of the reduce_v0_baseline.cu kernel by simply changing a...