How_to_optimize_in_GPU
This is a series of GPU optimization topics. It explains in detail how to optimize CUDA kernels, covering several basic kernel optimizations, including: elementwise, reduce, sg...
- Error observed: `./How_to_optimize_in_GPU/reduce# make ` ` nvcc -o bin/reduce_v0 reduce_v0_baseline.cu ` ` /usr/bin/ld: cannot open output file bin/reduce_v0: No such file or directory ` ` collect2: error: ld returned 1...
In Sgemv_v0, "we set each block to 256 threads": shouldn't that be 128 threads?
Which modifications does reduce_v7 need in order to support an arbitrary input size N? I changed N to ```400*500``` and got a wrong answer.
Hello, may I ask a question: why must N be 32X1024X1024? When I reduced it to 1024, I hit a deadlock.
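The two questions above both concern input sizes other than the default. A minimal sketch of one common way to make a block-level sum reduction independent of N is a grid-stride loop followed by a shared-memory tree reduction. This is not the repository's reduce_v7 and has not been checked against it; the names `reduce_any_n`, `reduce_sum`, `kBlockSize`, and the cap of 1024 blocks are all hypothetical choices for illustration, and error checking of the CUDA calls is omitted for brevity.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Assumed block size for this sketch; the tree reduction needs a power of two.
constexpr int kBlockSize = 256;

// Each thread sums a strided slice of the input, so n does not have to be a
// multiple of the block size or of the grid size.
__global__ void reduce_any_n(const float* in, float* block_sums, size_t n) {
    __shared__ float sdata[kBlockSize];
    const unsigned tid = threadIdx.x;
    const size_t stride = static_cast<size_t>(gridDim.x) * blockDim.x;

    float acc = 0.0f;
    for (size_t i = static_cast<size_t>(blockIdx.x) * blockDim.x + tid;
         i < n; i += stride) {
        acc += in[i];
    }
    // Threads with no in-range element still store 0 and stay in the kernel.
    sdata[tid] = acc;
    __syncthreads();

    // Tree reduction in shared memory. No thread returned early, so every
    // thread in the block reaches every __syncthreads().
    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) {
            sdata[tid] += sdata[tid + s];
        }
        __syncthreads();
    }
    if (tid == 0) {
        block_sums[blockIdx.x] = sdata[0];
    }
}

// Host wrapper (hypothetical): copies the data, launches the kernel, and
// finishes the sum over the per-block partial results on the CPU.
float reduce_sum(const std::vector<float>& h_in) {
    const size_t n = h_in.size();
    if (n == 0) return 0.0f;

    // Round up so a small n still launches at least one block, then cap the
    // grid; the grid-stride loop covers whatever the capped grid leaves over.
    int blocks = static_cast<int>((n + kBlockSize - 1) / kBlockSize);
    if (blocks > 1024) blocks = 1024;

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    reduce_any_n<<<blocks, kBlockSize>>>(d_in, d_out, n);

    std::vector<float> h_out(blocks);
    cudaMemcpy(h_out.data(), d_out, blocks * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    float total = 0.0f;
    for (float v : h_out) total += v;
    return total;
}
```

Calling `reduce_sum(std::vector<float>(400 * 500, 1.0f))` or `reduce_sum(std::vector<float>(1024, 1.0f))` exercises the two sizes mentioned in the questions. Two things worth checking in any kernel that misbehaves for small or non-multiple N, as general possibilities rather than a diagnosis of this repository: a block count computed with truncating integer division, and threads that exit on a bounds check before a later `__syncthreads()`.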
When K = 64, 128, etc., the results differ. For example: size_t M = 8000; size_t K = 64; size_t N = 8000;
Hi xiandong, thanks for providing this amazing tutorial! Recently I have been working on reduce0, and I found that I can double the performance of the reduce_v0_baseline.cu kernel by simply changing a...