Optimizing-SGEMM-on-NVIDIA-Turing-GPUs issues

4 results for Optimizing-SGEMM-on-NVIDIA-Turing-GPUs issues


Cannot build the project

Dear @yzhaiustc, thanks for your amazing effort on this repo. However, we cannot build the project because ``helper_functions.h`` is missing. Could you provide the missing file? Thanks!

chaoming0625

Problem with event timing statistics

```cpp
for (n_count = 0; n_count < N; n_count++) {
    cudaEventRecord(beg);
    test_kernel(kernel_num, m, n, k, alpha, dA, dB, beta, dC, err);
    cudaEventRecord(end);
    cudaEventSynchronize(beg);
    cudaEventSynchronize(end);
    cudaEventElapsedTime(&ms, beg, end);
    elapsed_time +=...
```

alg-leon

kernel3

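Hello. In kernel3 you changed the block size from (32,32) to (1024), and you list three benefits of doing so: 1. storing threadIdx.x before re-using it massively; 2. in order to reduce living registers; 3. benefit the compiler optimization. I don't really understand what these three points mean, and I can't find a matching explanation in books or online. Could you explain them in more detail? Pointers to reference material would be even better!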

liuqi123123
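
The listing does not include a reply to this question, so the following is only one plausible reading of the first point, not the maintainer's explanation. With a 1D block of 1024 threads, a kernel can read `threadIdx.x` once into a local variable and derive both tile coordinates from that copy, instead of consulting `threadIdx.x` and `threadIdx.y` repeatedly. The host-side C++ sketch below only mimics that index arithmetic. `TILE`, `tile_coords`, and `covers_tile_once` are hypothetical names, and the 32 x 32 tile shape is an assumption taken from the (32,32) block size mentioned above.

```cpp
#include <cassert>
#include <utility>

// Assumed tile edge: 32 * 32 = 1024, matching a 1D block of 1024 threads.
constexpr int TILE = 32;

// Hypothetical helper: map a linear thread id to (row, col) inside the tile.
// On the GPU, `tx` would be a local copy of threadIdx.x read once at kernel entry.
// For a power-of-two TILE, % and / compile down to a mask and a shift.
inline std::pair<int, int> tile_coords(int tx) {
    int row = tx % TILE;  // fastest-varying coordinate
    int col = tx / TILE;  // advances once every TILE threads
    return std::make_pair(row, col);
}

// Host-side check that the 1024 linear ids hit every tile element exactly once.
inline bool covers_tile_once() {
    bool seen[TILE][TILE] = {};
    for (int tx = 0; tx < TILE * TILE; ++tx) {
        std::pair<int, int> rc = tile_coords(tx);
        if (seen[rc.first][rc.second]) return false;  // duplicate mapping
        seen[rc.first][rc.second] = true;
    }
    return true;  // 1024 distinct ids over 1024 cells with no duplicates
}
```

Under this mapping, consecutive thread ids walk down one column of the tile, so with column-major storage neighbouring threads touch neighbouring addresses. Whether this also lowers register pressure in the actual kernel depends on the compiler and is not something this sketch demonstrates.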

Why is the matrix indexing column-major?

As the title says.

VeritasFutureKF
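
The listing shows no answer to this question. As background, BLAS and cuBLAS store matrices in column-major order, so an SGEMM kernel meant to be compared against them would naturally follow the same convention. That this is the repository's reason is an assumption. The sketch below shows what column-major indexing means for a reference `C = alpha * A * B + beta * C`. `col_major_index` and `sgemm_ref` are hypothetical names, and the leading dimensions are assumed equal to the row counts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major: element (i, j) of a matrix with leading dimension ld
// is stored at offset i + j * ld, so each column is contiguous in memory.
inline std::size_t col_major_index(std::size_t i, std::size_t j, std::size_t ld) {
    return i + j * ld;
}

// Naive reference for C = alpha * A * B + beta * C with all matrices column-major.
// A is M x K (ld = M), B is K x N (ld = K), C is M x N (ld = M).
inline void sgemm_ref(int M, int N, int K, float alpha,
                      const std::vector<float>& A, const std::vector<float>& B,
                      float beta, std::vector<float>& C) {
    for (int j = 0; j < N; ++j) {
        for (int i = 0; i < M; ++i) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) {
                acc += A[col_major_index(i, k, M)] * B[col_major_index(k, j, K)];
            }
            std::size_t c = col_major_index(i, j, M);
            C[c] = alpha * acc + beta * C[c];
        }
    }
}
```

For example, the 2 x 2 matrix with rows (1, 3) and (2, 4) is stored as `{1, 2, 3, 4}` in this layout, one column after the other.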
