Optimizing-SGEMM-on-NVIDIA-Turing-GPUs
Problem with the event-based timing
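for (n_count = 0; n_count < N; n_count++) {
    // Record an event pair around each individual launch.
    cudaEventRecord(beg);
    test_kernel(kernel_num, m, n, k, alpha, dA, dB, beta, dC, err);
    cudaEventRecord(end);
    // Wait for both events to complete before reading the elapsed time.
    cudaEventSynchronize(beg);
    cudaEventSynchronize(end);
    cudaEventElapsedTime(&ms, beg, end);
    // Accumulate the per-launch time in milliseconds.
    elapsed_time += ms;
}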
Timing each launch separately like this and accumulating the results gives a more accurate measurement.
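For anyone who wants to try the per-launch pattern outside the repository, here is a minimal self-contained sketch. It is not the repository's benchmark: `dummy_kernel` and `time_launches_ms` are hypothetical names standing in for `test_kernel` and the surrounding harness, and the warm-up launch is an extra assumption of mine rather than part of the snippet above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel under test (not from the repository).
__global__ void dummy_kernel(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f + 1.0f;
}

// Hypothetical helper: returns the average time per launch in milliseconds,
// or -1.0f if a CUDA error occurred.
float time_launches_ms(int n_runs, int n) {
    float *d_x = nullptr;
    if (cudaMalloc(&d_x, n * sizeof(float)) != cudaSuccess) return -1.0f;
    cudaMemset(d_x, 0, n * sizeof(float));

    cudaEvent_t beg, end;
    cudaEventCreate(&beg);
    cudaEventCreate(&end);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // Assumption: one untimed warm-up launch so one-time setup cost is excluded.
    dummy_kernel<<<blocks, threads>>>(d_x, n);
    cudaDeviceSynchronize();

    float elapsed_ms = 0.0f;
    for (int i = 0; i < n_runs; i++) {
        float ms = 0.0f;
        cudaEventRecord(beg);                  // timestamp before this launch
        dummy_kernel<<<blocks, threads>>>(d_x, n);
        cudaEventRecord(end);                  // timestamp after this launch
        cudaEventSynchronize(end);             // end completes after beg in the same stream
        cudaEventElapsedTime(&ms, beg, end);   // milliseconds between the two events
        elapsed_ms += ms;
    }

    cudaError_t err = cudaGetLastError();
    cudaEventDestroy(beg);
    cudaEventDestroy(end);
    cudaFree(d_x);
    if (err != cudaSuccess) return -1.0f;
    return elapsed_ms / n_runs;
}
```

Calling `time_launches_ms(10, 1 << 20)` from `main` and printing the result with `printf("%f ms\n", ...)` is enough to check it on a given GPU. Because both events are recorded in the same stream, synchronizing on `end` alone is sufficient here; the extra `cudaEventSynchronize(beg)` in the snippet above is harmless.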