CUDA-GEMM-Optimization
Some questions about numerical accuracy
https://github.com/leimao/CUDA-GEMM-Optimization/blob/main/include/profile_utils.cuh#L238
Why is it better to use int here?
https://github.com/leimao/CUDA-GEMM-Optimization/blob/main/src/profile_cuda_gemm_fp16.cu#L14
Why is the tolerance used here so loose? The tolerances I see in the apex tests and in torch.testing are much tighter.
https://pytorch.org/docs/stable/testing.html
https://github.com/NVIDIA/apex/blob/master/tests/L0/run_fused_layer_norm/test_fused_layer_norm.py#L164
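To make the question concrete, here is a minimal sketch of the kind of check I have in mind. It assumes a torch.testing-style criterion, `|actual - expected| <= atol + rtol * |expected|`. The helper name `all_close` is hypothetical and is not taken from this repository.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the repository): returns true when every
// element satisfies |actual - expected| <= atol + rtol * |expected|.
bool all_close(const std::vector<float>& actual,
               const std::vector<float>& expected,
               float atol, float rtol)
{
    if (actual.size() != expected.size())
    {
        return false;
    }
    for (std::size_t i = 0; i < actual.size(); ++i)
    {
        const float diff = std::fabs(actual[i] - expected[i]);
        // Written as a negated <= so that a NaN difference also fails.
        if (!(diff <= atol + rtol * std::fabs(expected[i])))
        {
            return false;
        }
    }
    return true;
}
```

With a check like this, tighter `atol` and `rtol` values make the comparison stricter, which is the difference I am asking about.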
When I implemented my own GEMM, I found it difficult to match the accuracy of cuBLAS. Why? 😢
@leimao