NVIDIA_SGEMM_PRACTICE
Results differ from cublas
Dear @wangzyon, when the matrix size is set to 3 or 9, the matrix-multiplication results are significantly different from cuBLAS. How can we solve this?
The results of kernels 1-5 differ from cuBLAS; I know the reason for that.
However, the results of kernels 6-7 are tested to be the same as cuBLAS, yet they differ from those of NumPy (in Python). Why?
I generated two 16x16 matrices, A and B, and computed their product with kernel 7 and with NumPy.


The following figure shows the results calculated by kernel 7.

They are significantly different from the NumPy results.
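
One way to narrow down a mismatch like this is to compare the kernel output against several NumPy expressions instead of only `A @ B`. cuBLAS assumes column-major storage while NumPy arrays are row-major by default, so a result that agrees with cuBLAS can correspond to a swapped or transposed product when read back in Python. The sketch below is only a diagnostic, not a confirmed explanation for this issue: `diagnose` and `C_fake` are hypothetical names, the tolerances are arbitrary float32-friendly choices, and it assumes the kernel output has already been copied to the host as a 16x16 `float32` array.

```python
import numpy as np

def diagnose(a, b, c_kernel, rtol=1e-4, atol=1e-5):
    """Return the names of the NumPy expressions that match c_kernel.

    Hypothetical helper: it only checks the plain, swapped, and
    transposed products, using a float32-level tolerance.
    """
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    c_kernel = np.asarray(c_kernel, dtype=np.float32)
    candidates = {
        "A @ B": a @ b,
        "B @ A": b @ a,
        "(A @ B).T": (a @ b).T,
        "(B @ A).T": (b @ a).T,
    }
    return [
        name
        for name, ref in candidates.items()
        if ref.shape == c_kernel.shape
        and np.allclose(c_kernel, ref, rtol=rtol, atol=atol)
    ]

rng = np.random.default_rng(0)
A = rng.standard_normal((16, 16)).astype(np.float32)
B = rng.standard_normal((16, 16)).astype(np.float32)

# Stand-in for a real kernel result (assumption, for illustration only):
# a column-major GEMM applied to row-major buffers sees A.T and B.T,
# and its column-major output reads back in row-major order as a transpose.
C_fake = (A.T @ B.T).T

print(diagnose(A, B, C_fake))
```

If the real kernel 7 output matches none of the candidates, the layout is probably not the cause, and it would be worth checking the `alpha` and `beta` values passed to the kernel and the data type used on the NumPy side (NumPy defaults to `float64`).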
