cuda-samples
Is there any SGEMM example (e.g. FP32)?
Looking for an SGEMM example. Does anyone know where to find one?
https://github.com/NVIDIA/cuda-samples/blob/master/Samples/4_CUDA_Libraries/simpleCUBLAS/simpleCUBLAS.cpp shows how to use cuBLAS SGEMM.
It looks like it would need to be modified to report metrics the way the bf16TensorCoreGemm example does.