hipBLAS compute type of hipblasGemmEx

Running a CUDA program shows that cublasGemmEx supports the compute type CUBLAS_COMPUTE_32F_FAST_TF32 and the algorithm CUBLAS_GEMM_DEFAULT_TENSOR_OP, as in the call below. This compute type does not appear to be available in hipBLAS. Thank you for your discussion.

status = cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, B_cols, A_rows, A_cols, &alpha, gpu_B, CUDA_R_32F, A_rows, gpu_A, CUDA_R_32F,
                      A_cols, &beta, gpu_C, CUDA_R_32F, A_rows, CUBLAS_COMPUTE_32F_FAST_TF32, CUBLAS_GEMM_DEFAULT_TENSOR_OP);

Dec 21 '23 22:12 jinz2014

Hi @jinz2014,

hipblasComputeType_t was added in the ROCm 6.0.0 release and includes HIPBLAS_COMPUTE_32F_FAST_TF32, which is the equivalent of CUBLAS_COMPUTE_32F_FAST_TF32. Note that rocBLAS does not have an equivalent compute type, so hipblasGemmEx will return HIPBLAS_STATUS_NOT_SUPPORTED if this value is used with the rocBLAS backend. For now, hipblasComputeType_t is only used with the HIPBLAS_V2 API, and the old interface for hipblasGemmEx is deprecated. You can take a look at the documentation on the HIPBLAS_V2 API, along with the hipblasGemmEx documentation. At some point in the future, the hipblasComputeType_t version of hipblasGemmEx will be the standard.
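Since the rocBLAS backend reports HIPBLAS_STATUS_NOT_SUPPORTED for the TF32 compute type, one option is to retry with a plain FP32 compute type. Below is a minimal sketch of that pattern, not an official hipBLAS recipe. The helper name gemm_with_fallback and its parameters are made up for illustration, and the hipBLAS usage in the trailing comment assumes ROCm 6.0.0 or later, HIPBLAS_V2 defined before including the hipBLAS header, and the same handle, sizes, and buffers as in the cuBLAS call from the question.

```cpp
// Sketch only: a generic "try the fast compute type, then fall back" helper.
// Nothing here is hipBLAS-specific; the status and compute-type values are
// supplied by the caller, and the helper name is hypothetical.
template <typename Status, typename ComputeType, typename GemmCall>
Status gemm_with_fallback(GemmCall call,
                          ComputeType preferred,
                          ComputeType fallback,
                          Status not_supported)
{
    // First attempt with the preferred (for example TF32) compute type.
    Status status = call(preferred);

    // Retry once with the fallback only when the backend reports that the
    // preferred compute type is not supported; any other status (success or
    // a different error) is returned unchanged.
    if (status == not_supported) {
        status = call(fallback);
    }
    return status;
}

// Hypothetical use with the HIPBLAS_V2 interface (an assumption, not tested
// here; requires "#define HIPBLAS_V2" before "#include <hipblas/hipblas.h>"):
//
//   auto call = [&](hipblasComputeType_t compute_type) {
//       return hipblasGemmEx(handle, HIPBLAS_OP_N, HIPBLAS_OP_N,
//                            B_cols, A_rows, A_cols, &alpha,
//                            gpu_B, HIP_R_32F, A_rows,
//                            gpu_A, HIP_R_32F, A_cols, &beta,
//                            gpu_C, HIP_R_32F, A_rows,
//                            compute_type, HIPBLAS_GEMM_DEFAULT);
//   };
//   hipblasStatus_t status = gemm_with_fallback(
//       call, HIPBLAS_COMPUTE_32F_FAST_TF32, HIPBLAS_COMPUTE_32F,
//       HIPBLAS_STATUS_NOT_SUPPORTED);
```

The fallback changes the numerical behaviour (full FP32 instead of TF32 inputs), so it may be worth logging which compute type was actually used.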

CUBLAS_GEMM_DEFAULT_TENSOR_OP is deprecated in cuBLAS. I will discuss with the team whether we believe it should be added to the library.

Thanks, Daine

Dec 22 '23 20:12 daineAMD

Hi again @jinz2014,

I hope the HIPBLAS_V2 API was able to satisfy your needs regarding the compute types for gemmEx. Again, this will be the default behaviour of hipBLAS in the future, but for now it lives within the HIPBLAS_V2 API.

If you have any further questions, feel free to re-open this issue or open another.

Thanks, Daine

Jun 13 '24 21:06 daineAMD