OpenBLAS Lack of FP16/BF16 Precision Support for GEMM Kernels on RISCV

Hello,

We are currently working on porting LLMs to RISCV for inference, and due to hardware limitations, we need to maintain FP16 or BF16 precision. In this process, we are using the widely-used and highly-regarded OpenBLAS as the underlying computational support. However, the computational speed didn't meet our expectations.

It seems that this issue is due to the lack of pure FP16 or mixed-precision GEMM kernels in OpenBLAS for the RISCV architecture. Although we have seen sbgemm kernels on other architectures, they are not available for RISCV.

We noticed that the RISC-V Instruction Set Manual released on May 8, 2025 already covers mixed-precision FP16 and BF16 support for RVV 1.0 (see the Instruction Set Manual), and that the RVV intrinsics have also been updated for FP16 mixed precision (see the RVV Intrinsic Document). Based on the vectorized sgemm kernel, we have implemented a mixed-precision GEMM kernel for RISCV, which we believe should be named shgemm, similar to the approach in #2767.

We would like to know if there is any plan to add support for FP16/BF16 precision GEMM kernels on the RISCV architecture in OpenBLAS. Specifically, we are interested in whether there are plans to support these kernels on the RISC-V 64-bit architecture with vlen of 128 bits (RISCV64_ZVL128B) and 256 bits (RISCV64_ZVL256B). If not, we are willing to contribute our implementation and provide the code.

May 22 '25 10:05 Srangrang

There is currently very little organizational structure or development planning behind OpenBLAS, but I am not aware of anybody currently working on BF16 kernels for RISCV. For FP16, I think it would be trivial to copy all the "infrastructure" bits (additions to various headers and makefiles, etc.) from the early BF16 pull requests. Someone expressed interest in writing x86_64 kernels a few years ago, but he switched jobs and that work stalled. So your contribution would be very welcome.

May 23 '25 12:05 martin-frbg

Hello, I am very interested in the "mixed-precision GEMM kernel for RISCV" you mentioned in this issue. Can you share more information? Thanks a lot.

Jul 07 '25 05:07 LaNasilDark

@LaNasilDark Hello. Mixed-precision GEMM on RISC-V typically takes FP16 inputs for the multiplication and accumulates in FP32. Intrinsics like __riscv_vfwmul_vf_f32m1 multiply FP16 operands and produce an FP32 result, and __riscv_vfwmacc_vf_f32m1 performs an FP16 multiply-accumulate into an FP32 accumulator. Both are widening operations that need FP16 vector support (the Zvfh extension). For details, refer to the RISC-V vector extension specification.
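
To make the FP16-multiply, FP32-accumulate idea concrete, here is a minimal sketch, not the kernel from this issue and not OpenBLAS code. The names half_to_float, axpy_f16_f32 and gemm_f16_f32 are made up for illustration, FP16 values are assumed to be stored as raw uint16_t bit patterns, and the matrices are assumed row-major. The RVV branch is only compiled when the toolchain reports vector intrinsics with Zvfh; otherwise a scalar fallback with the same arithmetic is used.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__riscv_v_intrinsic) && defined(__riscv_zvfh)
#include <riscv_vector.h>
#define HAVE_RVV_FP16 1
#endif

/* Hypothetical helper: decode an IEEE 754 binary16 bit pattern to float. */
static float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t man = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0 && man == 0) {
        bits = sign;                                   /* signed zero */
    } else if (exp == 0) {
        int e = -1;                                    /* subnormal: normalise */
        do { man <<= 1; e++; } while (!(man & 0x400u));
        bits = sign | ((uint32_t)(112 - e) << 23) | ((man & 0x3FFu) << 13);
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);       /* infinity or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* c[0..n) += a * b[0..n): FP16 inputs, FP32 accumulators. */
static void axpy_f16_f32(size_t n, uint16_t a_bits, const uint16_t *b, float *c) {
#ifdef HAVE_RVV_FP16
    _Float16 a;
    memcpy(&a, &a_bits, sizeof a);
    for (size_t vl; n > 0; n -= vl, b += vl, c += vl) {
        /* e16/mf2 and e32/m1 have the same VLMAX, so one vl serves both. */
        vl = __riscv_vsetvl_e16mf2(n);
        vfloat16mf2_t vb = __riscv_vreinterpret_v_u16mf2_f16mf2(
            __riscv_vle16_v_u16mf2(b, vl));
        vfloat32m1_t vc = __riscv_vle32_v_f32m1(c, vl);
        vc = __riscv_vfwmacc_vf_f32m1(vc, a, vb, vl);  /* widening FMA */
        __riscv_vse32_v_f32m1(c, vc, vl);
    }
#else
    /* Scalar fallback: widen both operands to FP32 before multiplying. */
    float a = half_to_float(a_bits);
    for (size_t i = 0; i < n; i++)
        c[i] += a * half_to_float(b[i]);
#endif
}

/* C (m x n, FP32) += A (m x k, FP16) * B (k x n, FP16), all row-major. */
static void gemm_f16_f32(size_t m, size_t n, size_t k,
                         const uint16_t *A, const uint16_t *B, float *C) {
    for (size_t i = 0; i < m; i++)
        for (size_t p = 0; p < k; p++)
            axpy_f16_f32(n, A[i * k + p], B + p * n, C + i * n);
}
```

For example, with A = {1, 2; 3, 4} encoded as {0x3C00, 0x4000, 0x4200, 0x4400}, B = {0.5, 1; 1.5, 2} encoded as {0x3800, 0x3C00, 0x3E00, 0x4000}, and C zero-initialised, gemm_f16_f32(2, 2, 2, A, B, C) leaves C = {3.5, 5; 7.5, 11}. A real kernel would additionally block over registers and pack the operands; this only shows where the widening multiply-accumulate fits.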

Jul 09 '25 08:07 gkdddd

I think this issue can be closed. @martin-frbg ?

Nov 03 '25 23:11 ChipKerchner
