How to implement a GEMM with FP16 and INT4 using the kernel in FasterTransformer/src/fastertransformer/kernels/cutlass_kernels/fpA_intB_gemm

Open AkatsukiChiri opened this issue 1 year ago • 0 comments

I am trying to implement a GEMM with FP16 and INT4 operands. I would like to call the fpA_intB_gemm_fp16_int4 kernel located in FasterTransformer/src/fastertransformer/kernels/cutlass_kernels/fpA_intB_gemm, but all the examples I can find use it as part of full model inference. If I only want to run the GEMM kernel on its own, what should I do?
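For anyone trying to check a standalone call, here is a minimal sketch of the math such a mixed-precision GEMM is expected to compute, which could serve as a CPU reference for comparing outputs. Everything about the layout is an assumption and not taken from the FasterTransformer source: signed INT4 values packed two per byte with the low nibble first, B stored row-major as (k, n), and one scale per output column. The helper names `unpack_int4` and `reference_gemm` are hypothetical, and plain Python floats stand in for FP16. The real kernel may expect a different, preprocessed weight layout.

```python
def unpack_int4(packed, count):
    """Unpack `count` signed 4-bit values from bytes (assumed: low nibble first)."""
    values = []
    for byte in packed:
        for nibble in (byte & 0x0F, byte >> 4):
            # Map the unsigned nibble 0..15 to two's-complement -8..7.
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values[:count]


def reference_gemm(a, b_packed, scales, m, n, k):
    """Compute C[i][j] = sum_p A[i][p] * (B[p][j] * scales[j]).

    a        : m x k list of floats (stand-in for FP16 activations)
    b_packed : packed INT4 weights, row-major (k, n), assumed layout
    scales   : n per-column dequantization scales (assumed granularity)
    """
    b = unpack_int4(b_packed, k * n)
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                # Dequantize the weight on the fly, then accumulate.
                acc += a[i][p] * (b[p * n + j] * scales[j])
            c[i][j] = acc
    return c


# Tiny example: A is 1x2, B is 2x2 with values [[1, -2], [3, 4]].
a = [[1.0, 2.0]]
b_packed = bytes([0xE1, 0x43])  # nibbles: 1, -2 (0xE), 3, 4
scales = [0.5, 2.0]
print(reference_gemm(a, b_packed, scales, m=1, n=2, k=2))
```

A reference like this only pins down the arithmetic. Comparing it against GPU output would still need a tolerance for FP16 rounding and a conversion between this packing and whatever layout the kernel actually consumes.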

Jul 26 '24 17:07 AkatsukiChiri