feat: Open source fp8_blockscale_gemm
Quick question: are there any benchmarks comparing this with DeepGEMM on Hopper and Blackwell? Thanks.
/bot run
DeepGEMM only supports Hopper WGMMA for now, so we cannot use it directly on Blackwell.
June
Hi @juney-nvidia, thanks for the reply. Do you recommend using DeepGEMM on Hopper?
@nv-guomingz @tongyuantongyu to help review.
cc @jiahanc for visibility on this Hopper-related effort.
On Hopper, based on our evaluation, DeepGEMM performs better on average, but in some scenarios it can be worse. That's why we are keeping both and open-sourcing this fp8_blockscale_gemm implementation.
June
/bot run
PR_Github #534 [ run ] triggered by Bot
PR_Github #534 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #456 completed with status: 'FAILURE'
Hi @lucifer1004, it seems that CI hit this link error:
[2025-03-26T07:23:18.594Z] /usr/bin/ld: /home/jenkins/agent/workspace/LLM/helpers/Build-x86_64/llm/examples/cpp/executor/../../../cpp/build/tensorrt_llm/libtensorrt_llm.so: undefined reference to `tensorrt_llm::kernels::fp8_blockscale_gemm::CutlassFp8BlockScaleGemmRunner<__nv_bfloat16, __nv_fp8_e4m3, __nv_bfloat16>::CutlassFp8BlockScaleGemmRunner()'
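For anyone reading along: an undefined reference to a template class constructor like this often means the member definitions live in a separate source file and no object file in the link provides that specialization. The thread does not say what the actual cause was here, so the following is only a minimal sketch of one common fix, explicit instantiation. All names in it (`Bf16`, `Fp8E4m3`, `BlockScaleGemmRunner`, `runnerName`) are hypothetical stand-ins, not the real TensorRT-LLM types.

```cpp
#include <string>

// Hypothetical stand-ins for __nv_bfloat16 and __nv_fp8_e4m3 (illustration only).
struct Bf16 {};
struct Fp8E4m3 {};

// --- Would normally live in a header ---
// Only declarations are visible to users of the class.
template <typename ElementA, typename ElementB, typename ElementD>
class BlockScaleGemmRunner
{
public:
    BlockScaleGemmRunner();
    std::string name() const;

private:
    std::string mName;
};

// --- Would normally live in a separate .cpp/.cu file built into the library ---
template <typename ElementA, typename ElementB, typename ElementD>
BlockScaleGemmRunner<ElementA, ElementB, ElementD>::BlockScaleGemmRunner()
    : mName("blockscale")
{
}

template <typename ElementA, typename ElementB, typename ElementD>
std::string BlockScaleGemmRunner<ElementA, ElementB, ElementD>::name() const
{
    return mName;
}

// Explicit instantiation: makes the compiler emit the constructor and member
// symbols for this type combination in this translation unit. If this line
// (or the source file containing it) is missing from the build, callers that
// only see the header get an "undefined reference" at link time.
template class BlockScaleGemmRunner<Bf16, Fp8E4m3, Bf16>;

// --- Caller side: only needs the declarations plus the emitted symbols ---
std::string runnerName()
{
    BlockScaleGemmRunner<Bf16, Fp8E4m3, Bf16> runner;
    return runner.name();
}
```

If the instantiation already exists, the other usual suspect is that the source file defining it is not listed in the library's build target, which produces the same linker message.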
/bot run
PR_Github #597 [ run ] triggered by Bot
PR_Github #597 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #506 completed with status: 'FAILURE'
/bot run
/bot run
PR_Github #792 [ run ] triggered by Bot
PR_Github #792 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #640 completed with status: 'SUCCESS'
/bot run
PR_Github #951 [ run ] triggered by Bot
PR_Github #951 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #745 completed with status: 'SUCCESS'
/bot reuse-pipeline
PR_Github #968 [ reuse-pipeline ] triggered by Bot
PR_Github #968 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #951 for commit a55f7d0