TensorRT-LLM feat: Open source fp8_blockscale

Mar 25 '25 16:03 lucifer1004

QQ: are there any benchmarks comparing this with DeepGEMM on Hopper and Blackwell? Thanks.

Mar 25 '25 18:03 zhyncs

/bot run

Mar 26 '25 00:03 lucifer1004

QQ: are there any benchmarks comparing this with DeepGEMM on Hopper and Blackwell? Thanks.

DeepGEMM only supports Hopper WGMMA for now, so we cannot use it directly on Blackwell.

June

Mar 26 '25 00:03 juney-nvidia

DeepGEMM only supports Hopper WGMMA for now, so we cannot use it directly on Blackwell.

Hi @juney-nvidia, thanks for the reply. Do you recommend using DeepGEMM on Hopper?

Mar 26 '25 00:03 zhyncs

@nv-guomingz @tongyuantongyu to help review.

cc @jiahanc for visibility on this Hopper-related effort.

Mar 26 '25 00:03 juney-nvidia

DeepGEMM only supports Hopper WGMMA for now, so we cannot use it directly on Blackwell.

Hi @juney-nvidia, thanks for the reply. Do you recommend using DeepGEMM on Hopper?

On Hopper, based on our evaluation, DeepGEMM performs better on average, but in some scenarios it can be worse. That is why we keep both and are open-sourcing this fp8_blockscale_gemm implementation.

June

Mar 26 '25 00:03 juney-nvidia

/bot run

Mar 26 '25 06:03 lucifer1004

PR_Github #534 [ run ] triggered by Bot

Mar 26 '25 07:03 niukuo

PR_Github #534 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #456 completed with status: 'FAILURE'

Mar 26 '25 07:03 niukuo

Hi @lucifer1004, it seems that CI found this link error:

[2025-03-26T07:23:18.594Z] /usr/bin/ld: /home/jenkins/agent/workspace/LLM/helpers/Build-x86_64/llm/examples/cpp/executor/../../../cpp/build/tensorrt_llm/libtensorrt_llm.so: undefined reference to `tensorrt_llm::kernels::fp8_blockscale_gemm::CutlassFp8BlockScaleGemmRunner<__nv_bfloat16, __nv_fp8_e4m3, __nv_bfloat16>::CutlassFp8BlockScaleGemmRunner()'

Mar 26 '25 08:03 nv-guomingz
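
For context on the link error above: an undefined reference to a class-template constructor often means the member definitions live in a source file that never explicitly instantiates the requested specialization, so no object file contains the symbol. The thread does not state the actual cause of this CI failure, so the following is only a minimal sketch of that general pattern. All names in it (`Runner`, `BF16`, `FP8`, `make_runner_name`) are hypothetical stand-ins, not TensorRT-LLM code.

```cpp
#include <string>

// Hypothetical stand-ins for the element types named in the error message.
struct BF16 {};
struct FP8 {};

// --- Would normally live in a header: declarations only. ---
template <typename A, typename B, typename C>
class Runner {
public:
    Runner();                  // declared here, defined in a source file
    std::string name() const;
private:
    std::string name_;
};

// --- Would normally live in exactly one source file. ---
template <typename A, typename B, typename C>
Runner<A, B, C>::Runner() : name_("runner") {}

template <typename A, typename B, typename C>
std::string Runner<A, B, C>::name() const { return name_; }

// Explicit instantiation definition: forces the compiler to emit the
// constructor and members for this exact specialization. If this line were
// missing and the definitions above were in a separate translation unit,
// callers would compile but the link step would report an undefined reference.
template class Runner<BF16, FP8, BF16>;

// Hypothetical caller that relies on the instantiated specialization.
std::string make_runner_name() {
    Runner<BF16, FP8, BF16> r;
    return r.name();
}
```

In a multi-file build, the explicit instantiation has to sit in a translation unit that is actually compiled and linked into the shared library. A source file missing from the build list would produce the same symptom.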

/bot run

Mar 26 '25 14:03 nv-guomingz

PR_Github #597 [ run ] triggered by Bot

Mar 26 '25 14:03 niukuo

PR_Github #597 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #506 completed with status: 'FAILURE'

Mar 26 '25 15:03 niukuo

/bot run

Mar 31 '25 14:03 lucifer1004

/bot run

Mar 31 '25 15:03 lucifer1004

PR_Github #792 [ run ] triggered by Bot

Mar 31 '25 15:03 tensorrt-cicd

PR_Github #792 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #640 completed with status: 'SUCCESS'

Mar 31 '25 17:03 tensorrt-cicd

/bot run

Apr 02 '25 01:04 lucifer1004

PR_Github #951 [ run ] triggered by Bot

Apr 02 '25 01:04 tensorrt-cicd

PR_Github #951 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #745 completed with status: 'SUCCESS'

Apr 02 '25 03:04 tensorrt-cicd

/bot reuse-pipeline

Apr 02 '25 03:04 lucifer1004

PR_Github #968 [ reuse-pipeline ] triggered by Bot

Apr 02 '25 03:04 tensorrt-cicd

PR_Github #968 [ reuse-pipeline ] completed with state SUCCESS Reusing PR_Github #951 for commit a55f7d0

Apr 02 '25 04:04 tensorrt-cicd

feat: Open source fp8_blockscale_gemm