TensorRT-LLM

feat: Open source fp8_blockscale_gemm

Open lucifer1004 opened this issue 9 months ago • 13 comments

lucifer1004 avatar Mar 25 '25 16:03 lucifer1004

QQ: any benchmarks compared with DeepGEMM on Hopper and Blackwell? Thanks.

zhyncs avatar Mar 25 '25 18:03 zhyncs

/bot run

lucifer1004 avatar Mar 26 '25 00:03 lucifer1004

> QQ: any benchmarks compared with DeepGEMM on Hopper and Blackwell? Thanks.

DeepGEMM currently supports only Hopper WGMMA, so we cannot use it directly on Blackwell.

June

juney-nvidia avatar Mar 26 '25 00:03 juney-nvidia

> DeepGEMM currently supports only Hopper WGMMA, so we cannot use it directly on Blackwell.

Hi @juney-nvidia, thanks for the reply. Do you recommend using DeepGEMM on Hopper?

zhyncs avatar Mar 26 '25 00:03 zhyncs

@nv-guomingz @tongyuantongyu to help review.

cc @jiahanc for visibility on this Hopper-related effort.

juney-nvidia avatar Mar 26 '25 00:03 juney-nvidia

> > DeepGEMM currently supports only Hopper WGMMA, so we cannot use it directly on Blackwell.
>
> Hi @juney-nvidia, thanks for the reply. Do you recommend using DeepGEMM on Hopper?

On Hopper, based on our evaluation, DeepGEMM performs better on average, but it can be worse in some scenarios. That's why we are keeping both and open-sourcing this fp8_blockscale_gemm implementation.

June

juney-nvidia avatar Mar 26 '25 00:03 juney-nvidia

/bot run

lucifer1004 avatar Mar 26 '25 06:03 lucifer1004

PR_Github #534 [ run ] triggered by Bot

niukuo avatar Mar 26 '25 07:03 niukuo

PR_Github #534 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #456 completed with status: 'FAILURE'

niukuo avatar Mar 26 '25 07:03 niukuo

Hi @lucifer1004, it seems that CI found this link error:

[2025-03-26T07:23:18.594Z] /usr/bin/ld: /home/jenkins/agent/workspace/LLM/helpers/Build-x86_64/llm/examples/cpp/executor/../../../cpp/build/tensorrt_llm/libtensorrt_llm.so: undefined reference to `tensorrt_llm::kernels::fp8_blockscale_gemm::CutlassFp8BlockScaleGemmRunner<__nv_bfloat16, __nv_fp8_e4m3, __nv_bfloat16>::CutlassFp8BlockScaleGemmRunner()'
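
The log by itself does not say why the symbol is missing. One common cause of an undefined reference to a template class constructor is that the member definitions sit in a source file that never explicitly instantiates the requested type combination (another is that the source file is simply not part of the build). Below is a minimal sketch of the explicit-instantiation pattern, assuming that first cause. Every name in it (`BlockScaleGemmRunner`, `workspaceSize`, `makeRunnerAndQueryWorkspace`, the file names in the comments) is hypothetical and is not taken from the TensorRT-LLM source; plain `float` and `unsigned char` stand in for the CUDA types so the sketch compiles without CUDA headers.

```cpp
// Hypothetical sketch; not the TensorRT-LLM implementation.
#include <cstddef>

// --- would live in a header, e.g. runner.h (hypothetical file name) ---
template <typename ElementA, typename ElementB, typename ElementD>
class BlockScaleGemmRunner
{
public:
    BlockScaleGemmRunner();            // declared here, defined in the source file
    std::size_t workspaceSize() const; // likewise

private:
    std::size_t mWorkspaceSize;
};

// --- would live in a source file, e.g. runner.cpp (hypothetical file name) ---
template <typename ElementA, typename ElementB, typename ElementD>
BlockScaleGemmRunner<ElementA, ElementB, ElementD>::BlockScaleGemmRunner()
    : mWorkspaceSize(sizeof(ElementA) + sizeof(ElementB) + sizeof(ElementD))
{
}

template <typename ElementA, typename ElementB, typename ElementD>
std::size_t BlockScaleGemmRunner<ElementA, ElementB, ElementD>::workspaceSize() const
{
    return mWorkspaceSize;
}

// Explicit instantiation definition: makes the compiler emit the constructor
// and member functions for this exact type combination in this translation
// unit. If the three parts were split across files and this line were missing
// (or named different types), callers in other translation units would
// compile but fail to link with an "undefined reference" error.
template class BlockScaleGemmRunner<float, unsigned char, float>;

// --- would live in a caller, e.g. another .cpp that only includes runner.h ---
std::size_t makeRunnerAndQueryWorkspace()
{
    BlockScaleGemmRunner<float, unsigned char, float> runner;
    return runner.workspaceSize();
}
```

In a single file this always links, because the definitions are visible to the caller; the comments mark where the pieces would be split in a real build. Checking that the instantiated type list matches the one named in the linker message is usually the first thing to verify.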

nv-guomingz avatar Mar 26 '25 08:03 nv-guomingz

/bot run

nv-guomingz avatar Mar 26 '25 14:03 nv-guomingz

PR_Github #597 [ run ] triggered by Bot

niukuo avatar Mar 26 '25 14:03 niukuo

PR_Github #597 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #506 completed with status: 'FAILURE'

niukuo avatar Mar 26 '25 15:03 niukuo

/bot run

lucifer1004 avatar Mar 31 '25 14:03 lucifer1004

/bot run

lucifer1004 avatar Mar 31 '25 15:03 lucifer1004

PR_Github #792 [ run ] triggered by Bot

tensorrt-cicd avatar Mar 31 '25 15:03 tensorrt-cicd

PR_Github #792 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #640 completed with status: 'SUCCESS'

tensorrt-cicd avatar Mar 31 '25 17:03 tensorrt-cicd

/bot run

lucifer1004 avatar Apr 02 '25 01:04 lucifer1004

PR_Github #951 [ run ] triggered by Bot

tensorrt-cicd avatar Apr 02 '25 01:04 tensorrt-cicd

PR_Github #951 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #745 completed with status: 'SUCCESS'

tensorrt-cicd avatar Apr 02 '25 03:04 tensorrt-cicd

/bot reuse-pipeline

lucifer1004 avatar Apr 02 '25 03:04 lucifer1004

PR_Github #968 [ reuse-pipeline ] triggered by Bot

tensorrt-cicd avatar Apr 02 '25 03:04 tensorrt-cicd

PR_Github #968 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #951 for commit a55f7d0

tensorrt-cicd avatar Apr 02 '25 04:04 tensorrt-cicd