TensorRT-LLM

fix: WeightOnlyQuantRowLinear

Open liquanfeng opened this issue 10 months ago • 13 comments

The WeightOnlyQuantRowLinear module was missing the is_expert parameter, so MoE models such as DeepSeek V2/V3 and Mixtral performed unnecessary allreduce operations when using INT8 weight-only quantization. As a result, running run.py on DeepSeek V2.5 quantized with INT8 weight-only produced incorrect output, as shown below:

Input [Text 0]: "<|begin▁of▁sentence|>Born in north-east France, Soyer trained as a"
Output [Text 0 Beam 0]: ",我们,的 ,3


,�
 、 00精度100,在
00
"

This has been fixed, and the correct output is now generated as follows:

Input [Text 0]: "<|begin▁of▁sentence|>Born in north-east France, Soyer trained as a"
Output [Text 0 Beam 0]: " chef in Paris before moving to London in 1840. He became the first celebrity chef, writing several cookbooks and inventing the double"

Additionally, the summary results are now accurate:

' James Best, best known for his portrayal of Sheriff Rosco P. Coltrane on TV\'s "The Dukes of Hazzard," died at 88 after a brief illness. He was a busy actor for decades in theater and Hollywood, but didn\'t become famous until 1979 when "The Dukes of Hazzard" began airing. Best gave his character a childlike enthusiasm that made him endearing. The show ran until 1985'
INFO:TRT-LLM:[TRT-LLM] [I]   rouge1 : 28.546239873545996
INFO:TRT-LLM:[TRT-LLM] [I]   rouge2 : 9.14361660942447
INFO:TRT-LLM:[TRT-LLM] [I]   rougeL : 20.7284256053234
INFO:TRT-LLM:[TRT-LLM] [I]   rougeLsum : 23.15367596318552
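
For readers unfamiliar with why an extra allreduce corrupts the result, here is a minimal toy sketch in plain Python. It does not use TensorRT-LLM, and every name in it (`allreduce`, `row_linear`, `moe_block`) is hypothetical. It assumes the MoE block performs one allreduce after the expert, so an expert layer that also reduces its own output counts each partial sum `tp_size` times.

```python
# Toy simulation of row-parallel linear layers under tensor parallelism.
# All names here are illustrative assumptions, not TensorRT-LLM APIs.

def allreduce(partials):
    """Sum the per-rank values and give every rank the same total."""
    total = sum(partials)
    return [total for _ in partials]


def row_linear(x_shards, w_shards, is_expert=False):
    """Row-parallel linear: each rank holds one slice of x and of w.

    Every rank computes a partial dot product. A regular layer reduces it
    right away; an expert layer leaves the reduction to the MoE block.
    """
    partials = [
        sum(xi * wi for xi, wi in zip(x, w))
        for x, w in zip(x_shards, w_shards)
    ]
    return partials if is_expert else allreduce(partials)


def moe_block(x_shards, w_shards, is_expert):
    """Hypothetical MoE block that reduces once after the expert layer."""
    return allreduce(row_linear(x_shards, w_shards, is_expert=is_expert))


x_shards = [[1.0, 2.0], [3.0, 4.0]]  # input split across 2 ranks
w_shards = [[0.5, 0.5], [0.5, 0.5]]  # matching weight slices

# The full dot product is 0.5 * (1 + 2 + 3 + 4) = 5.0.
correct = moe_block(x_shards, w_shards, is_expert=True)   # one reduction
wrong = moe_block(x_shards, w_shards, is_expert=False)    # reduced twice

print(correct)  # [5.0, 5.0]
print(wrong)    # [10.0, 10.0]
```

In this sketch the double reduction scales the output by the number of ranks. In a real model that error would compound through the layers, which is consistent with the garbage text shown above, though the actual code path in the library may differ in detail.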

liquanfeng avatar Feb 09 '25 12:02 liquanfeng

PTAL, thanks! @yingcanw @kaiyux

liquanfeng avatar Feb 09 '25 12:02 liquanfeng

@liquanfeng pls rebase this MR with the latest main branch. @Barry-Delaney pls help review this MR when it gets ready.

Thanks June

juney-nvidia avatar Mar 24 '25 05:03 juney-nvidia

Rebase done. PTAL, thanks! @Barry-Delaney

liquanfeng avatar Mar 24 '25 08:03 liquanfeng

/bot run

kaiyux avatar Mar 24 '25 08:03 kaiyux

PR_Github #274 [ run ] triggered by Bot

niukuo avatar Mar 24 '25 09:03 niukuo

PR_Github #274 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #264 completed with status: 'FAILURE'

niukuo avatar Mar 24 '25 10:03 niukuo

/bot run

kaiyux avatar Mar 24 '25 10:03 kaiyux

PR_Github #288 [ run ] triggered by Bot

niukuo avatar Mar 24 '25 10:03 niukuo

PR_Github #288 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #277 completed with status: 'FAILURE'

niukuo avatar Mar 24 '25 12:03 niukuo

/bot run

kaiyux avatar Mar 24 '25 14:03 kaiyux

PR_Github #315 [ run ] triggered by Bot

niukuo avatar Mar 24 '25 15:03 niukuo

PR_Github #315 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #297 completed with status: 'FAILURE'

niukuo avatar Mar 24 '25 15:03 niukuo

@liquanfeng Hi, the pipeline failed on the yapf style check. Could you please fix that?

[2025-03-24T15:06:21.035Z] yapf.....................................................................Failed
[2025-03-24T15:06:21.035Z] - hook id: yapf
[2025-03-24T15:06:21.035Z] - files were modified by this hook

You can do the following to fix it:

pip install pre-commit
pre-commit install
pre-commit run -a

Please feel free to let us know if you have any questions, thanks!

kaiyux avatar Mar 24 '25 15:03 kaiyux

Wow, it's a really long CI. Is there anything wrong? @kaiyux

liquanfeng avatar Mar 31 '25 03:03 liquanfeng

/bot run

kaiyux avatar Mar 31 '25 05:03 kaiyux

PR_Github #740 [ run ] triggered by Bot

tensorrt-cicd avatar Mar 31 '25 05:03 tensorrt-cicd

> Wow, it's a really long CI. Is there something wrong? @kaiyux

@liquanfeng Sorry, I'm not sure why I wasn't notified that the branch had been updated. I just re-launched the pipeline.

kaiyux avatar Mar 31 '25 05:03 kaiyux

PR_Github #740 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #608 completed with status: 'SUCCESS'

tensorrt-cicd avatar Mar 31 '25 07:03 tensorrt-cicd

/bot reuse-pipeline

kaiyux avatar Mar 31 '25 08:03 kaiyux

PR_Github #760 [ reuse-pipeline ] triggered by Bot

tensorrt-cicd avatar Mar 31 '25 08:03 tensorrt-cicd

PR_Github #760 [ reuse-pipeline ] completed with state SUCCESS Reusing PR_Github #740 for commit 82f4975

tensorrt-cicd avatar Mar 31 '25 08:03 tensorrt-cicd