fix: WeightOnlyQuantRowLinear
The WeightOnlyQuantRowLinear module was missing the is_expert parameter, which caused MoE models such as DeepSeek V2/V3 and Mixtral to perform unnecessary allreduce operations under INT8 weight-only quantization. As a result, running run.py on DeepSeek V2.5 quantized with INT8 weight-only produced incorrect output, as shown below:
Input [Text 0]: "<|begin▁of▁sentence|>Born in north-east France, Soyer trained as a"
Output [Text 0 Beam 0]: ",我们,的 ,3
,�
、 00精度100,在
00
"
This has been fixed, and the correct output is now generated as follows:
Input [Text 0]: "<|begin▁of▁sentence|>Born in north-east France, Soyer trained as a"
Output [Text 0 Beam 0]: " chef in Paris before moving to London in 1840. He became the first celebrity chef, writing several cookbooks and inventing the double"
Additionally, the summary results are now accurate:
' James Best, best known for his portrayal of Sheriff Rosco P. Coltrane on TV\'s "The Dukes of Hazzard," died at 88 after a brief illness. He was a busy actor for decades in theater and Hollywood, but didn\'t become famous until 1979 when "The Dukes of Hazzard" began airing. Best gave his character a childlike enthusiasm that made him endearing. The show ran until 1985'
INFO:TRT-LLM:[TRT-LLM] [I] rouge1 : 28.546239873545996
INFO:TRT-LLM:[TRT-LLM] [I] rouge2 : 9.14361660942447
INFO:TRT-LLM:[TRT-LLM] [I] rougeL : 20.7284256053234
INFO:TRT-LLM:[TRT-LLM] [I] rougeLsum : 23.15367596318552
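For context, here is a toy sketch of why an extra allreduce inside an expert's row-parallel linear layer corrupts the result. It is a pure-Python simulation of two tensor-parallel ranks, not TensorRT-LLM code: the helpers `allreduce`, `row_linear`, and `moe_expert_output` are hypothetical names, and the sketch assumes the MoE layer performs its own single reduction after the expert projection.

```python
# Toy simulation of tensor parallelism with 2 ranks; NOT TensorRT-LLM code.
# All function names here are hypothetical and exist only for illustration.

def allreduce(per_rank):
    """Sum the per-rank vectors and give every rank a copy of the total."""
    total = [sum(vals) for vals in zip(*per_rank)]
    return [list(total) for _ in per_rank]


def row_linear(x_shards, w_shards, is_expert):
    """Row-parallel matmul: each rank multiplies its input slice by its weight rows.

    When is_expert is True the partial sums are returned unreduced, because the
    caller (the MoE layer in this sketch) is assumed to reduce them later.
    """
    partial = []
    for x, w in zip(x_shards, w_shards):
        out_dim = len(w[0])
        partial.append(
            [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(out_dim)]
        )
    return partial if is_expert else allreduce(partial)


def moe_expert_output(x_shards, w_shards, is_expert):
    """Assumed MoE behaviour: one allreduce after the expert projection."""
    per_rank = row_linear(x_shards, w_shards, is_expert)
    return allreduce(per_rank)[0]


# Input x = [1, 2, 3, 4] split across two ranks; the weight is 4x2, split by rows.
X_SHARDS = [[1, 2], [3, 4]]
W_SHARDS = [[[1, 0], [0, 1]], [[1, 1], [2, 0]]]

correct = moe_expert_output(X_SHARDS, W_SHARDS, is_expert=True)
double_reduced = moe_expert_output(X_SHARDS, W_SHARDS, is_expert=False)
print(correct, double_reduced)
```

In this toy setup the second reduction sums values that were already reduced, so the output is scaled by the number of ranks. The actual placement of the reduction in the library may differ from this assumption; the sketch only illustrates the class of error that the is_expert flag prevents.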
PTAL, thanks! @yingcanw @kaiyux
@liquanfeng pls rebase this MR with the latest main branch. @Barry-Delaney pls help review this MR when it gets ready.
Thanks June
Rebase done. PTAL, thanks! @Barry-Delaney
/bot run
PR_Github #274 [ run ] triggered by Bot
PR_Github #274 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #264 completed with status: 'FAILURE'
/bot run
PR_Github #288 [ run ] triggered by Bot
PR_Github #288 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #277 completed with status: 'FAILURE'
/bot run
PR_Github #315 [ run ] triggered by Bot
PR_Github #315 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #297 completed with status: 'FAILURE'
@liquanfeng Hi, the pipeline failed on the yapf style check. Could you please fix that?
[2025-03-24T15:06:21.035Z] yapf.....................................................................Failed
[2025-03-24T15:06:21.035Z] - hook id: yapf
[2025-03-24T15:06:21.035Z] - files were modified by this hook
You can do the following to fix it:
pip install pre-commit
pre-commit install
pre-commit run -a
Please feel free to let us know if you have any questions, thanks!
Wow, it's a really long CI. Is there anything wrong? @kaiyux
/bot run
PR_Github #740 [ run ] triggered by Bot
Wow, it's a really long CI. Is there something wrong? @kaiyux
@liquanfeng Sorry, I'm not sure why I was not notified that the branch had been updated. I just re-launched the pipeline.
PR_Github #740 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #608 completed with status: 'SUCCESS'
/bot reuse-pipeline
PR_Github #760 [ reuse-pipeline ] triggered by Bot
PR_Github #760 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #740 for commit 82f4975