TensorRT-LLM

feat: FP8 Rowwise quantization support for Cohere models

aikitoria opened this pull request • 8 comments

This adds FP8 support for the LayerNorm kernel in the same way as was done for the RmsNorm kernel, which then allows us to use FP8 Rowwise quantization with the Cohere models.
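
For readers new to the term, the following is a minimal Python sketch of what "LayerNorm followed by FP8 rowwise quantization" computes. It assumes that "rowwise" means one dynamic scale per row (token), derived from that row's absolute maximum, and that the target format is FP8 E4M3 (largest finite value 448). The function names are hypothetical. This illustrates the math only; it is not the TensorRT-LLM kernel or plugin API.

```python
import math
from typing import List, Tuple

FP8_E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format


def layer_norm(row: List[float], gamma: List[float], beta: List[float],
               eps: float = 1e-5) -> List[float]:
    """Plain LayerNorm over one row: subtract the mean, divide by the std, then apply gamma/beta."""
    n = len(row)
    mean = sum(row) / n
    var = sum((x - mean) ** 2 for x in row) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std * g + b for x, g, b in zip(row, gamma, beta)]


def round_to_e4m3(v: float) -> float:
    """Simulate a cast to E4M3: round half-to-even to 3 mantissa bits, saturating at +/-448."""
    if v == 0.0:
        return 0.0
    a = min(abs(v), FP8_E4M3_MAX)
    if a < 2.0 ** -6:
        step = 2.0 ** -9  # subnormal spacing below the smallest normal (2^-6)
    else:
        step = 2.0 ** (math.floor(math.log2(a)) - 3)  # 3 mantissa bits per binade
    q = min(round(a / step) * step, FP8_E4M3_MAX)
    return math.copysign(q, v)


def layer_norm_fp8_rowwise(x: List[List[float]], gamma: List[float], beta: List[float],
                           eps: float = 1e-5) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical helper: LayerNorm each row, then quantize it with its own per-row scale."""
    quantized, scales = [], []
    for row in x:
        y = layer_norm(row, gamma, beta, eps)
        amax = max(abs(v) for v in y)
        # Map the row's largest magnitude onto the top of the E4M3 range.
        scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
        quantized.append([round_to_e4m3(v / scale) for v in y])
        scales.append(scale)
    return quantized, scales
```

To dequantize, multiply each quantized row by its own scale. Because every row carries a separate scale, an outlier in one token does not reduce the precision available to the other tokens.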

For previous discussion, see https://github.com/NVIDIA/TensorRT-LLM/issues/2912

aikitoria • Mar 27 '25

/bot run

juney-nvidia • Mar 28 '25

@QiJune @ming-wei please help review this MR.

juney-nvidia • Mar 28 '25

/bot run

juney-nvidia • Mar 28 '25

PR_Github #672 [ run ] triggered by Bot

tensorrt-cicd • Mar 28 '25

PR_Github #672 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #565 completed with status: 'FAILURE'

tensorrt-cicd • Mar 28 '25

It looks like the CI failed, but the links go to internal domains, so I can't see what the error is. I have some ideas about what it might be: I probably need to update the other usages of the LayerNorm quantization plugin to handle the new parameters.

aikitoria • Mar 28 '25


@aikitoria your code failed to pass the pre-commit check.

Currently, pre-commit check failures are not copied back to the public PR, so they are not visible to contributors. We are working to improve this with this MR:

For now, I have manually copied the error message to give you quick feedback: [screenshot of the pre-commit error]

You can also refer here to run the pre-commit check locally in your own development environment.

June

juney-nvidia • Mar 29 '25

Oh, I see. I will fix the formatting for both PRs.

aikitoria • Mar 29 '25

Thank you for the contribution!

I've left a few comments, but the PR looks overall good.

@juney-nvidia It would be good if we could find someone familiar with quantization support. I don't have hands-on quantization experience myself, so I might miss something.

Sure, I just added @Tracin into the code reviewer loop.

Thanks June

juney-nvidia • Mar 31 '25

Hi @aikitoria, would you mind adding a functional unit test similar to tests/unittest/trt/quantization/test_smooth_quant_layer_norm.py? It would also be good to add an example usage to examples/commandr/README.md. Thanks.

wm2012011492 • Mar 31 '25

@aikitoria any update on this?

ming-wei • Apr 08 '25

Sorry, I have been busy at work, I will come back to this this week!

Edit: Still haven't been able to get to it

aikitoria • Apr 08 '25

Closing since there have been no updates from the requester for more than 10 days. Feel free to reopen when you have some bandwidth to work on it!

poweiw • Jun 05 '25