feat: FP8 Rowwise quantization support for Cohere models
This adds FP8 support for the LayerNorm kernel in the same way as was done for the RmsNorm kernel, which then allows us to use FP8 Rowwise quantization with the Cohere models.
For previous discussion, see https://github.com/NVIDIA/TensorRT-LLM/issues/2912
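For context, here is a minimal PyTorch sketch (not the actual CUDA kernel in this PR) of what the fused LayerNorm + FP8 rowwise quantization computes, assuming the same per-token scaling scheme the RmsNorm path uses; all names below are illustrative:

```python
# Minimal PyTorch sketch of fused LayerNorm + FP8 rowwise quantization.
# Illustrative only; the PR implements this as a CUDA kernel.
# Requires PyTorch >= 2.1 for torch.float8_e4m3fn.
import torch

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in E4M3

def layernorm_fp8_rowwise(x, weight, bias, eps=1e-5):
    # Plain LayerNorm in the input precision.
    y = torch.nn.functional.layer_norm(x, (x.shape[-1],), weight, bias, eps)
    # Rowwise (per-token) scale so each row fills the FP8 dynamic range.
    amax = y.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12)
    scale = amax / FP8_E4M3_MAX
    q = (y / scale).to(torch.float8_e4m3fn)
    return q, scale  # downstream FP8 GEMMs dequantize with `scale`

x = torch.randn(4, 128)
q, scale = layernorm_fp8_rowwise(x, torch.ones(128), torch.zeros(128))
```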
/bot run
@QiJune @ming-wei please help review this MR.
/bot run
PR_Github #672 [ run ] triggered by Bot
PR_Github #672 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #565 completed with status: 'FAILURE'
It looks like the CI failed, but the links go to some internal domains, so I can't see what the error is. I have some ideas about what it might be... I probably need to update the other usages of the LayerNorm quantization plugin to handle the new parameters.
@aikitoria your code failed to pass the pre-commit check.
Currently, pre-commit check failures are not copied back to the public repo where they can be viewed; we are working to improve this with this MR:
For now I have manually copied the error message here to provide quick feedback:
You can also refer here for how to run the pre-commit check locally in your own dev environment.
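In case it helps, the standard pre-commit workflow looks like this (assuming the repo ships a .pre-commit-config.yaml; the exact hooks TensorRT-LLM configures may differ):

```bash
pip install pre-commit
pre-commit install          # run the hooks automatically on every commit
pre-commit run --all-files  # or lint the whole tree once
```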
June
Oh I see, I will fix the formatting for both PRs
Thank you for the contribution!
I've left a few comments, but the PR looks overall good.
@juney-nvidia It'd be good if we could find someone familiar with the quantization support. I personally don't have hands-on quantization experience, so I might miss something.
Sure, I just added @Tracin into the code reviewer loop.
Thanks June
Hi @aikitoria, would you mind adding a functional unit test like tests/unittest/trt/quantization/test_smooth_quant_layer_norm.py? It would also be good to add an example usage in examples/commandr/README.md. Thanks.
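For what it's worth, here is a rough sketch of the numerical check such a test could perform. The real test would build a TensorRT network that invokes the LayerNorm quantization plugin (as the SmoothQuant test does), so the reference function below is only an illustrative stand-in, not the plugin API:

```python
# Hedged sketch of a functional test for LayerNorm + FP8 rowwise quantization.
# The real unit test would run the TensorRT-LLM plugin and compare against a
# reference; here a pure-PyTorch stand-in plays both roles for illustration.
# Requires PyTorch >= 2.1 for torch.float8_e4m3fn.
import unittest
import torch

FP8_E4M3_MAX = 448.0

def _fp8_rowwise(y):
    # Per-token scale, then cast to FP8 E4M3.
    scale = y.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12) / FP8_E4M3_MAX
    return (y / scale).to(torch.float8_e4m3fn), scale

class TestLayerNormFp8Rowwise(unittest.TestCase):
    def test_dequantized_output_matches_fp32_layernorm(self):
        torch.manual_seed(0)
        x = torch.randn(8, 256)
        ref = torch.nn.functional.layer_norm(x, (256,))
        q, scale = _fp8_rowwise(ref)  # stand-in for the fused kernel output
        dequant = q.to(torch.float32) * scale
        # E4M3 keeps only ~3 mantissa bits, so the tolerance is coarse.
        torch.testing.assert_close(dequant, ref, atol=0.1, rtol=0.1)

if __name__ == "__main__":
    unittest.main()
```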
@aikitoria any update on this?
Sorry, I have been busy at work, but I will get back to this week!
Edit: I still haven't been able to get to it.
Closing as there have been no updates from the requester for 10+ days. Feel free to reopen when you have some bandwidth to work on it!