TensorRT-LLM
Feat: Support Linear block scale layout in FP4 quantization
- Support the Linear (row-major) block scale factor layout in the FP4 quantize kernel. This layout is used by the trtllm-gen MoE FP4 kernel (see the illustrative sketch below).
- New unit tests added to cover the linear-layout FP4 quantize kernel. Note that the FP4 linear-layout GEMM kernel is not supported yet; FP4 GEMM tests should be added once the kernel is ready.
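For intuition, here is a minimal NumPy sketch of what a linear (row-major) block scale-factor layout looks like. The block size of 16, the amax-based scale computation, and all names below are illustrative assumptions, not the actual TensorRT-LLM kernel code.

```python
import numpy as np

# Illustrative sketch only -- not the TensorRT-LLM FP4 quantize kernel.
# Assumption: one scale factor per 16-element block along the K dimension.
BLOCK_SIZE = 16

def compute_linear_block_scales(x: np.ndarray) -> np.ndarray:
    """Return per-block scale factors in the linear (row-major) layout.

    For an input of shape [M, K], the result is a plain row-major matrix of
    shape [M, K // BLOCK_SIZE]; element (m, kb) is the scale for row m,
    K-block kb. No swizzling or tiling is applied.
    """
    m, k = x.shape
    assert k % BLOCK_SIZE == 0
    blocks = x.reshape(m, k // BLOCK_SIZE, BLOCK_SIZE)
    # amax-based per-block scale, purely for illustration.
    scales = np.abs(blocks).max(axis=-1)
    return np.ascontiguousarray(scales)  # row major == "linear" layout

if __name__ == "__main__":
    x = np.random.randn(8, 64).astype(np.float32)
    scales = compute_linear_block_scales(x)
    print(scales.shape)  # (8, 4): one scale per 16-wide block in each row
```

A swizzled/interleaved layout would rearrange the same [M, K // BLOCK_SIZE] scale values into hardware-friendly tiles; the linear layout keeps them in plain row-major order, which is what the trtllm-gen MoE FP4 kernel consumes.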
Need to update internal_cutlass_kernel libs.
> Need to update internal_cutlass_kernel libs.

@yibinl-nvidia is there an MR for updating internal_cutlass_kernels?
Yes, I will post an MR soon. I am still familiarizing myself with the internal kernel change workflow, and need to check that the TRT-LLM tests pass with the updated lib files.
/bot run
@mikeiovine could you re-approve this PR? This is a mirror of the internal MR, with the minor changes on the internal_cutlass_kernel lib files. Thanks!
/bot kill
PR_Github #615 [ kill ] triggered by Bot
PR_Github #615 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit aa306bf
/bot run
PR_Github #618 [ run ] triggered by Bot
PR_Github #618 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #520 completed with status: 'FAILURE'
/bot run
Sorry for the delay! I missed this in the move to GitHub. Looks good to me, assuming there are only trivial changes compared to what I reviewed internally.
Yes, this is a mirror of the change in the internal repo. The only difference is in the internal cutlass kernel directory, where the changes are bundled into lib files.
/bot run
PR_Github #806 [ run ] triggered by Bot
PR_Github #806 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #652 completed with status: 'FAILURE'
/bot run
PR_Github #923 [ run ] triggered by Bot
PR_Github #923 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #729 completed with status: 'FAILURE'
/bot run
PR_Github #935 [ run ] triggered by Bot
PR_Github #935 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #737 completed with status: 'SUCCESS'
Need to wait for https://github.com/NVIDIA/TensorRT-LLM/pull/3071 to merge first.
@yibinl-nvidia #3071 has been merged; please resolve the conflicts in this PR.
/bot run
PR_Github #971 [ run ] triggered by Bot
PR_Github #971 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #756 completed with status: 'FAILURE'
/bot run
PR_Github #1034 [ run ] triggered by Bot