TensorRT-LLM

Feat: Support Linear block scale layout in FP4 quantization

Open yibinl-nvidia opened this issue 8 months ago • 11 comments

  • Support the linear (row-major) block scale factor layout in the FP4 quantize kernel. This layout is used by the trtllm-gen MoE FP4 kernel.
  • Add new unit tests for the linear-layout FP4 quantize kernel. Note that an FP4 GEMM kernel for the linear layout is not supported yet; we should add the FP4 GEMM once that kernel is ready.
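To make the linear (row-major) scale layout concrete, here is a minimal Python sketch. It is not the kernel from this PR. It assumes NVFP4-style quantization with 16-element blocks, a per-block scale of amax / 6 (6 being the largest FP4 E2M1 magnitude), and scales kept as Python floats rather than FP8. The names BLOCK_SIZE, quantize_fp4_linear, dequantize_fp4_linear, linear_sf_offset and swizzled_sf_offset_128x4 are hypothetical. With K columns, the scale for element (m, k) sits at m * (K / 16) + k / 16. The 128x4 tiled offset appears only for contrast and is an assumed CUTLASS-style layout, so check the actual kernel source before relying on it.

```python
# Hypothetical sketch of block-wise FP4 (E2M1) quantization that writes scale
# factors in a linear (row-major) layout. Assumptions, not taken from the PR:
# 16-element blocks, scale = amax / 6, and scales kept as Python floats
# (a real kernel would store them as FP8 and pack two FP4 values per byte).
import math

BLOCK_SIZE = 16  # assumed number of elements sharing one scale factor
E2M1_MAX = 6.0  # largest magnitude representable in FP4 E2M1
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # E2M1 magnitudes


def round_to_e2m1(x):
    """Round to the nearest E2M1 magnitude (ties go to the smaller one), keeping the sign."""
    mag = min(abs(x), E2M1_MAX)
    nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
    return math.copysign(nearest, x)


def linear_sf_offset(m, k_block, sf_cols):
    """Linear layout: the scale for row m, block k_block sits at m * sf_cols + k_block."""
    return m * sf_cols + k_block


def swizzled_sf_offset_128x4(m, k_block, sf_cols):
    """For contrast only: an assumed 128x4-tiled layout (CUTLASS-style).

    Each tile holds 128 rows x 4 scale columns = 512 entries.
    """
    num_k_tiles = (sf_cols + 3) // 4
    tile = (m // 128) * num_k_tiles + k_block // 4
    inner = (m % 32) * 16 + ((m % 128) // 32) * 4 + k_block % 4
    return tile * 512 + inner


def quantize_fp4_linear(matrix):
    """Quantize an M x K matrix (list of rows); return (fp4_values, scales).

    scales is a flat buffer of M * (K // BLOCK_SIZE) entries in linear layout.
    """
    num_rows, num_cols = len(matrix), len(matrix[0])
    if num_cols % BLOCK_SIZE != 0:
        raise ValueError("K must be a multiple of BLOCK_SIZE in this sketch")
    sf_cols = num_cols // BLOCK_SIZE
    scales = [0.0] * (num_rows * sf_cols)
    values = [[0.0] * num_cols for _ in range(num_rows)]
    for m in range(num_rows):
        for b in range(sf_cols):
            start = b * BLOCK_SIZE
            block = matrix[m][start:start + BLOCK_SIZE]
            amax = max(abs(v) for v in block)
            scale = amax / E2M1_MAX if amax > 0 else 1.0
            scales[linear_sf_offset(m, b, sf_cols)] = scale
            for i, v in enumerate(block):
                values[m][start + i] = round_to_e2m1(v / scale)
    return values, scales


def dequantize_fp4_linear(values, scales):
    """Multiply each FP4 value by its block's scale, read from the linear layout."""
    sf_cols = len(values[0]) // BLOCK_SIZE
    return [[v * scales[linear_sf_offset(m, k // BLOCK_SIZE, sf_cols)]
             for k, v in enumerate(row)]
            for m, row in enumerate(values)]


if __name__ == "__main__":
    mat = [[(i - 16) / 4 for i in range(32)], [1.0] * 32]
    q, sf = quantize_fp4_linear(mat)
    print(len(sf))  # 4 scale factors: 2 rows x 2 blocks
    print(dequantize_fp4_linear(q, sf)[1][:4])
```

In the linear layout a consumer finds a block's scale with a single multiply-add. The tiled layout instead interleaves rows in groups of 32, presumably to match a particular GEMM's access pattern. That reading is an interpretation, not a statement about the trtllm-gen MoE kernel.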

yibinl-nvidia (Mar 24 '25 22:03)

Need to update the internal_cutlass_kernel libs.

yibinl-nvidia (Mar 25 '25 04:03)

@yibinl-nvidia, is there an MR for updating internal_cutlass_kernels?

nv-guomingz (Mar 25 '25 09:03)

Yes, I will post an MR soon. I am still familiarizing myself with the internal kernel change workflow and need to check that the trtllm tests pass with the updated lib files.

yibinl-nvidia (Mar 25 '25 16:03)

/bot run

yibinl-nvidia (Mar 26 '25 21:03)

@mikeiovine, could you re-approve this PR? It mirrors the internal MR, with minor changes to the internal_cutlass_kernel lib files. Thanks!

yibinl-nvidia (Mar 26 '25 21:03)

/bot kill

yibinl-nvidia (Mar 26 '25 21:03)

PR_Github #615 [ kill ] triggered by Bot

tensorrt-cicd (Mar 26 '25 21:03)

PR_Github #615 [ kill ] completed with state SUCCESS Successfully killed previous jobs for commit aa306bf

tensorrt-cicd (Mar 26 '25 21:03)

/bot run

yibinl-nvidia (Mar 26 '25 22:03)

PR_Github #618 [ run ] triggered by Bot

tensorrt-cicd (Mar 26 '25 22:03)

PR_Github #618 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #520 completed with status: 'FAILURE'

tensorrt-cicd (Mar 26 '25 23:03)

/bot run

yibinl-nvidia (Mar 31 '25 06:03)

Sorry for the delay! I missed this in the move to GitHub. Looks good to me, assuming there are only trivial changes compared to what I reviewed internally.

mikeiovine (Mar 31 '25 13:03)

Yes, this is a mirror of the change to the internal repo. The only difference is in the internal cutlass kernel directory, where the changes are bundled into lib files.

yibinl-nvidia (Mar 31 '25 17:03)

/bot run

yibinl-nvidia (Mar 31 '25 23:03)

PR_Github #806 [ run ] triggered by Bot

tensorrt-cicd (Mar 31 '25 23:03)

PR_Github #806 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #652 completed with status: 'FAILURE'

tensorrt-cicd (Apr 01 '25 00:04)

/bot run

yibinl-nvidia (Apr 01 '25 22:04)

PR_Github #923 [ run ] triggered by Bot

tensorrt-cicd (Apr 01 '25 22:04)

PR_Github #923 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #729 completed with status: 'FAILURE'

tensorrt-cicd (Apr 01 '25 22:04)

/bot run

yibinl-nvidia (Apr 01 '25 23:04)

PR_Github #935 [ run ] triggered by Bot

tensorrt-cicd (Apr 01 '25 23:04)

PR_Github #935 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #737 completed with status: 'SUCCESS'

tensorrt-cicd (Apr 02 '25 02:04)

Need to wait for https://github.com/NVIDIA/TensorRT-LLM/pull/3071 to merge first.

yibinl-nvidia (Apr 02 '25 02:04)

@yibinl-nvidia, #3071 has been merged; please resolve the conflicts in this PR.

nv-guomingz (Apr 02 '25 04:04)

/bot run

yibinl-nvidia (Apr 02 '25 05:04)

PR_Github #971 [ run ] triggered by Bot

tensorrt-cicd (Apr 02 '25 05:04)

PR_Github #971 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #756 completed with status: 'FAILURE'

tensorrt-cicd (Apr 02 '25 14:04)

/bot run

yibinl-nvidia (Apr 02 '25 16:04)

PR_Github #1034 [ run ] triggered by Bot

tensorrt-cicd (Apr 02 '25 16:04)