TensorRT-LLM

[perf] Reduce the workspace size of FP4 activation scales for MoE

Open • jinyangyuan-nvidia opened this pull request 7 months ago • 38 comments

This PR merges the first two dimensions of the FP4 activation-scale tensor into one, eliminating storage that the original layout reserved but never used. When the two dimensions are merged, padding is inserted so that the merged layout still satisfies the alignment requirements of TMA (the Tensor Memory Accelerator).
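To make the saving concrete, here is a minimal Python sketch of the workspace arithmetic. The PR text does not spell out either layout, so everything below is an assumption for illustration: one scale per 16 FP4 elements, scale rows padded to multiples of 128 and scale columns to multiples of 4, and a hypothetical "before" layout in which every expert reserves rows for all expanded tokens. The helper names (`round_up`, `scale_cols`, `per_expert_workspace`, `merged_workspace`, `merged_offsets`) are hypothetical and are not TensorRT-LLM APIs.

```python
# Hypothetical constants: assumptions for illustration, not values taken from TensorRT-LLM.
SF_VEC_SIZE = 16  # assume one scale factor per 16 FP4 elements
ROW_ALIGN = 128   # assume each expert's block of scale rows is padded to 128 rows
COL_ALIGN = 4     # assume scale columns are padded to a multiple of 4


def round_up(x: int, m: int) -> int:
    """Round x up to the nearest multiple of m."""
    return (x + m - 1) // m * m


def scale_cols(hidden_size: int) -> int:
    """Number of (padded) scale columns per token row."""
    return round_up(hidden_size // SF_VEC_SIZE, COL_ALIGN)


def per_expert_workspace(num_expanded_tokens: int, num_experts: int, hidden_size: int) -> int:
    """Hypothetical 'before' layout: [num_experts, padded_tokens, padded_cols].

    Every expert reserves rows for all expanded tokens, even though the
    tokens are actually split across experts.
    """
    rows = round_up(num_expanded_tokens, ROW_ALIGN)
    return num_experts * rows * scale_cols(hidden_size)


def merged_workspace(num_expanded_tokens: int, num_experts: int, hidden_size: int) -> int:
    """Hypothetical 'after' layout: experts share one merged row dimension.

    Padding each expert's segment up to ROW_ALIGN adds at most ROW_ALIGN - 1
    rows per expert, so this bound holds for any routing of the tokens.
    """
    rows = num_expanded_tokens + num_experts * (ROW_ALIGN - 1)
    return round_up(rows, ROW_ALIGN) * scale_cols(hidden_size)


def merged_offsets(tokens_per_expert: list[int]) -> tuple[list[int], int]:
    """Start row of each expert's padded segment, plus total rows used."""
    offsets, start = [], 0
    for n in tokens_per_expert:
        offsets.append(start)
        start += round_up(n, ROW_ALIGN)
    return offsets, start


if __name__ == "__main__":
    # Example sizes chosen arbitrarily for illustration.
    before = per_expert_workspace(8192, 64, 7168)
    after = merged_workspace(8192, 64, 7168)
    print(before, after, before // after)
    print(merged_offsets([3, 0, 130]))
```

Under these assumptions, 8192 expanded tokens routed over 64 experts with a hidden size of 7168 need 234,881,024 scale entries in the hypothetical per-expert layout but only 7,340,032 in the merged one. The per-expert padding keeps each expert's segment starting on a 128-row boundary (for example, `merged_offsets([3, 0, 130])` gives offsets `[0, 128, 128]` and 384 rows in total), which is the kind of alignment the padding step exists to preserve.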

jinyangyuan-nvidia • May 14 '25 15:05

/bot run

jinyangyuan-nvidia • May 14 '25 15:05

/bot run

jinyangyuan-nvidia • May 14 '25 15:05

PR_Github #5194 [ run ] triggered by Bot

tensorrt-cicd • May 14 '25 15:05

PR_Github #5194 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #3791 completed with status: 'FAILURE'

tensorrt-cicd • May 14 '25 15:05

/bot run

jinyangyuan-nvidia • May 15 '25 02:05

PR_Github #5239 [ run ] triggered by Bot

tensorrt-cicd • May 15 '25 02:05

PR_Github #5239 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #3827 completed with status: 'FAILURE'

tensorrt-cicd • May 15 '25 05:05

/bot run

jinyangyuan-nvidia • May 15 '25 05:05

/bot run

jinyangyuan-nvidia • May 15 '25 05:05

PR_Github #5271 [ run ] triggered by Bot

tensorrt-cicd • May 15 '25 05:05

PR_Github #5271 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #3851 completed with status: 'FAILURE'

tensorrt-cicd • May 15 '25 07:05

/bot run

jinyangyuan-nvidia • May 15 '25 07:05

PR_Github #5303 [ run ] triggered by Bot

tensorrt-cicd • May 15 '25 08:05

PR_Github #5303 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #3872 completed with status: 'FAILURE'

tensorrt-cicd • May 15 '25 12:05

/bot run

jinyangyuan-nvidia • May 15 '25 16:05

PR_Github #5377 [ run ] triggered by Bot

tensorrt-cicd • May 15 '25 16:05

PR_Github #5377 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #3924 completed with status: 'FAILURE'

tensorrt-cicd • May 15 '25 17:05

/bot run

jinyangyuan-nvidia • May 16 '25 03:05

PR_Github #5445 [ run ] triggered by Bot

tensorrt-cicd • May 16 '25 03:05

PR_Github #5445 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #3975 completed with status: 'FAILURE'

tensorrt-cicd • May 16 '25 06:05

/bot run

jinyangyuan-nvidia • May 16 '25 14:05

PR_Github #5518 [ run ] triggered by Bot

tensorrt-cicd • May 16 '25 14:05

PR_Github #5518 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #4021 completed with status: 'FAILURE'

tensorrt-cicd • May 16 '25 15:05

/bot run

jinyangyuan-nvidia • May 16 '25 16:05

PR_Github #5527 [ run ] triggered by Bot

tensorrt-cicd • May 16 '25 16:05

PR_Github #5527 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #4028 completed with status: 'FAILURE'

tensorrt-cicd • May 16 '25 18:05

/bot run

jinyangyuan-nvidia • May 17 '25 03:05

/bot kill

jinyangyuan-nvidia • May 17 '25 03:05

PR_Github #5554 [ run ] triggered by Bot

tensorrt-cicd • May 17 '25 03:05

/bot run

jinyangyuan-nvidia • May 17 '25 03:05