feat: MLA FP8 KV Cache on Blackwell

Open DylanChen-NV opened this issue 9 months ago • 9 comments

Add support for FP8 KV cache on Blackwell.

DylanChen-NV avatar Mar 24 '25 03:03 DylanChen-NV

/bot run

DylanChen-NV avatar Mar 24 '25 03:03 DylanChen-NV

PR_Github #231 [ run ] triggered by Bot

niukuo avatar Mar 24 '25 04:03 niukuo

PR_Github #231 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #233 completed with status: 'FAILURE'

niukuo avatar Mar 24 '25 08:03 niukuo

/bot run --stage-list H100_PCIe-5,B200_PCIe-2

DylanChen-NV avatar Mar 24 '25 10:03 DylanChen-NV

PR_Github #286 [ run ] triggered by Bot

niukuo avatar Mar 24 '25 10:03 niukuo

PR_Github #286 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #275 (Partly Tested) completed with status: 'FAILURE'

niukuo avatar Mar 24 '25 11:03 niukuo

/bot run --stage-list H100_PCIe-5,B200_PCIe-2

DylanChen-NV avatar Mar 24 '25 11:03 DylanChen-NV

PR_Github #294 [ run ] triggered by Bot

niukuo avatar Mar 24 '25 11:03 niukuo

PR_Github #294 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #283 (Partly Tested) completed with status: 'FAILURE'

niukuo avatar Mar 24 '25 13:03 niukuo

This PR can be closed, as the change has already been merged in https://github.com/NVIDIA/TensorRT-LLM/pull/3190

DylanChen-NV avatar Apr 11 '25 02:04 DylanChen-NV