feat: Add pp support for hybrid attn/mamba model
/bot run --disable-fail-fast --add-multi-gpu-test
PR_Github #5461 [ run ] triggered by Bot
PR_Github #5461 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3984 completed with status: 'SUCCESS'
Thanks @yuxianq, this looks great.
It would be good, however, to check whether quality changes with this feature. We have another PR, #4147, that adds this test.
@suyoggupta
could you please add a brief PR description?
Added Copilot to also review this PR.
@vegaluisjose I have cherry-picked https://github.com/NVIDIA/TensorRT-LLM/pull/4147 into this PR, and the new test passes locally.
@suyoggupta Added.
/bot reuse-pipeline
PR_Github #5681 [ reuse-pipeline ] triggered by Bot
PR_Github #5681 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #5461 for commit ba33051