feat: Variable-Beam-Width-Search (VBWS) part1
Background
In current TRT-LLM, we regard the `beam_width` of the runtime as a scalar, which means:
- The same `beam_width` must be used for a request along all generation steps (time axis).
- The same `beam_width` must be used for requests batched together (space axis).
Final target
- Loosen the constraints above as follows:
  - Each request owns a `beam_width_array` for beam search. For example, `--beam_width_array=[20,40,60]` means using `beam_width=20` for the first step, 40 for the second step, and 60 for all following steps (we call it Variable-Beam-Width-Search, VBWS).
  - Requests with different beam widths can be batched together for generation (we call it Diverse-Beam-Width-Search, DBWS).
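To make the intended semantics concrete, here is a minimal Python sketch of how a per-step beam width could be looked up from a `beam_width_array`. The helper name `beam_width_at_step` is hypothetical and is not part of TRT-LLM; the sketch only assumes what the description above states, namely that the last entry is reused for all remaining steps.

```python
from typing import Sequence


def beam_width_at_step(beam_width_array: Sequence[int], step: int) -> int:
    """Return the beam width for a 0-based generation step.

    Hypothetical helper for illustration only: once the array is
    exhausted, its last entry is reused for every following step.
    """
    if not beam_width_array:
        raise ValueError("beam_width_array must not be empty")
    if step < 0:
        raise ValueError("step must be non-negative")
    # Clamp the index so steps past the end map to the last entry.
    return beam_width_array[min(step, len(beam_width_array) - 1)]


# VBWS: one request whose beam width changes along the time axis.
schedule = [20, 40, 60]
print([beam_width_at_step(schedule, s) for s in range(5)])
# [20, 40, 60, 60, 60]

# DBWS: requests with different schedules sharing one batch (space axis).
batch = [[20, 40, 60], [4], [8, 16]]
print([beam_width_at_step(arr, 1) for arr in batch])
# [40, 4, 16]
```

A single-element array reproduces the current scalar behavior, so a fixed `beam_width` can be seen as a special case of this schedule.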
Target of this PR
We plan to implement the final target in 4 PRs, and this PR is the first part, where we achieve:
- Add member `beamWidthArray` and related methods for class `SamplingConfig`.
- Rewrite C++/Python unit tests for class `SamplingConfig`.
  - CPP unit test: `cpp/tests/executor/SamplingConfigTest.cpp`, `cpp/tests/runtime/SamplingConfigTest.cpp`.
  - Python unit test: `tests/unittest/api_stability/test_llm_api.py`, `tests/bindings/test_bindings_ut.py`.
/bot run
PR_Github #481 [ run ] triggered by Bot
PR_Github #481 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #414 completed with status: 'FAILURE'
/bot run
PR_Github #487 [ run ] triggered by Bot
PR_Github #487 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #419 completed with status: 'FAILURE'
/bot run
PR_Github #519 [ run ] triggered by Bot
PR_Github #519 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #444 completed with status: 'SUCCESS'
/bot reuse-pipeline
PR_Github #551 [ reuse-pipeline ] triggered by Bot
PR_Github #551 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #519 for commit 21d41d3
/bot reuse-pipeline "the last commit only update the author message"
PR_Github #563 Bot args parsing error!
/bot reuse-pipeline --comment "the last commit only update the author message"
/bot run
PR_Github #572 [ run ] triggered by Bot
PR_Github #572 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #486 completed with status: 'FAILURE'
/bot run
PR_Github #588 [ run ] triggered by Bot
PR_Github #588 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #500 completed with status: 'SUCCESS'
/bot reuse-pipeline
PR_Github #600 [ reuse-pipeline ] triggered by Bot
PR_Github #600 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #588 for commit 88b9bd8