feat: Variable-Beam-Width-Search (VBWS) part1

Open wili-65535 opened this issue 9 months ago • 5 comments

Background

TRT-LLM currently treats the runtime beam_width as a scalar, which means:

  1. The same beam_width must be used for a request along all generation steps (time axis).
  2. The same beam_width must be used for requests batched together (space axis).

Final target

  • Loosen the constraints above as follows:
  1. Each request owns a beam_width_array for beam search. For example, --beam_width_array=[20,40,60] means using beam_width=20 for the first step, 40 for the second step, and 60 for all following steps (we call this Variable-Beam-Width-Search, VBWS).
  2. Requests with different beam widths can be batched together for generation (we call this Diverse-Beam-Width-Search, DBWS).
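
The intended semantics of beam_width_array can be sketched in a few lines of Python. This is only an illustration of the description above, not the TRT-LLM implementation: the helper name beam_width_at_step and the request names are hypothetical, and steps are assumed to be 0-based.

```python
from typing import Sequence


def beam_width_at_step(beam_width_array: Sequence[int], step: int) -> int:
    """Return the beam width for a 0-based generation step.

    Hypothetical helper: steps past the end of the array reuse its last entry,
    matching "60 for all following steps" in the example above.
    """
    if not beam_width_array:
        raise ValueError("beam_width_array must not be empty")
    if step < 0:
        raise ValueError("step must be non-negative")
    return beam_width_array[min(step, len(beam_width_array) - 1)]


# VBWS: one request, beam width varies along the time axis.
schedule = [20, 40, 60]
widths = [beam_width_at_step(schedule, s) for s in range(5)]
print(widths)  # [20, 40, 60, 60, 60]

# DBWS: requests with different schedules share one batch (space axis).
# The request names are made up for illustration.
batch = {"req_a": [20, 40, 60], "req_b": [4]}
per_step = {
    name: [beam_width_at_step(arr, s) for s in range(3)]
    for name, arr in batch.items()
}
print(per_step)  # {'req_a': [20, 40, 60], 'req_b': [4, 4, 4]}
```

A scalar beam_width is the special case of a one-element array, so the current behavior would be preserved under this reading.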

Target of this PR

We plan to implement the final target in four PRs. This PR is the first part and does the following:

  1. Add member beamWidthArray and related methods for class SamplingConfig.
  2. Rewrite C++/Python unit tests for class SamplingConfig.
    • CPP unit test: cpp/tests/executor/SamplingConfigTest.cpp, cpp/tests/runtime/SamplingConfigTest.cpp.
    • Python unit test: tests/unittest/api_stability/test_llm_api.py, tests/bindings/test_bindings_ut.py.

wili-65535 avatar Mar 26 '25 01:03 wili-65535

/bot run

wili-65535 avatar Mar 26 '25 01:03 wili-65535

PR_Github #481 [ run ] triggered by Bot

niukuo avatar Mar 26 '25 01:03 niukuo

PR_Github #481 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #414 completed with status: 'FAILURE'

niukuo avatar Mar 26 '25 01:03 niukuo

/bot run

wili-65535 avatar Mar 26 '25 01:03 wili-65535

PR_Github #487 [ run ] triggered by Bot

niukuo avatar Mar 26 '25 01:03 niukuo

PR_Github #487 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #419 completed with status: 'FAILURE'

niukuo avatar Mar 26 '25 03:03 niukuo

/bot run

wili-65535 avatar Mar 26 '25 05:03 wili-65535

PR_Github #519 [ run ] triggered by Bot

niukuo avatar Mar 26 '25 05:03 niukuo

PR_Github #519 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #444 completed with status: 'SUCCESS'

niukuo avatar Mar 26 '25 08:03 niukuo

/bot reuse-pipeline

byshiue avatar Mar 26 '25 08:03 byshiue

PR_Github #551 [ reuse-pipeline ] triggered by Bot

niukuo avatar Mar 26 '25 08:03 niukuo

PR_Github #551 [ reuse-pipeline ] completed with state SUCCESS Reusing PR_Github #519 for commit 21d41d3

niukuo avatar Mar 26 '25 09:03 niukuo

/bot reuse-pipeline "the last commit only update the author message"

byshiue avatar Mar 26 '25 09:03 byshiue

PR_Github #563 Bot args parsing error!

niukuo avatar Mar 26 '25 09:03 niukuo

/bot reuse-pipeline --comment "the last commit only update the author message"

byshiue avatar Mar 26 '25 09:03 byshiue

/bot run

wili-65535 avatar Mar 26 '25 10:03 wili-65535

PR_Github #572 [ run ] triggered by Bot

niukuo avatar Mar 26 '25 10:03 niukuo

PR_Github #572 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #486 completed with status: 'FAILURE'

niukuo avatar Mar 26 '25 12:03 niukuo

/bot run

wili-65535 avatar Mar 26 '25 12:03 wili-65535

PR_Github #588 [ run ] triggered by Bot

niukuo avatar Mar 26 '25 12:03 niukuo

PR_Github #588 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #500 completed with status: 'SUCCESS'

niukuo avatar Mar 26 '25 15:03 niukuo

/bot reuse-pipeline

Funatiq avatar Mar 26 '25 15:03 Funatiq

PR_Github #600 [ reuse-pipeline ] triggered by Bot

niukuo avatar Mar 26 '25 15:03 niukuo

PR_Github #600 [ reuse-pipeline ] completed with state SUCCESS Reusing PR_Github #588 for commit 88b9bd8

niukuo avatar Mar 26 '25 15:03 niukuo