[DRAFT] Introducing multi-vocab token sampling for audio generation
Multi-token support
Introduce multi-token sampling with autoregressive transformers to support audio generation. This is a draft PR to trigger the code-quality pipelines. Once the issues are fixed, the change is meant to go into https://github.com/rmittal-github/TensorRT-LLM/tree/release/0.19
The change was originally based on https://gitlab-master.nvidia.com/ftp/tekit/-/merge_requests/8319, but has been rebased to work with v0.19.
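For readers unfamiliar with the idea, here is a minimal sketch of what sampling one token per vocabulary at a single decoding step could look like. It is not the code in this PR. It assumes that the logits of all vocabularies (for example, one per audio codebook) are concatenated into one flat vector and that every vocabulary has the same size. The function name `sample_multi_vocab` and its parameters are hypothetical.

```python
import math
import random
from typing import List, Optional


def sample_multi_vocab(
    logits: List[float],
    num_vocabs: int,
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Sample one token id per vocabulary from a flat logits vector.

    Assumption (not taken from the PR): the logits of all vocabularies
    are concatenated, and every vocabulary has the same size. Returned
    ids are local to each vocabulary (0 .. vocab_size - 1).
    """
    if num_vocabs <= 0 or not logits or len(logits) % num_vocabs != 0:
        raise ValueError("logits length must be a positive multiple of num_vocabs")
    rng = rng or random.Random()
    vocab_size = len(logits) // num_vocabs
    tokens = []
    for v in range(num_vocabs):
        # Slice out the logits that belong to vocabulary v.
        chunk = logits[v * vocab_size:(v + 1) * vocab_size]
        if temperature == 0.0:
            # Greedy: pick the highest-scoring token in this vocabulary.
            tokens.append(max(range(vocab_size), key=chunk.__getitem__))
            continue
        # Numerically stable softmax weights over this vocabulary only.
        scaled = [x / temperature for x in chunk]
        peak = max(scaled)
        weights = [math.exp(x - peak) for x in scaled]
        tokens.append(rng.choices(range(vocab_size), weights=weights, k=1)[0])
    return tokens
```

With `temperature=0.0`, calling `sample_multi_vocab([0.1, 2.0, -1.0, 3.0, 0.0, 0.5], num_vocabs=2)` returns `[1, 0]`: the argmax of each three-entry slice. A real runtime would do this on batched GPU tensors and may offset the ids into a shared embedding table; both are left out of this sketch.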
/bot run
PR_Github #4016 [ run ] triggered by Bot
PR_Github #4016 [ run ] completed with state FAILURE
/LLM/release-0.19/L0_MergeRequest_PR pipeline #119 completed with status: 'FAILURE'
/bot run --disable-fail-fast
PR_Github #5547 [ run ] triggered by Bot
PR_Github #5547 [ run ] completed with state FAILURE
/LLM/release-0.19/L0_MergeRequest_PR pipeline #126 completed with status: 'FAILURE'
We do not accept any changes in the release branch. Please target main.
Closing since there have been no updates from the requester since https://github.com/NVIDIA/TensorRT-LLM/pull/4030#issuecomment-2889886525. Feel free to reopen!