tensorrtllm_backend
`min_length` parameter doesn't work
System Info
8 x 40 GB A100s
Llama-3 70B Instruct, bf16, TP-8
TensorRT-LLM 0.9.0 + Triton 24.04
Who can help?
@byshiue @schetlur-nv
Information
- [x] The official example scripts
- [ ] My own modified scripts
Tasks
- [x] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Set `min_length` to a high value (~512) and ask for a short answer in the prompt.
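A minimal sketch of such a request, in case it helps to reproduce. It assumes the model is served as `ensemble` and reached through Triton's HTTP generate endpoint on `localhost:8000`; the URL, the prompt, the `max_tokens` value, and the helper names `build_payload` and `send` are illustrative assumptions, not taken from the setup above.

```python
import json
import urllib.request

# Assumed endpoint: Triton's HTTP generate extension on the default port,
# with the model named "ensemble". Adjust to the actual deployment.
GENERATE_URL = "http://localhost:8000/v2/models/ensemble/generate"


def build_payload(prompt: str, min_length: int = 512, max_tokens: int = 1024) -> dict:
    """Build a generate request body that asks for at least `min_length` tokens."""
    return {
        "text_input": prompt,
        # max_tokens is kept above min_length so the output cap cannot
        # be the reason a short reply comes back.
        "max_tokens": max_tokens,
        "min_length": min_length,
        "bad_words": "",
        "stop_words": "",
    }


def send(payload: dict, url: str = GENERATE_URL) -> dict:
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_payload("Answer in one word: what is the capital of France?"))` against a running server should return the reply as JSON; the generated text is expected under `text_output`, assuming the default ensemble output name.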
Expected behavior
At least 512 tokens returned.
Actual behavior
Few tokens returned.
Additional notes
N/A