Aurelien Chartier
Since version 4.48, Transformers expands `input_ids` itself during processing (https://github.com/huggingface/transformers/pull/35534), so the TRT-LLM code no longer needs to do it.
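For callers that still have to support older Transformers releases, the 4.48 cutoff can be checked before deciding whether any manual expansion is needed. The following is only a sketch: the helper name `transformers_expands_input_ids` is hypothetical and is not part of TRT-LLM or Transformers, and it compares only the major and minor version numbers.

```python
import re


def transformers_expands_input_ids(version: str) -> bool:
    """Return True if the given Transformers version is 4.48 or newer.

    Hypothetical helper for illustration; only major.minor is compared.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison orders by major first, then minor.
    return (major, minor) >= (4, 48)
```

In practice the version string would come from `transformers.__version__`.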
* Add common loop cleanup function
* Remove checks for attention DP if nothing to queue
* Remove extra return statements
* Remove extra variables
* Remove commented debug print
## Summary by CodeRabbit

* **New Features**
  * Introduced environment-driven configuration for speculative decoding token acceptance, allowing operators to override and control the number of accepted tokens for optimization and...
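An environment-driven override of this kind could look roughly like the sketch below. It is not the code from the change itself: the variable name `EXAMPLE_FORCE_NUM_ACCEPTED_TOKENS`, the function `resolve_num_accepted_tokens`, and the clamping and fallback behavior are all assumptions made for illustration.

```python
import os
from typing import Mapping, Optional

# Hypothetical environment variable name, chosen for illustration only.
ENV_VAR = "EXAMPLE_FORCE_NUM_ACCEPTED_TOKENS"


def resolve_num_accepted_tokens(
    computed: int,
    max_draft_len: int,
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Return the accepted-token count, honoring an optional override.

    If the variable is unset, empty, or not an integer, the computed
    value is returned unchanged. A valid override is clamped to the
    range [0, max_draft_len] (an assumed policy, not a documented one).
    """
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR)
    if raw is None or raw.strip() == "":
        return computed
    try:
        forced = int(raw)
    except ValueError:
        return computed
    return max(0, min(forced, max_draft_len))
```

Passing the environment as a parameter keeps the function easy to test; in normal use `env` is left as `None` so the process environment is read.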