Aurelien Chartier

Results: 5 issues by Aurelien Chartier

Since version 4.48, Transformers expands input_ids during processing: https://github.com/huggingface/transformers/pull/35534 TRT-LLM therefore no longer needs to do this expansion in its own code.

* Add a common loop cleanup function
* Remove attention DP checks when there is nothing to queue
* Remove extra return statements
* Remove extra variables
* Remove a commented-out debug print

## Summary by CodeRabbit

* **New Features**
  * Introduced environment-driven configuration for speculative decoding token acceptance, allowing operators to override and control the number of accepted tokens for optimization and...
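The following minimal sketch shows one way an environment-driven override of accepted speculative tokens could work. It is not TRT-LLM code. The environment variable name `SPEC_DEC_FORCE_ACCEPTED_TOKENS` and both helper functions are hypothetical. The default acceptance rule (accept the longest prefix on which draft and target tokens agree, as in greedy verification) is an assumption standing in for whatever the real implementation does.

```python
import os
from typing import Mapping, Optional, Sequence

# Hypothetical variable name for illustration only; not the name used in TRT-LLM.
OVERRIDE_ENV_VAR = "SPEC_DEC_FORCE_ACCEPTED_TOKENS"


def read_accepted_override(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return the forced accepted-token count from the environment, or None if unset."""
    raw = env.get(OVERRIDE_ENV_VAR, "").strip()
    if not raw:
        return None
    value = int(raw)  # raises ValueError on malformed input
    if value < 0:
        raise ValueError(f"{OVERRIDE_ENV_VAR} must be non-negative, got {value}")
    return value


def accepted_token_count(draft_tokens: Sequence[int],
                         target_tokens: Sequence[int],
                         override: Optional[int] = None) -> int:
    """Count accepted draft tokens, optionally forced by an operator override."""
    if override is not None:
        # Forced acceptance: never accept more tokens than were actually drafted.
        return min(override, len(draft_tokens))
    # Assumed default rule (greedy verification): accept the longest matching prefix.
    count = 0
    for draft, target in zip(draft_tokens, target_tokens):
        if draft != target:
            break
        count += 1
    return count
```

For example, `accepted_token_count([1, 2, 3], [1, 2, 9])` accepts 2 tokens, while setting the override to 3 or higher forces all 3 drafted tokens to be accepted. Clamping to the draft length keeps a misconfigured override from claiming tokens that were never proposed.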