exllama
Feature Request: length_penalty support
We are trying to port our transformers-based generation code to exllama but did not find a configurable length_penalty control. Is this on the roadmap? Thanks.
Could you elaborate? There are various more-or-less hacky ways to force shorter or longer replies from a language model, but no standard way of doing it. Is there a particular front-end or UI you're referring to?
After looking at the transformers length_penalty docs, it turns out it is actually beam_alpha, so it only applies when generating with multiple beams.
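For context, the length normalization behind transformers' length_penalty can be sketched roughly like this (a simplified illustration, not exllama or transformers code): each beam's cumulative log-probability is divided by its length raised to the penalty, so values above 1.0 favor longer hypotheses and values below 1.0 favor shorter ones.

```python
import math

def beam_score(token_logprobs, length_penalty=1.0):
    """Length-normalized beam score (simplified sketch).

    Sum of per-token log-probs divided by length ** length_penalty.
    Log-probs are negative, so a larger denominator makes long
    sequences *less* negative, i.e. relatively preferred.
    """
    return sum(token_logprobs) / (len(token_logprobs) ** length_penalty)

# A short and a long hypothesis with the same average log-prob:
short = [-1.0, -1.0]          # 2 tokens
long = [-1.0, -1.0, -1.0, -1.0]  # 4 tokens

# With no penalty (1.0) the normalized scores tie; with penalty > 1
# the longer beam wins, which is why it only matters with num_beams > 1.
print(beam_score(short, 1.0), beam_score(long, 1.0))
print(beam_score(short, 2.0), beam_score(long, 2.0))
```

With greedy or single-beam sampling there is only one hypothesis, so this normalization never changes which sequence is picked.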
https://github.com/huggingface/transformers/issues/16930
Being able to bias toward shorter or longer responses would be a great addition.
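One common single-beam trick for this kind of bias (a hypothetical sketch, not an existing exllama feature) is to nudge the logit of the end-of-sequence token before sampling: a positive bias encourages the model to stop sooner, a negative one pushes it to keep generating.

```python
import math

def bias_eos_logit(logits, eos_token_id, bias):
    """Return a copy of `logits` with the EOS logit shifted by `bias`.

    bias > 0  -> EOS more likely -> shorter replies
    bias < 0  -> EOS less likely -> longer replies
    """
    out = list(logits)
    out[eos_token_id] += bias
    return out

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary of 4 tokens where index 3 is EOS:
logits = [1.0, 0.5, 0.2, 0.8]
shorter = softmax(bias_eos_logit(logits, 3, +2.0))
longer = softmax(bias_eos_logit(logits, 3, -2.0))
print(shorter[3], longer[3])  # EOS probability up vs. down
```

Unlike length_penalty, this works with ordinary sampling, though it is a blunt instrument: it shifts *when* the model stops rather than rescoring whole candidate sequences.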