text-generation-inference
Remove max_stop_sequences by default
System Info
https://github.com/huggingface/text-generation-inference/blob/1028996fb380f07ebb2a9de1d2795e176f845c59/launcher/src/main.rs#L427-L428
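For reference, the linked lines define the limit as a launcher argument roughly like this (a paraphrase of the pinned commit, not verbatim TGI code):

```rust
use clap::Parser;

/// Paraphrase of the linked launcher argument (not verbatim TGI code).
#[derive(Parser)]
struct Args {
    /// Maximum number of stop sequences a client may send per request.
    /// The default is hard-coded to 4; operators can override it with
    /// `--max-stop-sequences` or the `MAX_STOP_SEQUENCES` env var.
    #[clap(default_value = "4", long, env)]
    max_stop_sequences: usize,
}
```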
I think it would be best for this limit to not exist by default, rather than defaulting to 4, or at least to default to something higher like 16. Client applications can detect that TGI is being used and encode the limit, but this degrades autocomplete scenarios where more than 4 stop sequences are genuinely necessary. Users running TGI can technically change this value, but many do not know to do so and end up with a lower-quality first experience in whatever tools they are using.
All that said, I appreciate that this limit was originally defined by OpenAI, and following their patterns is something I generally support.
For additional background: https://github.com/continuedev/continue/issues/2380
Information
- [X] Docker
- [X] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
Send a request with more than 4 stop sequences, as sketched below.
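A minimal reproduction sketch, assuming a TGI instance listening on `localhost:8080` (the host/port and prompt are placeholders; `stop` is the stop-sequence parameter in TGI's `/generate` API):

```rust
// Assumed Cargo deps: reqwest = { version = "0.12", features = ["blocking", "json"] },
// serde_json = "1".
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Five stop sequences: one more than the default limit of 4.
    let body = json!({
        "inputs": "def fib(n):",
        "parameters": {
            "stop": ["\ndef", "\nclass", "\n#", "\nif", "\nprint"]
        }
    });
    let response = reqwest::blocking::Client::new()
        .post("http://localhost:8080/generate")
        .json(&body)
        .send()?;
    // With the default limit in place, TGI rejects this request with a
    // validation error (typically a 422) instead of generating text.
    println!("{}: {}", response.status(), response.text()?);
    Ok(())
}
```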
Expected behavior
Ideally, a request with more than 4 stop sequences does not produce an error unless the person running TGI has explicitly set a limit.
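A minimal sketch of what "no limit unless explicitly set" could look like in the launcher, assuming the flag becomes optional (a hypothetical change, not actual TGI code; `validate_stop_sequences` is an illustrative helper):

```rust
use clap::Parser;

#[derive(Parser)]
struct Args {
    /// Hypothetical: no default value, so there is no limit unless the
    /// operator opts in via `--max-stop-sequences` or `MAX_STOP_SEQUENCES`.
    #[clap(long, env)]
    max_stop_sequences: Option<usize>,
}

/// Illustrative helper: reject a request only when an explicitly
/// configured limit is exceeded; otherwise accept any number of stops.
fn validate_stop_sequences(stop: &[String], limit: Option<usize>) -> Result<(), String> {
    match limit {
        Some(max) if stop.len() > max => Err(format!(
            "`stop` supports up to {max} stop sequences. Given: {}",
            stop.len()
        )),
        _ => Ok(()),
    }
}
```

With this shape, existing deployments that want the OpenAI-style cap can still set `--max-stop-sequences 4` explicitly, while the out-of-the-box experience imposes no cap.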