OlivierDehaene
No, sending an API request to check your token count is not something we want. This compute needs to happen client side.
> Alternatively one could also consider adding additional settings for truncation side.

No. Server side truncation should be seen as a last resort. We will never offer enough flexibility on...

> This would allow downstream applications to better handle load and token counting of requests.

How?
Use the same tokenizer client side, either with WASM or something else.
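For illustration, a minimal sketch of counting tokens client side with the `transformers` tokenizer; the model id and the 1024-token limit below are placeholders, not actual server defaults:

```python
from transformers import AutoTokenizer

# Load the same tokenizer the server is running (model id is a placeholder)
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")

def count_tokens(prompt: str) -> int:
    # Tokenize locally, no API round trip needed
    return len(tokenizer(prompt)["input_ids"])

prompt = "What is Deep Learning?"
if count_tokens(prompt) > 1024:  # placeholder for the server's max input length
    raise ValueError("Prompt is too long; truncate it client side before sending")
```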
I'm a bit confused. Do you want a wrapper or do you want the --no-tui option to exist?
> The last line of text_generation_launcher: Args seems not correct

What do you mean? The logprob should always have a value. If it does not, something is going wrong, hence...
If you set a temperature of 10e-4, why not simply use greedy decoding?
> I'm new on text generation tasks but I want to lower the "creativity" of the model and stick to stable outputs

You should not use any temperature then and...
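As a sketch with the Python client (the endpoint below is a placeholder), not sampling at all gives you greedy decoding, which is the most stable output you can get:

```python
from text_generation import Client

client = Client("http://127.0.0.1:8080")  # assumed local TGI endpoint

# Greedy decoding: do not sample and leave temperature unset
response = client.generate(
    "What is Deep Learning?",
    max_new_tokens=64,
    do_sample=False,
)
print(response.generated_text)
```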
To complete what @Narsil just said, what you would usually do instead is to add a rate limiter on the client side to avoid overloading the server (for example, limit the...
https://github.com/huggingface/text-generation-inference/blob/main/clients/python/text_generation/client.py#L285
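For example, a minimal client-side rate limiter sketched with an `asyncio.Semaphore` around the `AsyncClient`; the endpoint and the concurrency limit of 8 are assumed placeholder values:

```python
import asyncio
from text_generation import AsyncClient

MAX_CONCURRENT_REQUESTS = 8  # placeholder limit, tune to your server capacity
semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)
client = AsyncClient("http://127.0.0.1:8080")  # assumed local TGI endpoint

async def generate_limited(prompt: str) -> str:
    # The semaphore caps how many requests are in flight at the same time
    async with semaphore:
        response = await client.generate(prompt, max_new_tokens=64)
        return response.generated_text

async def main():
    prompts = [f"Question {i}: what is Deep Learning?" for i in range(100)]
    answers = await asyncio.gather(*(generate_limited(p) for p in prompts))
    print(f"Received {len(answers)} responses")

asyncio.run(main())
```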