[FEATURE] automatically set `max_new_tokens` to its maximum value
Hi @OlivierDehaene and @Narsil,
Feature request
Would it be possible to automatically set `max_new_tokens` to its maximum value? E.g. `max_new_tokens=-1` or `max_new_tokens=None` would be replaced by `max_new_tokens = max_total_tokens - len(tokenizer(input_string).input_ids)`.
Motivation
Right now I'm setting `max_new_tokens = max_total_tokens - len(tokenizer(input_string).input_ids)`, which has the following downsides:
- we tokenize the input twice: once on the client to compute the TGI argument, and once again inside TGI
- the client needs to know the path of the tokenizer and have access to it, which is annoying when the tokenizer only lives on the machine where the TGI server is running.
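The workaround described above can be sketched as follows. This is a minimal illustration, not TGI code: `compute_max_new_tokens` is a made-up helper, and in practice `prompt_token_ids` would come from the same tokenizer the server uses (e.g. `AutoTokenizer.from_pretrained(model_id)(input_string).input_ids`), which is exactly the double tokenization the request wants to avoid. `max_total_tokens` must match the value the TGI server was launched with.

```python
def compute_max_new_tokens(prompt_token_ids, max_total_tokens):
    """Budget whatever context remains after the prompt for generation."""
    remaining = max_total_tokens - len(prompt_token_ids)
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    return remaining

# Pretend the prompt tokenized to 1500 ids against a 2048-token window.
example_ids = list(range(1500))
print(compute_max_new_tokens(example_ids, 2048))  # 548 tokens left to generate
```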
There also seem to be many issues where people hit the `max_length` limit; it would be very nice (and easy) to automatically select `max_new_tokens` instead of failing. In any case, the client can tell whether the output was truncated and deal with that if needed.
Your contribution
I don't know any Rust, but I can help if you show me where to look.
Thanks for the great library 💯
Thanks for the kind words.
Asking for the maximum `max_new_tokens` all the time means the router will reserve a lot of tokens for that particular query, so it will be less able to stack users.
That being said, it's something that could be nice indeed.
Makes sense. Then maybe a better way would be to also support `max_length`, as in the Hugging Face `GenerationConfig`, so that we can cap the total length without it always having to be the maximum. That would also make TGI more similar to `.generate`.
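The difference between the two stopping criteria can be sketched in plain Python. This is a semantics illustration only; `tokens_to_generate` is a made-up helper, not a TGI or `transformers` API: `max_new_tokens` bounds the generated tokens alone, while `max_length` bounds prompt plus generated tokens.

```python
def tokens_to_generate(prompt_len, max_new_tokens=None, max_length=None):
    """Return the generation budget implied by whichever limits are set."""
    budgets = []
    if max_new_tokens is not None:
        budgets.append(max_new_tokens)          # fixed, prompt-independent cap
    if max_length is not None:
        budgets.append(max_length - prompt_len)  # shrinks as the prompt grows
    return min(budgets) if budgets else 0

# With max_length=100 the budget depends on the prompt length,
# while max_new_tokens=20 stays fixed regardless of it.
print(tokens_to_generate(30, max_length=100))     # 70
print(tokens_to_generate(80, max_length=100))     # 20
print(tokens_to_generate(80, max_new_tokens=20))  # 20
```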