
Default max value of max_new_tokens

Open Navanit-git opened this issue 1 year ago • 5 comments

Feature request

Is there a way to have max_new_tokens default to the maximum value for every LLM model?

Motivation

I wanted to set max_new_tokens to the maximum value. Earlier I used 1024, but it's not giving me full answers. I tried to manually set the max token length of each LLM model, but how many times will I have to change it?

Your contribution

I will try to raise a PR if I see anything to add.

Navanit-git avatar May 06 '24 05:05 Navanit-git

Hi @Navanit-git, thanks for raising an issue!

This is a question best placed in our forums; we try to reserve GitHub issues for feature requests and bug reports.

The way to configure how many tokens are generated is by modifying the max_new_tokens parameter passed to the model.generate(...) call.
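For illustration, a minimal sketch of passing max_new_tokens to generate (the checkpoint and prompt here are only examples):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any causal LM checkpoint works here; "gpt2" is only an example.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The quick brown fox", return_tensors="pt")

# max_new_tokens caps the number of tokens generated *after* the prompt.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```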

To see how to configure the generation behaviour, please refer to the docs:

  • https://huggingface.co/docs/transformers/v4.40.2/en/generation_strategies#text-generation-strategies
  • https://huggingface.co/docs/transformers/v4.40.2/en/llm_tutorial#generation-with-llms
  • https://huggingface.co/docs/transformers/main_classes/text_generation

amyeroberts avatar May 07 '24 11:05 amyeroberts

Hi, thank you for the response. I know about max_length and max_new_tokens, and I have an answer regarding this in the forum too.

But nowhere is it written how to set max_length to the model's maximum generation length. For example, Llama 2 has a max of 4096 tokens and Llama 3 has 8192. So, if possible, it would be nice to have a feature to get the model's max tokens.

And apologies for any confusion.

Navanit-git avatar May 07 '24 15:05 Navanit-git

@Navanit-git Just to make sure I've understood: are you asking for the maximum number of tokens a model can accept as input, i.e. the context window?

amyeroberts avatar May 07 '24 15:05 amyeroberts

Not as input, but the max number of tokens it can generate, i.e. output; currently the default is 20. So, if possible, we could have the max amount so that when the user sets the value to -1, it generates tokens up to the model's max length.

Navanit-git avatar May 07 '24 15:05 Navanit-git

Not as input, but the max number of tokens it can generate

They're related. When the model generates, it takes an input and predicts a probability distribution over the vocabulary for the next token in the sequence. So the maximum number of tokens that can be generated is the maximum context length minus the prompt length.

So, if possible, we could have the max amount so that when the user sets the value to -1, it generates tokens up to the model's max length

I see. I don't think this is directly possible: the generation logic is not within the model, so if it receives -1 it can't access the model's information to know when to stop.

You should be able to do something like model.generate(**inputs, max_length=model.config.max_position_embeddings)

I don't think this will guarantee a sequence of the maximum token length either: the model may generate an EOS token before that length is reached.
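A minimal sketch of that approach, deriving the generation budget from the config (the checkpoint and prompt are illustrative, and not every config exposes max_position_embeddings, hence the fallback):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "gpt2"  # illustrative; any causal LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("Explain attention in one paragraph.", return_tensors="pt")
prompt_length = inputs["input_ids"].shape[1]

# Budget = context window minus prompt length; fall back to a fixed
# value if the config does not expose max_position_embeddings.
context_window = getattr(model.config, "max_position_embeddings", 1024)
max_new = context_window - prompt_length

# Generation may still stop earlier if the model emits an EOS token.
outputs = model.generate(**inputs, max_new_tokens=max_new)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```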

cc @gante to confirm if this is right or not :)

amyeroberts avatar May 07 '24 18:05 amyeroberts

@Navanit-git 👋

The comment written by @amyeroberts above is correct.

The maximum length is set to 20 by default for safety reasons: if you call generate with the maximum length, your computer will hang for a long time. It is a conscious choice on our end: we prefer to throw a warning suggesting to set max_new_tokens rather than let generate crash at the hands of beginners 🤗

gante avatar May 14 '24 14:05 gante

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jun 08 '24 08:06 github-actions[bot]