gemma
Maximum output length and the effect of other generation parameters
Hi, what is the maximum output_len and how can I read information related to all the parameters that I can change to get better results?
model.generate(
    USER_CHAT_TEMPLATE.format(prompt=prompt),
    device=device,
    output_len=1000,
)
@kiashann, AFAIK, depending on the model used, we can accommodate a certain number of tokens shared between the prompt and the model's generation, which forces us to split the input if the context is too large.
Gemma has a maximum context length of 8192 tokens, which roughly translates to more than 6100 words.
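To illustrate the splitting mentioned above: because the prompt and the generated output share one context window, the largest usable `output_len` is whatever the prompt has not already consumed. A minimal sketch (the 8192 limit is Gemma's documented context length; the helper function name is ours, not part of any library):

```python
GEMMA_CONTEXT_LENGTH = 8192  # total tokens shared by prompt + generation


def max_output_len(prompt_token_count: int,
                   context_length: int = GEMMA_CONTEXT_LENGTH) -> int:
    """Return the largest output_len that still fits in the context window."""
    remaining = context_length - prompt_token_count
    return max(remaining, 0)  # never negative, even if the prompt overflows


# A 2000-token prompt leaves 6192 tokens for the model's reply.
print(max_output_len(2000))  # 6192
# A prompt at or over the limit leaves no room to generate.
print(max_output_len(9000))  # 0
```

So asking for `output_len=1000` only works if the tokenized prompt stays under 7192 tokens; otherwise the prompt itself must be truncated or chunked.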
Also, if you're working with the Gemma code directly, you can explore the GemmaConfig class, which defines the model's parameters: https://huggingface.co/docs/transformers/en/model_doc/gemma
Thank you!
Hi @kiashann,
Could you please confirm whether this issue is resolved for you by the above comment? Please feel free to close the issue if it is resolved.
Thank you.
Closing this issue due to lack of recent activity. Please reopen if this is still a valid request. Thank you!
Does the Gemma 3 model have issues similar to https://github.com/ollama/ollama/issues/9871? I can't seem to use more than 8192 tokens. That seems to be the default for Gemma, but it shouldn't be the case for Gemma 3, which supports a 32K context on the 1B model and 128K on the larger ones.
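If this is the same behavior as the linked Ollama issue, the cap likely comes from Ollama's own default context size rather than from the model weights: Ollama does not automatically use a model's full advertised window, so Gemma 3's 32K/128K context has to be requested explicitly. A hedged sketch of raising it via the `options.num_ctx` field of a request to Ollama's `/api/generate` endpoint (the model tag and token count here are illustrative, not verified against your install):

```json
{
  "model": "gemma3:1b",
  "prompt": "Summarize this long document ...",
  "options": {
    "num_ctx": 32768
  }
}
```

The same setting can be baked into a custom model with a `PARAMETER num_ctx 32768` line in a Modelfile, so every request to that model gets the larger window.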