
Bug Report: The model frequently generates repetitive token sequences.

Open Razaghallu786 opened this issue 1 year ago • 6 comments

Description of the bug:

No response

Actual vs expected behavior:

No response

Any other information you'd like to share?

No response

Razaghallu786 avatar Dec 18 '24 18:12 Razaghallu786

Bug Report: Repetitive Token Generation in "gemini-1.5-flash" Model

Description of the Bug: When generating long texts with the "gemini-1.5-flash" model, repetitive token sequences frequently occur, producing infinite loops that exhaust the token limit. This behavior is consistent across both the Vertex AI and Gemini APIs.


Example:

"The judgment can be appealed in a motion for reconsideration, claiming that the judge did not consider the evidence properly. The judgment can be appealed in a motion for reconsideration, claiming that the judge did not consider the evidence properly. The judgment can be appealed in a motion for reconsideration, claiming that the judge did not consider the evidence properly. The judgment can be

Steps to Reproduce:

  1. Use the "gemini-1.5-flash" model via Vertex or Gemini API.

  2. Generate a long text (e.g., a legal or technical document).

  3. Observe the generated output for repeated phrases or sentences.
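The steps above can be scripted as a minimal sketch. The payload shape follows the public Gemini REST `generateContent` endpoint; the prompt text is a stand-in (not from the original report), and the request is only sent when a `GOOGLE_API_KEY` environment variable is set, so the payload can be inspected offline.

```python
import json
import os
import urllib.request

# Endpoint for the model named in this report (public Gemini REST API).
MODEL = "gemini-1.5-flash"
URL = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"

def build_payload(prompt: str) -> dict:
    """Build a generateContent request body for a single text prompt."""
    return {"contents": [{"parts": [{"text": prompt}]}]}

# Step 2: ask for long-form text (legal/technical), where the loop shows up.
payload = build_payload("Draft a long legal memo on appealing a judgment.")

api_key = os.environ.get("GOOGLE_API_KEY")
if api_key:
    req = urllib.request.Request(
        f"{URL}?key={api_key}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Step 3: inspect the output for repeated phrases or sentences.
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
        print(body["candidates"][0]["content"]["parts"][0]["text"])
```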


Expected Behavior: The model should produce coherent, non-repetitive text.

Actual Behavior: The model enters a repetitive loop, generating the same token sequences indefinitely until the token limit is reached.


Impact:

Resource Waste: Tokens are wasted, increasing costs and exhausting API usage limits.

Output Quality: The generated text becomes unusable, requiring additional API requests.


Reproduction Rate: Occurs frequently when generating long-form text.


Workaround: There is currently no known workaround to prevent this issue.


Request for Resolution:

  1. Investigate and resolve the cause of repetitive token generation.

  2. Implement a mechanism to detect and avoid repetitive loops during generation.

  3. Consider offering refunds or credits for tokens wasted due to this bug.
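Until item 2 is addressed server-side, a client-side guard is possible: after each streamed chunk, scan the accumulated output for a trailing block that repeats, and cancel the request early instead of letting it run to the token limit. A minimal sketch (the thresholds are arbitrary assumptions, not values from the API):

```python
def repeating_tail_period(text: str, min_period: int = 10,
                          max_period: int = 300, repeats: int = 3):
    """Return the period length if `text` ends with some block repeated
    `repeats` times in a row, else None.

    Intended for a streaming loop: call this on the accumulated text as
    chunks arrive, and stop generation on a hit.
    """
    n = len(text)
    for period in range(min_period, min(max_period, n // repeats) + 1):
        block = text[n - period:]
        if text.endswith(block * repeats):
            return period
    return None

# The looping sentence from this report, repeated as in the actual output:
sentence = ("The judgment can be appealed in a motion for reconsideration, "
            "claiming that the judge did not consider the evidence properly. ")
looping = "Some unique preamble. " + sentence * 3
assert repeating_tail_period(looping) is not None       # loop detected
assert repeating_tail_period("A normal, non-repetitive answer.") is None
```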


Actual vs. Expected Behavior:

Actual Output:

"The judgment can be appealed in a motion for reconsideration, claiming that the judge did not consider the evidence properly. The judgment can be appealed in a motion for reconsideration, claiming that the judge did not consider the evidence properly. The judgment can be appealed..."

Expected Output: "The judgment can be appealed in a motion for reconsideration, claiming that the judge did not consider the evidence properly."

Razaghallu786 avatar Dec 18 '24 18:12 Razaghallu786


Vitalina12512 avatar Dec 18 '24 20:12 Vitalina12512

Hi @Razaghallu786,

Could you please clarify: is this happening with features like function calling or structured output, or just when running the prompt above?

gmKeshari avatar Dec 19 '24 05:12 gmKeshari

What temperature are you using? If you are using 0, can you try a higher one?

Giom-V avatar Dec 21 '24 22:12 Giom-V

If there is no update, please close this issue, @Razaghallu786.

rubiagatra avatar Mar 18 '25 02:03 rubiagatra

Here are a few ideas I've been exploring to tackle the repetitive token generation issue:

Tuning Generation Parameters: Try adjusting the generation parameters, specifically the temperature, top_p, and top_k values. This can sometimes help reduce unwanted repetition.
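To make this concrete, here is a hedged sketch of such a configuration. The keys match the Gemini API's generation config; the specific values are illustrative starting points, not recommended settings, and the model call is shown only as a comment so the dict can be read without the SDK installed.

```python
# Sampling settings that often reduce repetition: a nonzero temperature,
# nucleus sampling via top_p, and a token cap to bound the cost of any
# loop that still occurs. Values are illustrative, not tuned.
generation_config = {
    "temperature": 0.9,         # 0 is fully greedy and loops more easily
    "top_p": 0.95,              # nucleus sampling keeps some diversity
    "top_k": 40,                # restrict sampling to the 40 likeliest tokens
    "max_output_tokens": 2048,  # cap tokens wasted if a loop still happens
}

# With the google-generativeai SDK this dict can be passed directly, e.g.:
# model = genai.GenerativeModel("gemini-1.5-flash",
#                               generation_config=generation_config)
```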

Simplify Prompts / Advanced Prompting: Often, the problem occurs when the model struggles with complex prompts. Simplifying the prompt can make a difference. Alternatively, you can experiment with advanced prompting methods like chain-of-thought (CoT) or even few-shot prompting to guide the model more effectively.
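As an illustration of the few-shot idea, a prompt can prepend one or two short exemplar question/answer pairs before the real question, so the model sees the desired concise style first. Everything below (the helper name, the exemplar wording) is made up for illustration:

```python
def few_shot_prompt(question: str, examples: list[tuple[str, str]]) -> str:
    """Prepend worked Q/A pairs so the model sees the desired concise,
    non-repetitive style before answering the real question."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {question}\nA:")  # leave the final answer open
    return "\n\n".join(parts)

examples = [
    ("Can a judgment be appealed?",
     "Yes, typically via a motion for reconsideration filed within the deadline."),
]
prompt = few_shot_prompt("What grounds support a motion for reconsideration?",
                         examples)
```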

Model Variant Upgrade: If you're using a Gemini model, consider testing a newer variant (for example, switching from gemini-1.5-flash to gemini-2.0-flash) to see if that improves the output.

Fine-Tuning Considerations: If you're using a fine-tuned model, try adjusting how you're calling the model to see if that impacts the repetition. Also, double-check that the dataset used for fine-tuning is suitable and that the fine-tuning process didn't introduce issues.

Framework Issues: If the model is being accessed through a framework, try calling it standalone to determine whether the problem lies in the framework's implementation.

If you still encounter issues after trying these approaches, please share more details so I can investigate further.

SaurabMishra12 avatar Mar 31 '25 13:03 SaurabMishra12