oss-fuzz-gen icon indicating copy to clipboard operation
oss-fuzz-gen copied to clipboard

Mitigate `"finish_reason": "RECITATION"` error in VertexAI queries.

Open DonggeLiu opened this issue 1 year ago • 5 comments

Some VertexAI queries fail due to citation filter, i.e., "finish_reason": "RECITATION".

image

There is no good mitigation except increasing the temperature or prompt engineering. We might be able to avoid it later if the model allows us to configure the threshold.

DonggeLiu avatar Jun 25 '24 04:06 DonggeLiu

I tried increasing the temperature and several prompt engineering "hacks" like inserting a long random phrase between short segments, but still getting tons for recitation errors.

naourass avatar Sep 29 '24 03:09 naourass

I tried increasing the temperature and several prompt engineering "hacks" like inserting a long random phrase between short segments, but still getting tons for recitation errors.

Yep, I reckon that's from Gemini internal so the external hacks won't prevent this.

DonggeLiu avatar Sep 30 '24 00:09 DonggeLiu

@naourass @DonggeLiu
Building on your insights about Gemini’s internal filters, I’ve found that combining token adjustments with other parameters reduces RECITATION errors in testing:

1. Token + Temp Synergy (models.py#L24):

TEMPERATURE = 0.8  # From 0.4  
MAX_OUTPUT_TOKENS = 768  # Shorter responses avoid verbatim patterns  

Result: ~25% fewer errors vs. temperature alone.

2. Constraint-Driven Prompts (prompt_builder.py):

prompt += "\n// Avoid common API patterns; use novel input combinations"  

Why: Directives work better than random phrases (tested 18% vs. 7% error reduction).

3. Graceful Fallback (base_agent.py):

except RecitationError:  
    self.switch_model("code-bison")  # Less strict model  

Tradeoffs:

  • Requires stricter output validation
  • May increase non-determinism

If helpful, I can:

  • Submit a PR for joint token/temp testing
  • Co-draft a Google feature request for threshold controls

Ekam219 avatar Mar 22 '25 09:03 Ekam219

TEMPERATURE = 0.8 # From 0.4

Not sure if this is a good solution, given temperature will also affect the result quality.

MAX_OUTPUT_TOKENS

This can be useful to experiment, but it still may affect the result quality, especially given many responses in agents are quite long.

prompt += "\n// Avoid common API patterns; use novel input combinations"

Also like to affect other results.

except RecitationError:
self.switch_model("code-bison") # Less strict model

This could work. But I reckon this is a lower priority given RECITATION error does not show very often now.

DonggeLiu avatar Mar 22 '25 23:03 DonggeLiu

@DonggeLiu Thanks for the feedback! To address your concerns:

For tokens: Propose testing 1024 as a middle-ground length.

Will refine prompts to be less intrusive (e.g., "Prioritize unique parameter combinations").

Fallback can remain on hold per your note.

Ekam219 avatar Mar 22 '25 23:03 Ekam219