[Docs] Add/improve error code documentation of thrown `GenkitError`s + retry handling/strategy

Open Tyg-g opened this issue 6 months ago • 0 comments

Is your report related to a problem? Please describe.

I'm trying to determine which errors should be retried automatically.

My processing flow is quite complicated, so:

✅ I want to retry everything locally where possible, so that earlier steps are not rerun when they don't need to be. However,
❌ for errors where a retry doesn't make sense (validation error, unauthorized, invalid parameter), I want to skip retries. Otherwise earlier steps could be rerun unnecessarily, which would waste resources.
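
To make the intended policy concrete, here is a minimal sketch of a local retry wrapper. It assumes that thrown errors carry a string `status` field; the `StatusError` type, the `NON_RETRYABLE` list, and the `retryStep` helper are hypothetical names for illustration, not part of Genkit. Which statuses belong in the non-retryable list is exactly what this issue asks to have documented.

```typescript
// Hypothetical error shape: assumes thrown errors expose a string `status`.
type StatusError = Error & { status?: string };

// Assumption: these statuses signal errors that a retry cannot fix.
const NON_RETRYABLE = new Set<string>([
  "INVALID_ARGUMENT",
  "UNAUTHENTICATED",
  "PERMISSION_DENIED",
]);

function isRetryable(err: unknown): boolean {
  const status = (err as StatusError | null | undefined)?.status;
  // Errors without a status are treated as retryable in this sketch;
  // a stricter policy might prefer the opposite default.
  return status === undefined || !NON_RETRYABLE.has(status);
}

// Retries a single step locally, so earlier steps of the flow are not rerun.
async function retryStep<T>(
  step: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await step();
    } catch (err) {
      lastError = err;
      // Fail fast on errors a retry cannot fix.
      if (!isRetryable(err)) throw err;
    }
  }
  throw lastError;
}
```

A real implementation would probably also add a backoff delay between attempts, which is omitted here for brevity.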

Is your report related to a suggestion/improvement? Please describe.

💡 Yes — I'd like to kindly ask the devs to improve the error documentation, with clear and structured explanations. I think this is quite a common scenario when implementing more complex workflows.

I'll break down my suggestion by the two major error types involved: GenerationResponseError and ValidationError.

Additional context

⚠️ GenerationResponseError

Please explain the status codes in detail.

Some of them seem straightforward:

"UNAUTHENTICATED" - probably 401 Unauthorized
"PERMISSION_DENIED" - probably 403 Forbidden
"RESOURCE_EXHAUSTED" - probably 429 Too Many Requests

🔁 Are these 1:1 mappings to HTTP status codes? Or are these statuses used in other cases?

Other codes are more ambiguous:

"OUT_OF_RANGE" - is it for
- 400 Bad Request (for an incorrect parameter), or
- 416 Range Not Satisfiable (bad offset when downloading a resource), or
- is it for some other specific case?
"INTERNAL" - is this for
- 500 Internal Server Error on the LLM backend,
- or is it a local/internal failure in the genkit module?
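
For reference, the status names above match the canonical codes defined in `google.rpc.Code`, whose documentation lists an HTTP mapping for each code. The sketch below records that mapping for a few codes. Whether Genkit follows it, and whether a status can also come from a non-HTTP source, is an assumption that the documentation would need to confirm; `httpStatusFor` is a hypothetical helper.

```typescript
// HTTP mapping as documented for the canonical google.rpc.Code values.
// Assumption: Genkit's status strings follow these canonical codes.
const CANONICAL_HTTP_STATUS: Record<string, number> = {
  INVALID_ARGUMENT: 400,
  OUT_OF_RANGE: 400,
  UNAUTHENTICATED: 401,
  PERMISSION_DENIED: 403,
  RESOURCE_EXHAUSTED: 429,
  INTERNAL: 500,
  UNAVAILABLE: 503,
  DEADLINE_EXCEEDED: 504,
};

// Returns the mapped HTTP status, or undefined for codes not listed above.
function httpStatusFor(status: string): number | undefined {
  return Object.prototype.hasOwnProperty.call(CANONICAL_HTTP_STATUS, status)
    ? CANONICAL_HTTP_STATUS[status]
    : undefined;
}
```

Under this mapping OUT_OF_RANGE would correspond to 400 rather than 416, but the mapping alone does not say whether INTERNAL originates from the LLM backend or from the library itself.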

💭 Overall: More clarity would help with programmatic error handling and retry logic.

🧪 ValidationError

It would be important to distinguish whether a validation failure happened in an input schema or an output schema (both for prompts and flows):

🟩 Output validation error (e.g. malformed LLM response) ➤ This is likely a random erroneous output from the LLM → should be retried.
🟥 Input validation error (e.g. invalid prompt or parameters) ➤ This usually signals a bug → should not be retried, but should crash early.
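
The distinction above could look roughly like the following sketch if validation errors carried a machine-readable marker. The `PhasedValidationError` class and its `phase` field are hypothetical and only illustrate the kind of structure being requested; they do not exist in Genkit as far as I know.

```typescript
// Hypothetical marker for where validation failed.
type ValidationPhase = "input" | "output";

// Hypothetical error type: not an existing Genkit class.
class PhasedValidationError extends Error {
  phase: ValidationPhase;

  constructor(phase: ValidationPhase, message: string) {
    super(message);
    this.name = "PhasedValidationError";
    this.phase = phase;
  }
}

// Retry only when the model's output failed validation.
// Input failures (and anything without a phase) are not retried here.
function shouldRetryValidation(err: unknown): boolean {
  const phase = (err as { phase?: unknown } | null | undefined)?.phase;
  return phase === "output";
}
```

With such a field, a caller could branch on `shouldRetryValidation(err)` instead of parsing the human-readable message.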

🧵 Final Thought

Yes, I know that the error objects contain human-readable details and messages. But that doesn't help with programmatic decision-making — especially because the more specific info (detail) has no specified structure.

📌 I think structured, documented guidance on what different errors mean and how to handle them (retry, abort, etc.) would massively improve DX and stability.

Thanks for the help. 🙏

Jun 04 '25 19:06 Tyg-g