Gym
Gym copied to clipboard
Docs + Environment pattern: How to Incorporate LLM as a Judge in Verification
Tutorial: How to Incorporate LLM as a Judge in Verification Logic
Background
Users have asked how to use LLM-as-a-judge for verification in their resource servers. This is particularly important for tasks where ground truth is difficult to verify (e.g., creative writing, instruction following).
Problem
Users need guidance on:
- When to use LLM-as-a-judge vs. other verification methods
- How to deploy the judge model
- How to configure the judge model endpoint
Acceptance Criteria
- [ ] Verification design - when to use LLM as a judge
- [ ] Configuration - how to set up the judge model in NeMo Gym
- [ ] Architecture - where the judge model runs
- [ ] Deployment options - co-located on the same cluster, separate endpoint/cluster, using external APIs
- [ ] point to existing resource servers that utilize LLM as a judge
Priority
High - common pattern for verification
Related
- This pattern is used internally but not documented for external users
- Consider showing integration with multiple LLM providers (OpenAI, Gemini, Claude, local models)