Gym icon indicating copy to clipboard operation
Gym copied to clipboard

Docs + Environment pattern: How to Incorporate LLM as a Judge in Verification

Open cwing-nvidia opened this issue 1 month ago • 0 comments

Tutorial: How to Incorporate LLM as a Judge in Verification Logic

Background

Users have asked how to use LLM-as-a-judge for verification in their resource servers. This is particularly important for tasks where ground truth is difficult to verify (e.g., creative writing, instruction following).

Problem

Users need guidance on:

  • When to use LLM-as-a-judge vs. other verification methods
  • How to deploy the judge model
  • How to configure the judge model endpoint

Acceptance Criteria

  • [ ] Verification design - when to use LLM as a judge
  • [ ] Configuration - how to set up the judge model in NeMo Gym
  • [ ] Architecture - where the judge model runs
  • [ ] Deployment options - co-located on the same cluster, separate endpoint/cluster, using external APIs
  • [ ] point to existing resource servers that utilize LLM as a judge

Priority

High - common pattern for verification

Related

  • This pattern is used internally but not documented for external users
  • Consider showing integration with multiple LLM providers (OpenAI, Gemini, Claude, local models)

cwing-nvidia avatar Nov 13 '25 05:11 cwing-nvidia