Docs + Environment pattern: How to Incorporate LLM as a Judge in Verification

Open cwing-nvidia opened this issue 1 month ago • 0 comments

Tutorial: How to Incorporate LLM as a Judge in Verification Logic

Background

Users have asked how to use LLM-as-a-judge for verification in their resource servers. This is particularly important for tasks where ground truth is difficult to verify (e.g., creative writing, instruction following).

Problem

Users need guidance on:

When to use LLM-as-a-judge vs. other verification methods
How to deploy the judge model
How to configure the judge model endpoint

Acceptance Criteria

[ ] Verification design - when to use LLM as a judge
[ ] Configuration - how to set up the judge model in NeMo Gym
[ ] Architecture - where the judge model runs
[ ] Deployment options - co-located on the same cluster, separate endpoint/cluster, using external APIs
[ ] point to existing resource servers that utilize LLM as a judge

Priority

High - common pattern for verification

This pattern is used internally but not documented for external users
Consider showing integration with multiple LLM providers (OpenAI, Gemini, Claude, local models)

Nov 13 '25 05:11 cwing-nvidia

Docs + Environment pattern: How to Incorporate LLM as a Judge in Verification

Tutorial: How to Incorporate LLM as a Judge in Verification Logic

Background

Problem

Acceptance Criteria

Priority

Related