Gym icon indicating copy to clipboard operation
Gym copied to clipboard

Docs + Environment pattern: Multi-Turn Training Pattern

Open cwing-nvidia opened this issue 1 month ago • 0 comments

Background

Users want to train models on multi-turn conversational tasks where the agent handles back-and-forth interactions with a user

Problem

Users need guidance on:

  • What "multi-turn" means
  • How to structure multi-turn tasks
  • Different approaches to simulating user responses
  • How to maintain conversation state and handle termination
  • How to verify success in multi-turn tasks

Acceptance Criteria

  • [ ] Examples and docs that implement the above

Priority

High - common request for agentic training

Related

  • Links to "Using LLM to Simulate User Responses" tutorial

cwing-nvidia avatar Nov 13 '25 06:11 cwing-nvidia