Gym

Gym copied to clipboard

Published 1 week ago •

Reame
Issues

Docs + Environment pattern: Multi-Turn Training Pattern

Open cwing-nvidia opened this issue 1 month ago • 0 comments

Background

Users want to train models on multi-turn conversational tasks where the agent handles back-and-forth interactions with a user

Problem

Users need guidance on:

What "multi-turn" means
How to structure multi-turn tasks
Different approaches to simulating user responses
How to maintain conversation state and handle termination
How to verify success in multi-turn tasks

Acceptance Criteria

[ ] Examples and docs that implement the above

Priority

High - common request for agentic training

Related

Links to "Using LLM to Simulate User Responses" tutorial

Nov 13 '25 06:11 cwing-nvidia