Gym
Gym copied to clipboard
Docs + Environment pattern: Multi-Turn Training Pattern
Background
Users want to train models on multi-turn conversational tasks where the agent handles back-and-forth interactions with a user
Problem
Users need guidance on:
- What "multi-turn" means
- How to structure multi-turn tasks
- Different approaches to simulating user responses
- How to maintain conversation state and handle termination
- How to verify success in multi-turn tasks
Acceptance Criteria
- [ ] Examples and docs that implement the above
Priority
High - common request for agentic training
Related
- Links to "Using LLM to Simulate User Responses" tutorial