verl
multiturn_eval
Summary
- Purpose: reuse the training AgentLoop rollout for single-run multi-turn eval without PPO, decoupling eval from training.
- How to run: vLLM async GSM8K example script with key Hydra knobs, outputs, and checkpoint handling.
- Checkpoints: supports loading FSDP/FSDP2 training checkpoints via built-in DeviceMesh/process-group compatibility patches.
- Validated scenarios: GSM8K, Geo3K, and multimodal; async + vLLM with TP1/TP2 all pass on FSDP checkpoints.
- Extensibility: add custom metrics via aggregate_summary / collect_sample_records or AgentLoop agent_metrics.
- Known limitation: SGLang tensor-parallel (TP) compatibility is not yet working (tp=1 fails, multi-TP is untested); a fix will follow later.
- No source code changes; this is a documentation-only addition.
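As an illustration of the extensibility point above, a custom aggregation over per-sample records might look like the sketch below. The function name `summarize_records` and the record fields `score` and `num_turns` are hypothetical and are not taken from verl; the actual signatures of `aggregate_summary` and `collect_sample_records` should be checked in the new doc.

```python
from statistics import mean
from typing import Any


def summarize_records(records: list[dict[str, Any]]) -> dict[str, float]:
    """Reduce per-sample eval records to summary metrics.

    Hypothetical helper: each record is assumed to carry a numeric
    `score` and an integer `num_turns`. These field names are
    illustrative only and are not part of verl.
    """
    if not records:
        # Nothing to aggregate; report an empty run explicitly.
        return {"num_samples": 0.0}
    return {
        "num_samples": float(len(records)),
        "mean_score": float(mean(r["score"] for r in records)),
        "mean_turns": float(mean(r["num_turns"] for r in records)),
    }


# Example with two made-up records.
example = summarize_records(
    [
        {"score": 1.0, "num_turns": 2},
        {"score": 0.0, "num_turns": 4},
    ]
)
```

A plain reduction like this keeps the metric logic independent of the rollout backend, so the same function could be reused whichever hook ends up calling it.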
Reviewer Notes
- Please skim the new doc for correctness of run instructions and limitations.