verl icon indicating copy to clipboard operation
verl copied to clipboard

multiturn_eval

Open albertimff opened this issue 1 month ago • 0 comments

Summary

  • Purpose: reuse the training AgentLoop rollout for single-run multi-turn eval without PPO, decoupling eval from training.
  • How to run: vLLM async GSM8K example script with key Hydra knobs, outputs, and checkpoint handling.
  • Checkpoints: supports loading FSDP/FSDP2 training checkpoints via built-in DeviceMesh/process-group compatibility patches.
  • Validated scenarios: GSM8K, Geo3K, and multimodal; async + vLLM with TP1/TP2 all pass on FSDP checkpoints.
  • Extensibility: add custom metrics via aggregate_summary / collect_sample_records or AgentLoop agent_metrics.
  • Note: sglang TP compatibility is still missing (tp=1 fails, multi-TP untested); to be fixed later.
  • No source code changes—documentation-only addition.

Reviewer Notes

  • Please skim the new doc for correctness of run instructions and limitations.

albertimff avatar Dec 01 '25 05:12 albertimff