verl
verl copied to clipboard
Support evaluation mode for multi-turn tool-calling using trained model (post-training multi-round tool invocation)
I have trained a model using VeRL and now wish to perform evaluation/inference (not training) in a scenario with multi-turn tool calling (i.e., the model interacts over multiple user/assistant turns, uses tools during the interaction).
However, it appears that the existing scripts (main_generation, main_eval, etc) in VeRL do not support this use-case directly:
- They seem oriented towards single-turn generation or training time multi-turn (via Agent Loop)
- I was unable to locate a documented workflow or ready-to-use script for post-training inference/evaluation with multi-turn + tool calls
- When I attempted to set
rollout.multi_turn.enable=trueunderactor_rollout_ref.rollout, the tools did not execute (or the conversation did not loop for multiple turns)
Questions
- Is there an existing supported script or command in VeRL for multi-turn tool-calling evaluation (i.e., using a trained model, doing inference across multiple assistant/user turns + tool calls)?
- If not, is this considered a planned feature? What would the recommended path be to implement this?
- Are there known limitations (e.g., only supported in training mode, only certain backends, only synchronous mode) that prevent evaluation mode multi-turn tool-calling?