Support evaluation mode for multi-turn tool-calling using trained model (post-training multi-round tool invocation)

Open albertimff opened this issue 2 months ago • 0 comments

I have trained a model using VeRL and now wish to perform evaluation/inference (not training) in a scenario with multi-turn tool calling (i.e., the model interacts over multiple user/assistant turns, uses tools during the interaction).
However, it appears that the existing scripts (main_generation, main_eval, etc) in VeRL do not support this use-case directly:

They seem oriented towards single-turn generation or training time multi-turn (via Agent Loop)
I was unable to locate a documented workflow or ready-to-use script for post-training inference/evaluation with multi-turn + tool calls
When I attempted to set rollout.multi_turn.enable=true under actor_rollout_ref.rollout, the tools did not execute (or the conversation did not loop for multiple turns)

Questions

Is there an existing supported script or command in VeRL for multi-turn tool-calling evaluation (i.e., using a trained model, doing inference across multiple assistant/user turns + tool calls)?
If not, is this considered a planned feature? What would the recommended path be to implement this?
Are there known limitations (e.g., only supported in training mode, only certain backends, only synchronous mode) that prevent evaluation mode multi-turn tool-calling?

Nov 18 '25 06:11 albertimff