Yuge Zhang
@SiyunZhao do you have time for the migration?
Please resolve the conflicts
@acured Please avoid force-pushing. It throws away the incremental diff and makes review difficult. We squash when merging, so the chaotic commit history on the feature branch does...
> If the current tracer cannot detect such modifications, what is the officially recommended solution—is it to trigger a dummy LLM call, register an MCP tool, or something else? You...
You can use `emit_reward` to generate intermediate reward signals. However, the current verl algorithm only supports identical credit assignment. To customize that part, please refer to #31.
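For readers unfamiliar with the term, here is a minimal, library-independent sketch of what "identical credit assignment" is taken to mean here: every turn in a trajectory receives the same final reward, regardless of any intermediate signals. The `Turn` class and `assign_identical_credit` function are hypothetical names for illustration only; they are not part of verl or this project's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One LLM call within a rollout (hypothetical structure for illustration)."""
    prompt: str
    response: str


def assign_identical_credit(turns: List[Turn], final_reward: float) -> List[float]:
    """Give every turn in the trajectory the same final reward.

    This is an assumed reading of "identical credit assignment":
    no per-turn weighting and no use of intermediate rewards.
    """
    return [final_reward for _ in turns]


turns = [Turn("q1", "a1"), Turn("q2", "a2"), Turn("q3", "a3")]
credits = assign_identical_credit(turns, final_reward=0.8)
```

Under this scheme, intermediate rewards can still be recorded, but they would not change how credit is split across turns until a custom assignment strategy is plugged in.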
Looks like the host ran out of memory. Could you try running `scripts/restart_ray.sh` to restart Ray?
We can't train two models at the same time due to a verl limitation. We can, however, train two agents alternately by specifying the `trained_agents` parameter in `LitAgent`.