[Bug] tracer returns incorrect token_ids
When using verl + agent-lightning for multi-round TIR training, we found a serious bug in vLLM's return_token_id implementation. With streaming output + Tool Parser, some "control tokens" are returned with incorrect token_ids, which causes severe mismatches during training and can crash the model within a few steps. We have submitted a PR to vLLM to fix this issue.
https://github.com/vllm-project/vllm/pull/29074
I hope we can work together to advance this bug fix and merge it into vLLM as soon as possible.
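To illustrate the failure mode, here is a minimal, hypothetical sanity check (not code from vLLM or agent-lightning): decode the accumulated token_ids from the stream and compare against the accumulated text. A toy vocabulary stands in for the real tokenizer; when the tool parser emits a control token's text but drops its token_id, the check fails.

```python
# Toy vocabulary standing in for a real tokenizer (illustrative only).
TOY_VOCAB = {0: "<tool_call>", 1: "print", 2: "(", 3: "1", 4: ")", 5: "</tool_call>"}

def decode(token_ids):
    return "".join(TOY_VOCAB[t] for t in token_ids)

def stream_is_consistent(chunks):
    """Each chunk is (text_delta, token_ids_delta).

    Returns True iff the concatenated token_ids decode back to the
    concatenated streamed text -- the invariant training relies on.
    """
    text = "".join(text for text, _ in chunks)
    ids = [i for _, id_delta in chunks for i in id_delta]
    return decode(ids) == text

# Healthy stream: every text delta carries its matching token_ids.
good = [("<tool_call>", [0]), ("print", [1]), ("(", [2]),
        ("1", [3]), (")", [4]), ("</tool_call>", [5])]
assert stream_is_consistent(good)

# Buggy stream: the control token's text arrives but its token_id is dropped,
# so the trained-on token_ids no longer match the generated text.
bad = [("<tool_call>", []), ("print", [1]), ("(", [2]),
       ("1", [3]), (")", [4]), ("</tool_call>", [5])]
assert not stream_is_consistent(bad)
```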
Agent-lightning currently has a workaround that converts streaming requests into non-streaming mode via a LiteLLM proxy: #293
The LiteLLM telemetry also has some severe streaming-related bugs, so vLLM is not the only framework that needs fixes to make this work.