forward_decode_batch VS forward_fill_batch?
When I am generating answers with LLaVA, I am using a regex to specify a template, and the last token is always either 'Yes' or 'No'.
One weird thing I have noticed: I tried to look at the token being generated on each call of `forward_fill_batch` and `forward_decode_batch`. When I ask the model for YAML, only `forward_fill_batch` shows the 'Yes' token that I need, and it is emitted directly before the EOS token, so I'm quite sure it is part of the output. But if I ask for JSON, the yes/no token only appears in `forward_decode_batch`, again right before the EOS token.
By my understanding, `forward_fill_batch` fills the KV cache with the system and user prompt, while `forward_decode_batch` is what the model actually generates. However, this observation seems to suggest that the two are somehow mixed up.
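For reference, here is a toy sketch of the prefill/decode split as I currently picture it (pure Python, not sglang code; `model_step` is a hypothetical stand-in for a forward pass). Note that if prefill also takes the argmax over the last prompt position's logits, the *first* generated token would come out of the prefill call rather than decode:

```python
def toy_generate(prompt_ids, model_step, eos_id, max_new=8):
    """Toy generation loop illustrating the prefill/decode split.

    model_step(ids) -> id of the token following `ids` (hypothetical helper).
    """
    out = []
    # "prefill": runs the whole prompt once; its logits for the last prompt
    # position already determine the first new token.
    next_id = model_step(prompt_ids)
    while next_id != eos_id and len(out) < max_new:
        out.append(next_id)
        # "decode": one token per call, extending the sequence.
        next_id = model_step(prompt_ids + out)
    return out
```

If the answer is a single 'Yes'/'No' token followed by EOS, that token would then surface in the prefill step, which might explain what I'm seeing.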
For context, I added `token_ids = torch.argmax(logits, dim=-1)` followed by `print(token_ids)` at https://github.com/sgl-project/sglang/blob/30d67b2bca647d7a52fddc42a6d48842610cfec3/python/sglang/srt/managers/router/model_rpc.py#L423 and https://github.com/sgl-project/sglang/blob/30d67b2bca647d7a52fddc42a6d48842610cfec3/python/sglang/srt/managers/router/model_rpc.py#L506
Can anyone help to explain this behaviour?
The ultimate reason I'm doing this is to get the logits for the yes/no token to act as a classifier probability. I cannot find a way to get it to return the logprob without using `choices`, and I want the model to reason first, so I cannot use `choices`. This workaround works well if I can figure out why the token appears in both functions. So far I have just added a patch that captures the required logit in both functions and refreshes the value as needed.