
forward_decode_batch VS forward_fill_batch?

aliencaocao opened this issue 11 months ago

When generating answers with LLaVA, I use a regex to constrain the output to a template, and the last token is always either 'Yes' or 'No'.

One weird thing I have noticed: I tried to look at the token being generated on each call of `forward_fill_batch` and `forward_decode_batch`. When I ask the model for YAML, only `forward_fill_batch` ever produces the 'Yes' token that I need, and it appears directly before the EOS token, so I'm quite sure it is part of the output. But if I ask for JSON, the yes/no token only appears in `forward_decode_batch`, again right before the EOS token.

By my understanding, `forward_fill_batch` is for filling the KV cache with the system and user prompt, while `forward_decode_batch` is where the model actually generates tokens. However, this observation seems to suggest that the two are somehow mixed up.
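For reference, here is a minimal sketch of that mental model (illustrative only, not sglang's actual code; `model.prefill` / `model.decode` are hypothetical stand-ins for the real internals):

```python
def generate(model, prompt_ids: list[int], eos_id: int, max_new_tokens: int) -> list[int]:
    # "Fill" / prefill step: run the whole prompt once to populate the KV cache.
    # This pass already yields logits at the last prompt position, i.e. the
    # logits from which the first generated token is picked.
    logits, kv_cache = model.prefill(prompt_ids)
    next_id = int(logits[-1].argmax())
    output = [next_id]

    # "Decode" steps: one forward pass per subsequent generated token,
    # reusing the KV cache built during prefill.
    while next_id != eos_id and len(output) < max_new_tokens:
        logits, kv_cache = model.decode(next_id, kv_cache)
        next_id = int(logits.argmax())
        output.append(next_id)
    return output
```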

For context, I added `token_ids = torch.argmax(logits, dim=-1)` followed by `print(token_ids)` at https://github.com/sgl-project/sglang/blob/30d67b2bca647d7a52fddc42a6d48842610cfec3/python/sglang/srt/managers/router/model_rpc.py#L423 and https://github.com/sgl-project/sglang/blob/30d67b2bca647d7a52fddc42a6d48842610cfec3/python/sglang/srt/managers/router/model_rpc.py#L506
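In other words, the patch is essentially this (a toy, standalone version of the debug print; the real `logits` come from the model's forward pass at those lines):

```python
import torch

# Stand-in for the logits tensor computed in forward_fill_batch /
# forward_decode_batch, shaped (batch_size, vocab_size).
logits = torch.randn(2, 32000)

# Greedy token id per sequence in the batch, as printed in the patch.
token_ids = torch.argmax(logits, dim=-1)
print(token_ids)  # e.g. tensor([12034, 28811])
```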

Can anyone help to explain this behaviour?

The ultimate reason I'm doing this is to get the logits for the yes/no token so they can act as a classifier probability. I cannot find a way to have it return the logprob without using `choices`, and I want the model to reason on its own, so I cannot use `choices`. This workaround works well if I can figure out why the token appears in both functions. So far I have just added a patch that captures the required logit in both functions and refreshes that value as needed.
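Roughly, the patch captures the last-position logits and turns them into a yes/no probability like this (a toy sketch; the token ids below are hypothetical and would come from the tokenizer in practice):

```python
import torch

# Hypothetical vocabulary ids for the 'Yes' / 'No' tokens; in practice they
# would be looked up via the tokenizer, e.g. tokenizer.convert_tokens_to_ids.
YES_ID, NO_ID = 3869, 1939

def yes_probability(last_token_logits: torch.Tensor) -> float:
    """Softmax over just the 'Yes'/'No' logits at the final position."""
    pair = torch.stack([last_token_logits[YES_ID], last_token_logits[NO_ID]])
    return torch.softmax(pair, dim=-1)[0].item()

# Example with random logits over a 32k vocabulary.
print(yes_probability(torch.randn(32000)))
```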

aliencaocao · Mar 09 '24 14:03