[Frontend] Refactor prompt parsing
I have refactored the prompt parsing logic by moving it to the base class OpenAIServing, so that it can be shared between the Chat Completions, Completions and Embeddings APIs. The _validate_prompt_and_tokenize method has also been decomposed so that prompt and prompt_ids are processed separately.
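As a rough illustration of the split (the class and method names below are simplified sketches, not the actual vLLM implementation), a text prompt is tokenized and then validated, while pre-tokenized prompt_ids skip tokenization and go straight to validation:

```python
# Illustrative sketch only; the real code lives in OpenAIServing and
# differs in its tokenizer handling, length checks and error types.
from typing import List, Union


class OpenAIServingSketch:

    def __init__(self, tokenizer, max_model_len: int):
        self.tokenizer = tokenizer
        self.max_model_len = max_model_len

    def parse_prompt(self, prompt: Union[str, List[int]]) -> List[int]:
        # Shared entry point used by the Chat Completions, Completions
        # and Embeddings handlers.
        if isinstance(prompt, str):
            return self._tokenize_prompt(prompt)
        return self._validate_prompt_ids(prompt)

    def _tokenize_prompt(self, prompt: str) -> List[int]:
        # Text prompt: tokenize first, then validate the result.
        token_ids = self.tokenizer(prompt).input_ids
        return self._validate_token_ids(token_ids)

    def _validate_prompt_ids(self, prompt_ids: List[int]) -> List[int]:
        # Pre-tokenized prompt: skip tokenization, only validate.
        return self._validate_token_ids(prompt_ids)

    def _validate_token_ids(self, token_ids: List[int]) -> List[int]:
        if len(token_ids) > self.max_model_len:
            raise ValueError(
                f"Prompt has {len(token_ids)} tokens, which exceeds the "
                f"model's maximum length of {self.max_model_len}.")
        return token_ids
```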
Related Contributions
This PR is part of #3978.
@njhill I see that you are working on #3512. I suppose it would be better to get that merged before continuing with this refactor.
It seems that #4032 fixed the LoRA bugs; however, entrypoints-test is still failing.
Update: I found that it was due to a bug in my refactored parsing, my bad. I have just fixed it.
@njhill The tests now pass. To move on with #3978, perhaps we can merge this PR first?
I'm updating test_batch_completions to better catch this error. Previously, the error only surfaced in test_echo_logprob_completion, which misled me about the real cause.
@DarkLight1337 sorry for the hold-up, I will hopefully get to this tomorrow, including looking at how to reconcile it with #3512.
No worries. Do note that I have ported some of the changes (mostly related to type annotations, which was the original intent of this PR) to #4355, so perhaps we can wait until that has been merged first. The remaining diffs associated with this PR should then be focused on the refactoring, making it easier to review.
Alright, #4355 has been merged so I have updated this branch accordingly.
Heads-up: I have copied OpenAIServing._parse_prompt_input_or_inputs to vllm.inputs.parse_and_batch_prompt in #4328. If that other PR is merged first, I'll make the change in this one accordingly.
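Roughly speaking, that helper normalizes the four prompt shapes accepted by the OpenAI-style APIs (a string, a list of strings, a token list, or a list of token lists) into a single batch. The sketch below only captures the idea; the real parse_and_batch_prompt in vllm.inputs returns typed parsed-prompt objects rather than plain dicts.

```python
# Illustrative sketch of the normalization performed by
# parse_and_batch_prompt; not the actual vLLM signature or return type.
from typing import List, Union

PromptInput = Union[str, List[str], List[int], List[List[int]]]


def parse_and_batch_prompt_sketch(prompt: PromptInput) -> List[dict]:
    if isinstance(prompt, str):
        # Single text prompt.
        return [{"is_tokens": False, "content": prompt}]
    if isinstance(prompt, list):
        if len(prompt) == 0:
            raise ValueError("please provide at least one prompt")
        if isinstance(prompt[0], str):
            # Batch of text prompts.
            return [{"is_tokens": False, "content": p} for p in prompt]
        if isinstance(prompt[0], int):
            # Single pre-tokenized prompt.
            return [{"is_tokens": True, "content": prompt}]
        if isinstance(prompt[0], list):
            # Batch of pre-tokenized prompts.
            return [{"is_tokens": True, "content": p} for p in prompt]
    raise TypeError("prompt must be a string, an array of strings, "
                    "an array of tokens, or an array of token arrays")
```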
@njhill any update on this? As mentioned in #4194, we would like to work on #4200 in the near future so it would be great if we could get this merged before then.
@DarkLight1337 sorry, going to look at this next, I did a review of #4328 at least :)
I have merged #4328 into this PR in advance, so you might see additional diffs. These diffs should disappear once #4328 is merged into main.
@njhill any update?
@njhill I have updated this PR based on your #6227. PTAL.
Looks like this line also needs to be removed from the tokenization test.
Actually, let me also add add_special_tokens to the Completions API (defaulting to True to maintain the existing behaviour) to make things consistent. It's a vLLM feature originally introduced by #5278.
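For context, the flag mirrors the Hugging Face tokenizer argument of the same name. The snippet below (the model name is only an example) shows the behavioural difference that the True default preserves:

```python
# Demonstrates what add_special_tokens controls; facebook/opt-125m is
# just an example model, any Hugging Face tokenizer behaves analogously.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

with_special = tokenizer("Hello world", add_special_tokens=True).input_ids
without_special = tokenizer("Hello world", add_special_tokens=False).input_ids

print(with_special)     # BOS token prepended (the existing default behaviour)
print(without_special)  # raw tokens only
```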
My only concern is whether some folks might make use of the logging in the engine with different front ends.
I've moved the logging out to a separate class, vllm.entrypoints.logger.RequestLogger, so it should be easy for them to add code to log the requests if need be.
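For example, a custom frontend could wire in its own logger before handing requests to the engine. The sketch below is hypothetical and deliberately minimal; it does not reproduce the exact RequestLogger interface.

```python
# Hypothetical frontend-side request logging; names are illustrative and
# not the actual vllm.entrypoints.logger.RequestLogger API.
import logging
from typing import List, Optional

logger = logging.getLogger("my_frontend.requests")


class SimpleRequestLogger:

    def __init__(self, max_log_len: Optional[int] = None):
        # Optionally truncate long prompts before logging them.
        self.max_log_len = max_log_len

    def log_inputs(self, request_id: str, prompt: Optional[str],
                   prompt_token_ids: Optional[List[int]]) -> None:
        if self.max_log_len is not None:
            if prompt is not None:
                prompt = prompt[:self.max_log_len]
            if prompt_token_ids is not None:
                prompt_token_ids = prompt_token_ids[:self.max_log_len]
        logger.info("Received request %s: prompt=%r, prompt_token_ids=%s",
                    request_id, prompt, prompt_token_ids)


# A custom frontend would call this before passing the request on:
# request_logger = SimpleRequestLogger(max_log_len=256)
# request_logger.log_inputs("req-0", "Hello world", None)
```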
I have finished addressing your comments.