
[Frontend] Refactor prompt parsing

Open · DarkLight1337 opened this pull request 1 year ago · 9 comments

I have refactored the prompt parsing logic by moving it into the base class OpenAIServing, so that it can be shared between the Chat Completions, Completions, and Embeddings APIs. The _validate_prompt_and_tokenize method has also been decomposed so that text prompts (prompt) and pre-tokenized prompts (prompt_ids) are processed separately.
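For illustration, the split looks roughly like this (a simplified sketch only: the helper names, self.tokenizer, and self.max_model_len are placeholders, not the exact definitions in this PR):

```python
from typing import List, Optional


class OpenAIServing:
    """Base class shared by the Chat Completions, Completions, and
    Embeddings handlers (sketch)."""

    def _tokenize_prompt(
        self,
        prompt: str,
        truncate_prompt_tokens: Optional[int] = None,
    ) -> List[int]:
        # Text prompts go through the tokenizer first...
        input_ids = self.tokenizer(prompt).input_ids
        return self._validate_prompt_ids(input_ids, truncate_prompt_tokens)

    def _validate_prompt_ids(
        self,
        prompt_ids: List[int],
        truncate_prompt_tokens: Optional[int] = None,
    ) -> List[int]:
        # ...while pre-tokenized prompts skip straight to validation.
        if truncate_prompt_tokens is not None:
            prompt_ids = prompt_ids[-truncate_prompt_tokens:]
        if len(prompt_ids) > self.max_model_len:
            raise ValueError("Prompt exceeds the model's context length")
        return prompt_ids
```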

Related Contributions

This PR is part of #3978.

DarkLight1337 avatar Apr 12 '24 06:04 DarkLight1337

@njhill I see that you are working on #3512. I suppose it would be better to get that merged before continuing with this refactor.

DarkLight1337 avatar Apr 12 '24 15:04 DarkLight1337

It seems that #4032 fixed the LoRA bugs; however, the entrypoints test is still failing.

DarkLight1337 avatar Apr 13 '24 04:04 DarkLight1337

Update: I found that it was due to a bug in my refactored parsing; my bad. I have fixed it just now.

DarkLight1337 avatar Apr 13 '24 04:04 DarkLight1337

@njhill The tests now pass. To move on with #3978, perhaps we can merge this PR first?

DarkLight1337 avatar Apr 13 '24 05:04 DarkLight1337

> Update: I found that it was due to a bug in my refactored parsing; my bad. I have fixed it just now.

I'm updating test_batch_completions to catch this error more directly. Previously, the error only surfaced in test_echo_logprob_completion, which misled me about the real cause.
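For reference, the batch case being strengthened looks something like this (a sketch only: the client fixture, model name, and prompts are placeholders, not the actual test code):

```python
import openai
import pytest


@pytest.mark.asyncio
async def test_batch_completions(client: openai.AsyncOpenAI):
    # Exercise the batched text-prompt path directly, so a parsing bug
    # fails this test instead of only surfacing in an unrelated one.
    batch = await client.completions.create(
        model="facebook/opt-125m",  # placeholder model name
        prompt=["Hello, my name is", "The capital of France is"],
        max_tokens=5,
    )
    assert len(batch.choices) == 2
    assert all(choice.text for choice in batch.choices)
```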

DarkLight1337 avatar Apr 13 '24 05:04 DarkLight1337

@DarkLight1337 sorry for the hold-up, I will hopefully get to this tomorrow including looking at reconciling with #3512.

njhill avatar Apr 26 '24 00:04 njhill

No worries. Do note that I have ported some of the changes (mostly related to type annotations, which was the original intent of this PR) to #4355, so perhaps we can wait until that one has been merged. The remaining diffs in this PR should then be focused on the refactoring, making it easier to review.

DarkLight1337 avatar Apr 26 '24 02:04 DarkLight1337

Alright, #4355 has been merged so I have updated this branch accordingly.

DarkLight1337 avatar Apr 27 '24 05:04 DarkLight1337

Heads-up: I have copied OpenAIServing._parse_prompt_input_or_inputs to vllm.inputs.parse_and_batch_prompt in #4328. If that PR is merged first, I'll update this one accordingly.
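For context, the helper performs roughly this normalization (a sketch; the actual signature and return types in vllm.inputs may differ):

```python
from typing import List, Sequence, Union


def parse_and_batch_prompt(
    prompt: Union[str, List[str], List[int], List[List[int]]],
) -> Union[Sequence[str], Sequence[List[int]]]:
    """Normalize the four prompt shapes the OpenAI API accepts into a batch."""
    if isinstance(prompt, str):
        return [prompt]  # single text prompt
    if not prompt:
        raise ValueError("prompt must be non-empty")
    if isinstance(prompt[0], str):
        return prompt  # batch of text prompts
    if isinstance(prompt[0], int):
        return [prompt]  # single pre-tokenized prompt
    return prompt  # batch of pre-tokenized prompts
```

So a single string becomes a one-element batch of texts, and a single list of token IDs becomes a one-element batch of token lists; the downstream code then only has to handle the batched forms.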

DarkLight1337 avatar May 03 '24 06:05 DarkLight1337

@njhill any update on this? As mentioned in #4194, we would like to work on #4200 in the near future so it would be great if we could get this merged before then.

DarkLight1337 avatar May 23 '24 08:05 DarkLight1337

@DarkLight1337 sorry, I'm going to look at this next; I did a review of #4328 at least :)

njhill avatar May 25 '24 00:05 njhill

I have merged #4328 into this PR in advance, so you might see additional diffs. These diffs should disappear once #4328 is merged into main.

DarkLight1337 avatar May 25 '24 15:05 DarkLight1337

@njhill any update?

DarkLight1337 avatar Jun 15 '24 04:06 DarkLight1337

@njhill I have updated this PR based on your #6227. PTAL.

DarkLight1337 avatar Jul 18 '24 11:07 DarkLight1337

Looks like this line also needs to be removed from the tokenization test.

njhill avatar Jul 19 '24 01:07 njhill

> Looks like this line also needs to be removed from the tokenization test.

Actually, to make things consistent, let me add add_special_tokens to the Completions API as well, defaulting to True to maintain the existing behaviour. It's a vLLM-specific parameter originally introduced by #5278.
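Conceptually, it would be threaded through like this (an illustrative sketch; the field and helper names are assumptions, not the exact vLLM definitions):

```python
from pydantic import BaseModel


class CompletionRequest(BaseModel):
    prompt: str
    # Defaults to True to preserve the existing Completions behaviour;
    # Chat Completions would default this to False instead, since chat
    # templates already insert the special tokens themselves.
    add_special_tokens: bool = True


def tokenize_prompt(tokenizer, request: CompletionRequest):
    # The per-request flag is forwarded directly to the tokenizer call.
    return tokenizer(request.prompt,
                     add_special_tokens=request.add_special_tokens).input_ids
```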

DarkLight1337 avatar Jul 19 '24 01:07 DarkLight1337

> My only concern is whether some folks might make use of the logging in the engine with different front ends.

I've moved the logging out into a separate class, vllm.entrypoints.logger.RequestLogger, so it should be easy for them to add request logging if need be.
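A minimal sketch of the idea (the real RequestLogger may differ in signature and fields):

```python
import logging
from typing import List, Optional

logger = logging.getLogger(__name__)


class RequestLogger:
    """Logs incoming requests at the entrypoint layer, outside the engine,
    so other front ends can opt in independently (sketch)."""

    def __init__(self, max_log_len: Optional[int] = None) -> None:
        # Truncate logged prompts so very long inputs don't flood the logs.
        self.max_log_len = max_log_len

    def log_inputs(self, request_id: str, prompt: Optional[str],
                   prompt_token_ids: Optional[List[int]]) -> None:
        if self.max_log_len is not None:
            if prompt is not None:
                prompt = prompt[:self.max_log_len]
            if prompt_token_ids is not None:
                prompt_token_ids = prompt_token_ids[:self.max_log_len]
        logger.info("Received request %s: prompt=%r, prompt_token_ids=%s",
                    request_id, prompt, prompt_token_ids)
```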

DarkLight1337 avatar Jul 19 '24 02:07 DarkLight1337

I have finished addressing your comments.

DarkLight1337 avatar Jul 20 '24 00:07 DarkLight1337