
[Frontend] Refactor prompt parsing

Open · DarkLight1337 opened this pull request 1 year ago · 9 comments

I have refactored the prompt parsing logic by moving it into the base class OpenAIServing, so that it can be shared between the Chat Completions, Completions, and Embeddings APIs. The _validate_prompt_and_tokenize method has also been decomposed so that text prompts (prompt) and pre-tokenized prompts (prompt_ids) are processed separately.
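For illustration, the split looks roughly like this (a simplified sketch only: the helper names, self.tokenizer, and self.max_model_len are placeholders, not the exact definitions in this PR):

```python
from typing import List, Optional


class OpenAIServing:
    """Base class shared by the Chat Completions, Completions, and
    Embeddings handlers (sketch)."""

    def _tokenize_prompt(
        self,
        prompt: str,
        truncate_prompt_tokens: Optional[int] = None,
    ) -> List[int]:
        # Text prompts go through the tokenizer first...
        input_ids = self.tokenizer(prompt).input_ids
        return self._validate_prompt_ids(input_ids, truncate_prompt_tokens)

    def _validate_prompt_ids(
        self,
        prompt_ids: List[int],
        truncate_prompt_tokens: Optional[int] = None,
    ) -> List[int]:
        # ...while pre-tokenized prompts skip straight to validation.
        if truncate_prompt_tokens is not None:
            prompt_ids = prompt_ids[-truncate_prompt_tokens:]
        if len(prompt_ids) > self.max_model_len:
            raise ValueError("Prompt exceeds the model's context length")
        return prompt_ids
```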

Related Contributions

This PR is part of #3978.

DarkLight1337 avatar Apr 12 '24 06:04 DarkLight1337

@njhill I see that you are working on #3512. I suppose it would be better to get that merged before continuing with this refactor.

DarkLight1337 avatar Apr 12 '24 15:04 DarkLight1337

It seems that #4032 fixed the LoRA bugs; however, the entrypoints test is still failing.

DarkLight1337 avatar Apr 13 '24 04:04 DarkLight1337

Update: I found that it was due to a bug in my refactored parsing; my bad. I have fixed it just now.

DarkLight1337 avatar Apr 13 '24 04:04 DarkLight1337

@njhill The tests now pass. To move on with #3978, perhaps we can merge this PR first?

DarkLight1337 avatar Apr 13 '24 05:04 DarkLight1337

> Update: I found that it was due to a bug in my refactored parsing; my bad. I have fixed it just now.

I'm updating test_batch_completions to catch this error more directly. Previously, the error only surfaced in test_echo_logprob_completion, which misled me about the real cause.
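For reference, the batch case being strengthened looks something like this (a sketch only: the client fixture, model name, and prompts are placeholders, not the actual test code):

```python
import openai
import pytest


@pytest.mark.asyncio
async def test_batch_completions(client: openai.AsyncOpenAI):
    # Exercise the batched text-prompt path directly, so a parsing bug
    # fails this test instead of only surfacing in an unrelated one.
    batch = await client.completions.create(
        model="facebook/opt-125m",  # placeholder model name
        prompt=["Hello, my name is", "The capital of France is"],
        max_tokens=5,
    )
    assert len(batch.choices) == 2
    assert all(choice.text for choice in batch.choices)
```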

DarkLight1337 avatar Apr 13 '24 05:04 DarkLight1337

@DarkLight1337 sorry for the hold-up, I will hopefully get to this tomorrow including looking at reconciling with #3512.

njhill avatar Apr 26 '24 00:04 njhill

No worries. Do note that I have ported some of the changes (mostly related to type annotations, which was the original intent of this PR) to #4355, so perhaps we can wait until that one has been merged. The remaining diffs in this PR should then be focused on the refactoring, making it easier to review.

DarkLight1337 avatar Apr 26 '24 02:04 DarkLight1337

Alright, #4355 has been merged so I have updated this branch accordingly.

DarkLight1337 avatar Apr 27 '24 05:04 DarkLight1337

Heads-up: I have copied OpenAIServing._parse_prompt_input_or_inputs to vllm.inputs.parse_and_batch_prompt in #4328. If that PR is merged first, I'll update this one accordingly.
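For context, the helper performs roughly this normalization (a sketch; the actual signature and return types in vllm.inputs may differ):

```python
from typing import List, Sequence, Union


def parse_and_batch_prompt(
    prompt: Union[str, List[str], List[int], List[List[int]]],
) -> Union[Sequence[str], Sequence[List[int]]]:
    """Normalize the four prompt shapes the OpenAI API accepts into a batch."""
    if isinstance(prompt, str):
        return [prompt]  # single text prompt
    if not prompt:
        raise ValueError("prompt must be non-empty")
    if isinstance(prompt[0], str):
        return prompt  # batch of text prompts
    if isinstance(prompt[0], int):
        return [prompt]  # single pre-tokenized prompt
    return prompt  # batch of pre-tokenized prompts
```

So a single string becomes a one-element batch of texts, and a single list of token IDs becomes a one-element batch of token lists; the downstream code then only has to handle the batched forms.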

DarkLight1337 avatar May 03 '24 06:05 DarkLight1337

@njhill any update on this? As mentioned in #4194, we would like to work on #4200 in the near future so it would be great if we could get this merged before then.

DarkLight1337 avatar May 23 '24 08:05 DarkLight1337

@DarkLight1337 sorry, I'm going to look at this next; I did a review of #4328 at least :)

njhill avatar May 25 '24 00:05 njhill

I have merged #4328 into this PR in advance, so you might see additional diffs. These diffs should disappear once #4328 is merged into main.

DarkLight1337 avatar May 25 '24 15:05 DarkLight1337

@njhill any update?

DarkLight1337 avatar Jun 15 '24 04:06 DarkLight1337

@njhill I have updated this PR based on your #6227. PTAL.

DarkLight1337 avatar Jul 18 '24 11:07 DarkLight1337

Looks like this line also needs to be removed from the tokenization test.

njhill avatar Jul 19 '24 01:07 njhill

> Looks like this line also needs to be removed from the tokenization test.

Actually, to make things consistent, let me add add_special_tokens to the Completions API as well, defaulting to True to maintain the existing behaviour. It's a vLLM-specific parameter originally introduced by #5278.
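Conceptually, it would be threaded through like this (an illustrative sketch; the field and helper names are assumptions, not the exact vLLM definitions):

```python
from pydantic import BaseModel


class CompletionRequest(BaseModel):
    prompt: str
    # Defaults to True to preserve the existing Completions behaviour;
    # Chat Completions would default this to False instead, since chat
    # templates already insert the special tokens themselves.
    add_special_tokens: bool = True


def tokenize_prompt(tokenizer, request: CompletionRequest):
    # The per-request flag is forwarded directly to the tokenizer call.
    return tokenizer(request.prompt,
                     add_special_tokens=request.add_special_tokens).input_ids
```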

DarkLight1337 avatar Jul 19 '24 01:07 DarkLight1337

> My only concern is whether some folks might make use of the logging in the engine with different front ends.

I've moved the logging out into a separate class, vllm.entrypoints.logger.RequestLogger, so it should be easy for them to add request logging if need be.
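A minimal sketch of the idea (the real RequestLogger may differ in signature and fields):

```python
import logging
from typing import List, Optional

logger = logging.getLogger(__name__)


class RequestLogger:
    """Logs incoming requests at the entrypoint layer, outside the engine,
    so other front ends can opt in independently (sketch)."""

    def __init__(self, max_log_len: Optional[int] = None) -> None:
        # Truncate logged prompts so very long inputs don't flood the logs.
        self.max_log_len = max_log_len

    def log_inputs(self, request_id: str, prompt: Optional[str],
                   prompt_token_ids: Optional[List[int]]) -> None:
        if self.max_log_len is not None:
            if prompt is not None:
                prompt = prompt[:self.max_log_len]
            if prompt_token_ids is not None:
                prompt_token_ids = prompt_token_ids[:self.max_log_len]
        logger.info("Received request %s: prompt=%r, prompt_token_ids=%s",
                    request_id, prompt, prompt_token_ids)
```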

DarkLight1337 avatar Jul 19 '24 02:07 DarkLight1337

I have finished addressing your comments.

DarkLight1337 avatar Jul 20 '24 00:07 DarkLight1337