
[BugFix][Frontend] Use correct, shared tokenizer in OpenAI server

Open njhill opened this issue 11 months ago • 2 comments

The front-end server code currently doesn't use LoRA-specific tokenizers.

It also doesn't make use of the recently introduced parallel async tokenization when that is enabled.

njhill avatar Mar 19 '24 21:03 njhill

Test failures look unrelated (network blips).

njhill avatar Mar 20 '24 00:03 njhill

Can the same tokenizer be used to apply the chat template as well?

DarkLight1337 avatar May 03 '24 08:05 DarkLight1337
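For context on the question above: Hugging Face tokenizers expose an `apply_chat_template` method, so once the frontend resolves the correct shared (possibly LoRA-specific) tokenizer, the same instance could in principle render the chat template and tokenize the result. A minimal sketch using a stand-in tokenizer — in practice this would be the `transformers` tokenizer the server already holds, and the ChatML-style rendering below is purely illustrative:

```python
class DummyTokenizer:
    """Stand-in for a transformers tokenizer, implementing only the
    two capabilities the frontend needs here."""

    def apply_chat_template(self, messages, tokenize=False,
                            add_generation_prompt=True):
        # Crude ChatML-style rendering, for illustration only.
        parts = [f"<|{m['role']}|>\n{m['content']}" for m in messages]
        if add_generation_prompt:
            parts.append("<|assistant|>\n")
        return "\n".join(parts)

    def encode(self, text):
        # Toy tokenization: whitespace split.
        return text.split()


def render_and_tokenize(tokenizer, messages):
    """Use ONE tokenizer instance for both chat templating and
    tokenization, so the two steps can never disagree."""
    prompt = tokenizer.apply_chat_template(messages, tokenize=False)
    return prompt, tokenizer.encode(prompt)


tok = DummyTokenizer()
prompt, token_ids = render_and_tokenize(
    tok, [{"role": "user", "content": "hello"}]
)
```

Routing both steps through the same object avoids the failure mode where the chat template comes from the base model while tokenization uses an adapter's vocabulary (or vice versa).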