[BugFix][Frontend] Use correct, shared tokenizer in OpenAI server
The front-end server code currently doesn't use LoRA-specific tokenizers.
It also doesn't make use of the recently introduced parallel async tokenization when that is enabled.
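A minimal sketch of the idea, for illustration only: resolve the tokenizer through the engine so the frontend shares the engine's (possibly LoRA-specific) tokenizer rather than loading its own copy. The `get_tokenizer` call and its `lora_request` parameter are assumptions here; the exact signature varies across vLLM versions, and the helper name `resolve_tokenizer` is purely illustrative.

```python
# Sketch only: share the engine's tokenizer with the OpenAI frontend.
# The engine.get_tokenizer(lora_request) coroutine is assumed; exact
# signatures differ between vLLM versions.
from typing import Optional

from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.lora.request import LoRARequest


async def resolve_tokenizer(engine: AsyncLLMEngine,
                            lora_request: Optional[LoRARequest] = None):
    """Return the tokenizer the engine would use for this request.

    If the LoRA adapter ships its own tokenizer, the engine's tokenizer
    group can return that one; otherwise the shared base tokenizer is
    reused, so the frontend never constructs a separate copy.
    """
    return await engine.get_tokenizer(lora_request)
```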
Test failures look unrelated (network blips).
Can the same tokenizer be used to apply the chat template as well?
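For illustration, a sketch of what that could look like, assuming a HuggingFace-style tokenizer that exposes `apply_chat_template` (available in transformers >= 4.34); the helper name `render_chat_prompt` is hypothetical, not part of vLLM.

```python
# Sketch only: apply the chat template with the same shared tokenizer
# resolved from the engine, instead of a separately loaded copy.
from typing import Any, Dict, List


def render_chat_prompt(tokenizer, messages: List[Dict[str, Any]]) -> str:
    """Render an OpenAI-style message list into a prompt string using
    the tokenizer's own chat template."""
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )


# Example usage with the engine-resolved tokenizer from the sketch above:
#   prompt = render_chat_prompt(tokenizer, [{"role": "user", "content": "Hi"}])
```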