[Usage]: jinja chat template for starcoder2
Your current environment
Hi all, I want to run starcoder2 with vLLM in a Docker container. Here is my config:
```
--model neuralmagic/starcoder2-7b-quantized.w8a8 \
--disable-log-requests \
--use-v2-block-manager \
--max_num_batched_tokens 32000 \
--block-size 32 \
--max-num-seqs 600
```
When I start the Docker container, I get this error, but the program continues to run (although it produces nonsense):
```
ERROR 02-15 09:45:26 serving_chat.py:181] Error in preprocessing prompt inputs
ERROR 02-15 09:45:26 serving_chat.py:181] Traceback (most recent call last):
ERROR 02-15 09:45:26 serving_chat.py:181]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 165, in create_chat_completion
ERROR 02-15 09:45:26 serving_chat.py:181]     ) = await self._preprocess_chat(
ERROR 02-15 09:45:26 serving_chat.py:181]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-15 09:45:26 serving_chat.py:181]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 479, in _preprocess_chat
ERROR 02-15 09:45:26 serving_chat.py:181]     request_prompt = apply_hf_chat_template(
ERROR 02-15 09:45:26 serving_chat.py:181]                      ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-15 09:45:26 serving_chat.py:181]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 983, in apply_hf_chat_template
ERROR 02-15 09:45:26 serving_chat.py:181]     raise ValueError(
ERROR 02-15 09:45:26 serving_chat.py:181] ValueError: As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.
```
After searching, I found out that I need to define a chat template for the LLM. Can somebody help me with this?
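From what I can tell, vLLM's OpenAI-compatible server can load a Jinja template file via the `--chat-template` flag. Since starcoder2 is a base completion model with no official chat format, the template below is only a minimal sketch of my own; the plain `role: content` prefixes are an assumption, not something the model was trained on:

```jinja
{#- Minimal sketch: render each turn as "role: content".
    The prefixes are an assumption; starcoder2 defines no
    official chat format. -#}
{%- for message in messages %}
{{ message['role'] }}: {{ message['content'] }}
{%- endfor %}
{%- if add_generation_prompt %}
assistant:
{%- endif %}
```

If a file like this is mounted into the container, I assume it could be passed alongside the other flags with something like `--chat-template /path/to/starcoder2_chat.jinja` (hypothetical path). Would that be the right approach, and is there a recommended template for this model?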
How would you like to use vllm
I want to run inference with starcoder2 (link). I don't know how to define a chat template for it.
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.