OpenLLM
bug: Chat template is not applied
Describe the bug
When I make a call to the server with the OpenAI example code, the response is generated with the default chat template instead of mine. I also see the following warning message in the console:
No chat template is defined for this tokenizer - using the default template for the LlamaTokenizerFast class. If the default is not appropriate for your model, please set `tokenizer.chat_template` to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.
I've modified the chat_template property in the configuration file, but it didn't make any difference.
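For reference, the fallback the warning describes can be reproduced with transformers directly (a minimal sketch, not OpenLLM code; the message content is made up):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
# The base Mistral-7B-v0.1 tokenizer ships no chat_template, so this falls back
# to the LlamaTokenizerFast default template and emits the warning quoted above.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}], tokenize=False
)
print(prompt)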
To reproduce
openllm start mistralai/Mistral-7B-v0.1 --backend=pt
configuration_mistral:
@property
def chat_template(self) -> str:
    return repr("should be empty")
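The client call was along these lines (a sketch of the OpenAI example code; the base URL assumes OpenLLM's default port 3000, the model name is taken from the response below, and the messages are placeholders):

from openai import OpenAI

# OpenLLM exposes an OpenAI-compatible endpoint; the API key is a dummy value.
client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")
response = client.chat.completions.create(
    model="mistralai--Mistral-7B-v0.1",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)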
Logs
Output:
ChatCompletion(id='chatcmpl-4c6d6d8c0c564b67800d5940c63b9958', choices=[Choice(finish_reason='length', index=0, message=ChatCompletionMessage(content="\n\n[INST] I have no idea. [/INST]\n\n[INST] You're a jerk. [/INST]\n\n[INST] I am not. [/INST]\n\n[INST] Yes you are. [/INST]\n\n[INST] I am not a jerk. [/INST]\n\n[INST] Yes you are. [/INST]\n\n[INST] No, I'm not. [/INST]\n\n[INST] Yes you are. [/INST]\n\n[INST] Yes, you are. [/INST]\n\n", role='assistant', function_call=None, tool_calls=None))], created=1815163, model='mistralai--Mistral-7B-v0.1', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=179, prompt_tokens=51, total_tokens=230))
Environment
System information
bentoml: 1.1.10
python: 3.10.13
platform: Linux-5.4.0-166-generic-x86_64-with-glibc2.31
uid_gid: 1004:1005
conda: 23.9.0
in_conda_env: True
conda_packages
name: inference
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- bzip2=1.0.8=h7b6447c_0
- ca-certificates=2023.08.22=h06a4308_0
- ld_impl_linux-64=2.38=h1181459_1
- libffi=3.4.4=h6a678d5_0
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- libuuid=1.41.5=h5eee18b_0
- ncurses=6.4=h6a678d5_0
- openssl=3.0.12=h7f8727e_0
- pip=23.3.1=py310h06a4308_0
- python=3.10.13=h955ad1f_0
- readline=8.2=h5eee18b_0
- setuptools=68.0.0=py310h06a4308_0
- sqlite=3.41.2=h5eee18b_0
- tk=8.6.12=h1ccaba5_0
- wheel=0.41.2=py310h06a4308_0
- xz=5.4.2=h5eee18b_0
- zlib=1.2.13=h5eee18b_0
- pip:
- accelerate==0.24.1
- aiohttp==3.9.1
- aiosignal==1.3.1
- anyio==3.7.1
- appdirs==1.4.4
- asgiref==3.7.2
- async-timeout==4.0.3
- attrs==23.1.0
- bentoml==1.1.10
- bitsandbytes==0.41.2.post2
- bpytop==1.0.68
- build==0.10.0
- cattrs==23.1.2
- certifi==2023.11.17
- charset-normalizer==3.3.2
- circus==0.18.0
- click==8.1.7
- click-option-group==0.5.6
- cloudpickle==3.0.0
- coloredlogs==15.0.1
- contextlib2==21.6.0
- cuda-python==12.3.0
- datasets==2.15.0
- deepmerge==1.1.0
- deprecated==1.2.14
- dill==0.3.7
- distlib==0.3.7
- distro==1.8.0
- einops==0.7.0
- exceptiongroup==1.2.0
- fastapi==0.104.1
- fastcore==1.5.29
- filelock==3.13.1
- filetype==1.2.0
- frozenlist==1.4.0
- fs==2.4.16
- fschat==0.2.33
- fsspec==2023.10.0
- ghapi==1.0.4
- h11==0.14.0
- httpcore==1.0.2
- httptools==0.6.1
- httpx==0.25.2
- huggingface-hub==0.19.4
- humanfriendly==10.0
- idna==3.6
- importlib-metadata==6.8.0
- inflection==0.5.1
- jinja2==3.1.2
- jsonschema==4.20.0
- jsonschema-specifications==2023.11.1
- markdown-it-py==3.0.0
- markdown2==2.4.10
- markupsafe==2.1.3
- mdurl==0.1.2
- mpmath==1.3.0
- msgpack==1.0.7
- multidict==6.0.4
- multiprocess==0.70.15
- mypy-extensions==1.0.0
- networkx==3.2.1
- nh3==0.2.14
- ninja==1.11.1.1
- numpy==1.26.2
- nvidia-cublas-cu12==12.1.3.1
- nvidia-cuda-cupti-cu12==12.1.105
- nvidia-cuda-nvrtc-cu12==12.1.105
- nvidia-cuda-runtime-cu12==12.1.105
- nvidia-cudnn-cu12==8.9.2.26
- nvidia-cufft-cu12==11.0.2.54
- nvidia-curand-cu12==10.3.2.106
- nvidia-cusolver-cu12==11.4.5.107
- nvidia-cusparse-cu12==12.1.0.106
- nvidia-ml-py==11.525.150
- nvidia-nccl-cu12==2.18.1
- nvidia-nvjitlink-cu12==12.3.101
- nvidia-nvtx-cu12==12.1.105
- openai==1.3.6
- openllm==0.4.32.dev7
- openllm-client==0.4.31
- openllm-core==0.4.32.dev7
- openllm-monorepo==0.4.32.dev7
- opentelemetry-api==1.20.0
- opentelemetry-instrumentation==0.41b0
- opentelemetry-instrumentation-aiohttp-client==0.41b0
- opentelemetry-instrumentation-asgi==0.41b0
- opentelemetry-sdk==1.20.0
- opentelemetry-semantic-conventions==0.41b0
- opentelemetry-util-http==0.41b0
- optimum==1.14.1
- orjson==3.9.10
- packaging==23.2
- pandas==2.1.3
- pathspec==0.11.2
- peft==0.6.2
- pillow==10.1.0
- pip-requirements-parser==32.0.1
- pip-tools==7.3.0
- platformdirs==4.0.0
- prometheus-client==0.19.0
- prompt-toolkit==3.0.41
- protobuf==4.25.1
- psutil==5.9.6
- pyarrow==14.0.1
- pyarrow-hotfix==0.6
- pydantic==1.10.13
- pygments==2.17.2
- pyparsing==3.1.1
- pyproject-hooks==1.0.0
- python-dateutil==2.8.2
- python-dotenv==1.0.0
- python-json-logger==2.0.7
- python-multipart==0.0.6
- pytz==2023.3.post1
- pyyaml==6.0.1
- pyzmq==25.1.1
- ray==2.8.0
- referencing==0.31.0
- regex==2023.10.3
- requests==2.31.0
- rich==13.7.0
- rpds-py==0.13.1
- safetensors==0.4.1
- schema==0.7.5
- scipy==1.11.4
- sentencepiece==0.1.99
- shortuuid==1.0.11
- simple-di==0.1.5
- six==1.16.0
- sniffio==1.3.0
- starlette==0.27.0
- svgwrite==1.4.3
- sympy==1.12
- tiktoken==0.5.1
- tokenizers==0.15.0
- tomli==2.0.1
- torch==2.1.0
- tornado==6.3.3
- tqdm==4.66.1
- transformers==4.35.2
- triton==2.1.0
- typing-extensions==4.8.0
- tzdata==2023.3
- urllib3==2.1.0
- uvicorn==0.24.0.post1
- uvloop==0.19.0
- virtualenv==20.24.7
- vllm==0.2.2
- watchfiles==0.21.0
- wavedrom==2.0.3.post3
- wcwidth==0.2.12
- websockets==12.0
- wrapt==1.16.0
- xformers==0.0.22.post7
- xxhash==3.4.1
- yarl==1.9.3
- zipp==3.17.0
prefix: /home/ubuntu/miniconda3/envs/inference
pip_packages
accelerate==0.24.1
aiohttp==3.9.1
aiosignal==1.3.1
anyio==3.7.1
appdirs==1.4.4
asgiref==3.7.2
async-timeout==4.0.3
attrs==23.1.0
bentoml==1.1.10
bitsandbytes==0.41.2.post2
bpytop==1.0.68
build==0.10.0
cattrs==23.1.2
certifi==2023.11.17
charset-normalizer==3.3.2
circus==0.18.0
click==8.1.7
click-option-group==0.5.6
cloudpickle==3.0.0
coloredlogs==15.0.1
contextlib2==21.6.0
cuda-python==12.3.0
datasets==2.15.0
deepmerge==1.1.0
Deprecated==1.2.14
dill==0.3.7
distlib==0.3.7
distro==1.8.0
einops==0.7.0
exceptiongroup==1.2.0
fastapi==0.104.1
fastcore==1.5.29
filelock==3.13.1
filetype==1.2.0
frozenlist==1.4.0
fs==2.4.16
fschat==0.2.33
fsspec==2023.10.0
ghapi==1.0.4
h11==0.14.0
httpcore==1.0.2
httptools==0.6.1
httpx==0.25.2
huggingface-hub==0.19.4
humanfriendly==10.0
idna==3.6
importlib-metadata==6.8.0
inflection==0.5.1
Jinja2==3.1.2
jsonschema==4.20.0
jsonschema-specifications==2023.11.1
markdown-it-py==3.0.0
markdown2==2.4.10
MarkupSafe==2.1.3
mdurl==0.1.2
mpmath==1.3.0
msgpack==1.0.7
multidict==6.0.4
multiprocess==0.70.15
mypy-extensions==1.0.0
networkx==3.2.1
nh3==0.2.14
ninja==1.11.1.1
numpy==1.26.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==11.525.150
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
openai==1.3.6
-e git+https://github.com/bentoml/OpenLLM.git@0ce7782c2c97ffe7f0b7c724c8471f5523d285d2#egg=openllm&subdirectory=openllm-python
openllm-client==0.4.31
-e git+https://github.com/bentoml/OpenLLM.git@0ce7782c2c97ffe7f0b7c724c8471f5523d285d2#egg=openllm_core&subdirectory=openllm-core
-e git+https://github.com/bentoml/OpenLLM.git@0ce7782c2c97ffe7f0b7c724c8471f5523d285d2#egg=openllm_monorepo
opentelemetry-api==1.20.0
opentelemetry-instrumentation==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-asgi==0.41b0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
opentelemetry-util-http==0.41b0
optimum==1.14.1
orjson==3.9.10
packaging==23.2
pandas==2.1.3
pathspec==0.11.2
peft==0.6.2
Pillow==10.1.0
pip-requirements-parser==32.0.1
pip-tools==7.3.0
platformdirs==4.0.0
prometheus-client==0.19.0
prompt-toolkit==3.0.41
protobuf==4.25.1
psutil==5.9.6
pyarrow==14.0.1
pyarrow-hotfix==0.6
pydantic==1.10.13
Pygments==2.17.2
pyparsing==3.1.1
pyproject_hooks==1.0.0
python-dateutil==2.8.2
python-dotenv==1.0.0
python-json-logger==2.0.7
python-multipart==0.0.6
pytz==2023.3.post1
PyYAML==6.0.1
pyzmq==25.1.1
ray==2.8.0
referencing==0.31.0
regex==2023.10.3
requests==2.31.0
rich==13.7.0
rpds-py==0.13.1
safetensors==0.4.1
schema==0.7.5
scipy==1.11.4
sentencepiece==0.1.99
shortuuid==1.0.11
simple-di==0.1.5
six==1.16.0
sniffio==1.3.0
starlette==0.27.0
svgwrite==1.4.3
sympy==1.12
tiktoken==0.5.1
tokenizers==0.15.0
tomli==2.0.1
torch==2.1.0
tornado==6.3.3
tqdm==4.66.1
transformers==4.35.2
triton==2.1.0
typing_extensions==4.8.0
tzdata==2023.3
urllib3==2.1.0
uvicorn==0.24.0.post1
uvloop==0.19.0
virtualenv==20.24.7
vllm==0.2.2
watchfiles==0.21.0
wavedrom==2.0.3.post3
wcwidth==0.2.12
websockets==12.0
wrapt==1.16.0
xformers==0.0.22.post7
xxhash==3.4.1
yarl==1.9.3
zipp==3.17.0
System information (Optional)
No response
Hi there, this chat_template is not used by the chat completion endpoint yet. For now, we simply rely on the model's default chat_template, i.e. whatever tokenizer.apply_chat_template uses.
I don't think chat templates should be modified on the fly; that should rather be an ahead-of-time operation.
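As an ahead-of-time workaround, one option is to set the template on the tokenizer, save it, and serve from the saved copy (a sketch only; the local path is hypothetical and the one-line template is illustrative, not the real Mistral prompt format):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
# Illustrative Jinja template -- replace with the prompt format you actually need.
tok.chat_template = "{% for m in messages %}[INST] {{ m['content'] }} [/INST]{% endfor %}"
# save_pretrained persists chat_template into tokenizer_config.json.
tok.save_pretrained("./mistral-7b-custom-template")
# Hypothetical usage: point openllm at the saved directory instead of the Hub id,
# e.g. openllm start ./mistral-7b-custom-template --backend=pt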