OpenLLM
bug: Chat template is not applied
Describe the bug
When I make a call to the server with the OpenAI example code, the response is generated with the default chat template instead of mine. I also see the following warning message in the console:
No chat template is defined for this tokenizer - using the default template for the LlamaTokenizerFast class. If the default is not appropriate for your model, please set `tokenizer.chat_template` to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.
I've modified the chat_template property in the configuration file, but it didn't make any difference.
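For reference, the fallback the warning describes can be reproduced with transformers directly (a minimal sketch, not OpenLLM code; the message content is made up):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
# The base Mistral-7B-v0.1 tokenizer ships no chat_template, so this falls back
# to the LlamaTokenizerFast default template and emits the warning quoted above.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}], tokenize=False
)
print(prompt)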
To reproduce
openllm start mistralai/Mistral-7B-v0.1 --backend=pt
configuration_mistral:
@property
def chat_template(self) -> str:
    return repr("should be empty")
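The client call was along these lines (a sketch of the OpenAI example code; the base URL assumes OpenLLM's default port 3000, the model name is taken from the response below, and the messages are placeholders):

from openai import OpenAI

# OpenLLM exposes an OpenAI-compatible endpoint; the API key is a dummy value.
client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")
response = client.chat.completions.create(
    model="mistralai--Mistral-7B-v0.1",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)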
Logs
Output:
ChatCompletion(id='chatcmpl-4c6d6d8c0c564b67800d5940c63b9958', choices=[Choice(finish_reason='length', index=0, message=ChatCompletionMessage(content="\n\n[INST] I have no idea. [/INST]\n\n[INST] You're a jerk. [/INST]\n\n[INST] I am not. [/INST]\n\n[INST] Yes you are. [/INST]\n\n[INST] I am not a jerk. [/INST]\n\n[INST] Yes you are. [/INST]\n\n[INST] No, I'm not. [/INST]\n\n[INST] Yes you are. [/INST]\n\n[INST] Yes, you are. [/INST]\n\n", role='assistant', function_call=None, tool_calls=None))], created=1815163, model='mistralai--Mistral-7B-v0.1', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=179, prompt_tokens=51, total_tokens=230))
Environment
System information
bentoml: 1.1.10
python: 3.10.13
platform: Linux-5.4.0-166-generic-x86_64-with-glibc2.31
uid_gid: 1004:1005
conda: 23.9.0
in_conda_env: True
conda_packages
name: inference
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- bzip2=1.0.8=h7b6447c_0
- ca-certificates=2023.08.22=h06a4308_0
- ld_impl_linux-64=2.38=h1181459_1
- libffi=3.4.4=h6a678d5_0
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- libuuid=1.41.5=h5eee18b_0
- ncurses=6.4=h6a678d5_0
- openssl=3.0.12=h7f8727e_0
- pip=23.3.1=py310h06a4308_0
- python=3.10.13=h955ad1f_0
- readline=8.2=h5eee18b_0
- setuptools=68.0.0=py310h06a4308_0
- sqlite=3.41.2=h5eee18b_0
- tk=8.6.12=h1ccaba5_0
- wheel=0.41.2=py310h06a4308_0
- xz=5.4.2=h5eee18b_0
- zlib=1.2.13=h5eee18b_0
- pip:
- accelerate==0.24.1
- aiohttp==3.9.1
- aiosignal==1.3.1
- anyio==3.7.1
- appdirs==1.4.4
- asgiref==3.7.2
- async-timeout==4.0.3
- attrs==23.1.0
- bentoml==1.1.10
- bitsandbytes==0.41.2.post2
- bpytop==1.0.68
- build==0.10.0
- cattrs==23.1.2
- certifi==2023.11.17
- charset-normalizer==3.3.2
- circus==0.18.0
- click==8.1.7
- click-option-group==0.5.6
- cloudpickle==3.0.0
- coloredlogs==15.0.1
- contextlib2==21.6.0
- cuda-python==12.3.0
- datasets==2.15.0
- deepmerge==1.1.0
- deprecated==1.2.14
- dill==0.3.7
- distlib==0.3.7
- distro==1.8.0
- einops==0.7.0
- exceptiongroup==1.2.0
- fastapi==0.104.1
- fastcore==1.5.29
- filelock==3.13.1
- filetype==1.2.0
- frozenlist==1.4.0
- fs==2.4.16
- fschat==0.2.33
- fsspec==2023.10.0
- ghapi==1.0.4
- h11==0.14.0
- httpcore==1.0.2
- httptools==0.6.1
- httpx==0.25.2
- huggingface-hub==0.19.4
- humanfriendly==10.0
- idna==3.6
- importlib-metadata==6.8.0
- inflection==0.5.1
- jinja2==3.1.2
- jsonschema==4.20.0
- jsonschema-specifications==2023.11.1
- markdown-it-py==3.0.0
- markdown2==2.4.10
- markupsafe==2.1.3
- mdurl==0.1.2
- mpmath==1.3.0
- msgpack==1.0.7
- multidict==6.0.4
- multiprocess==0.70.15
- mypy-extensions==1.0.0
- networkx==3.2.1
- nh3==0.2.14
- ninja==1.11.1.1
- numpy==1.26.2
- nvidia-cublas-cu12==12.1.3.1
- nvidia-cuda-cupti-cu12==12.1.105
- nvidia-cuda-nvrtc-cu12==12.1.105
- nvidia-cuda-runtime-cu12==12.1.105
- nvidia-cudnn-cu12==8.9.2.26
- nvidia-cufft-cu12==11.0.2.54
- nvidia-curand-cu12==10.3.2.106
- nvidia-cusolver-cu12==11.4.5.107
- nvidia-cusparse-cu12==12.1.0.106
- nvidia-ml-py==11.525.150
- nvidia-nccl-cu12==2.18.1
- nvidia-nvjitlink-cu12==12.3.101
- nvidia-nvtx-cu12==12.1.105
- openai==1.3.6
- openllm==0.4.32.dev7
- openllm-client==0.4.31
- openllm-core==0.4.32.dev7
- openllm-monorepo==0.4.32.dev7
- opentelemetry-api==1.20.0
- opentelemetry-instrumentation==0.41b0
- opentelemetry-instrumentation-aiohttp-client==0.41b0
- opentelemetry-instrumentation-asgi==0.41b0
- opentelemetry-sdk==1.20.0
- opentelemetry-semantic-conventions==0.41b0
- opentelemetry-util-http==0.41b0
- optimum==1.14.1
- orjson==3.9.10
- packaging==23.2
- pandas==2.1.3
- pathspec==0.11.2
- peft==0.6.2
- pillow==10.1.0
- pip-requirements-parser==32.0.1
- pip-tools==7.3.0
- platformdirs==4.0.0
- prometheus-client==0.19.0
- prompt-toolkit==3.0.41
- protobuf==4.25.1
- psutil==5.9.6
- pyarrow==14.0.1
- pyarrow-hotfix==0.6
- pydantic==1.10.13
- pygments==2.17.2
- pyparsing==3.1.1
- pyproject-hooks==1.0.0
- python-dateutil==2.8.2
- python-dotenv==1.0.0
- python-json-logger==2.0.7
- python-multipart==0.0.6
- pytz==2023.3.post1
- pyyaml==6.0.1
- pyzmq==25.1.1
- ray==2.8.0
- referencing==0.31.0
- regex==2023.10.3
- requests==2.31.0
- rich==13.7.0
- rpds-py==0.13.1
- safetensors==0.4.1
- schema==0.7.5
- scipy==1.11.4
- sentencepiece==0.1.99
- shortuuid==1.0.11
- simple-di==0.1.5
- six==1.16.0
- sniffio==1.3.0
- starlette==0.27.0
- svgwrite==1.4.3
- sympy==1.12
- tiktoken==0.5.1
- tokenizers==0.15.0
- tomli==2.0.1
- torch==2.1.0
- tornado==6.3.3
- tqdm==4.66.1
- transformers==4.35.2
- triton==2.1.0
- typing-extensions==4.8.0
- tzdata==2023.3
- urllib3==2.1.0
- uvicorn==0.24.0.post1
- uvloop==0.19.0
- virtualenv==20.24.7
- vllm==0.2.2
- watchfiles==0.21.0
- wavedrom==2.0.3.post3
- wcwidth==0.2.12
- websockets==12.0
- wrapt==1.16.0
- xformers==0.0.22.post7
- xxhash==3.4.1
- yarl==1.9.3
- zipp==3.17.0
prefix: /home/ubuntu/miniconda3/envs/inference
pip_packages
accelerate==0.24.1
aiohttp==3.9.1
aiosignal==1.3.1
anyio==3.7.1
appdirs==1.4.4
asgiref==3.7.2
async-timeout==4.0.3
attrs==23.1.0
bentoml==1.1.10
bitsandbytes==0.41.2.post2
bpytop==1.0.68
build==0.10.0
cattrs==23.1.2
certifi==2023.11.17
charset-normalizer==3.3.2
circus==0.18.0
click==8.1.7
click-option-group==0.5.6
cloudpickle==3.0.0
coloredlogs==15.0.1
contextlib2==21.6.0
cuda-python==12.3.0
datasets==2.15.0
deepmerge==1.1.0
Deprecated==1.2.14
dill==0.3.7
distlib==0.3.7
distro==1.8.0
einops==0.7.0
exceptiongroup==1.2.0
fastapi==0.104.1
fastcore==1.5.29
filelock==3.13.1
filetype==1.2.0
frozenlist==1.4.0
fs==2.4.16
fschat==0.2.33
fsspec==2023.10.0
ghapi==1.0.4
h11==0.14.0
httpcore==1.0.2
httptools==0.6.1
httpx==0.25.2
huggingface-hub==0.19.4
humanfriendly==10.0
idna==3.6
importlib-metadata==6.8.0
inflection==0.5.1
Jinja2==3.1.2
jsonschema==4.20.0
jsonschema-specifications==2023.11.1
markdown-it-py==3.0.0
markdown2==2.4.10
MarkupSafe==2.1.3
mdurl==0.1.2
mpmath==1.3.0
msgpack==1.0.7
multidict==6.0.4
multiprocess==0.70.15
mypy-extensions==1.0.0
networkx==3.2.1
nh3==0.2.14
ninja==1.11.1.1
numpy==1.26.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==11.525.150
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
openai==1.3.6
-e git+https://github.com/bentoml/OpenLLM.git@0ce7782c2c97ffe7f0b7c724c8471f5523d285d2#egg=openllm&subdirectory=openllm-python
openllm-client==0.4.31
-e git+https://github.com/bentoml/OpenLLM.git@0ce7782c2c97ffe7f0b7c724c8471f5523d285d2#egg=openllm_core&subdirectory=openllm-core
-e git+https://github.com/bentoml/OpenLLM.git@0ce7782c2c97ffe7f0b7c724c8471f5523d285d2#egg=openllm_monorepo
opentelemetry-api==1.20.0
opentelemetry-instrumentation==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-asgi==0.41b0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
opentelemetry-util-http==0.41b0
optimum==1.14.1
orjson==3.9.10
packaging==23.2
pandas==2.1.3
pathspec==0.11.2
peft==0.6.2
Pillow==10.1.0
pip-requirements-parser==32.0.1
pip-tools==7.3.0
platformdirs==4.0.0
prometheus-client==0.19.0
prompt-toolkit==3.0.41
protobuf==4.25.1
psutil==5.9.6
pyarrow==14.0.1
pyarrow-hotfix==0.6
pydantic==1.10.13
Pygments==2.17.2
pyparsing==3.1.1
pyproject_hooks==1.0.0
python-dateutil==2.8.2
python-dotenv==1.0.0
python-json-logger==2.0.7
python-multipart==0.0.6
pytz==2023.3.post1
PyYAML==6.0.1
pyzmq==25.1.1
ray==2.8.0
referencing==0.31.0
regex==2023.10.3
requests==2.31.0
rich==13.7.0
rpds-py==0.13.1
safetensors==0.4.1
schema==0.7.5
scipy==1.11.4
sentencepiece==0.1.99
shortuuid==1.0.11
simple-di==0.1.5
six==1.16.0
sniffio==1.3.0
starlette==0.27.0
svgwrite==1.4.3
sympy==1.12
tiktoken==0.5.1
tokenizers==0.15.0
tomli==2.0.1
torch==2.1.0
tornado==6.3.3
tqdm==4.66.1
transformers==4.35.2
triton==2.1.0
typing_extensions==4.8.0
tzdata==2023.3
urllib3==2.1.0
uvicorn==0.24.0.post1
uvloop==0.19.0
virtualenv==20.24.7
vllm==0.2.2
watchfiles==0.21.0
wavedrom==2.0.3.post3
wcwidth==0.2.12
websockets==12.0
wrapt==1.16.0
xformers==0.0.22.post7
xxhash==3.4.1
yarl==1.9.3
zipp==3.17.0
System information (Optional)
No response
Hi there, this chat_template is not used by the chat completion endpoint yet. For now, we simply rely on the model's default chat_template, i.e. whatever tokenizer.apply_chat_template uses.
I don't think chat templates should be modified on the fly; that should rather be an ahead-of-time operation.
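As an ahead-of-time workaround, one option is to set the template on the tokenizer, save it, and serve from the saved copy (a sketch only; the local path is hypothetical and the one-line template is illustrative, not the real Mistral prompt format):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
# Illustrative Jinja template -- replace with the prompt format you actually need.
tok.chat_template = "{% for m in messages %}[INST] {{ m['content'] }} [/INST]{% endfor %}"
# save_pretrained persists chat_template into tokenizer_config.json.
tok.save_pretrained("./mistral-7b-custom-template")
# Hypothetical usage: point openllm at the saved directory instead of the Hub id,
# e.g. openllm start ./mistral-7b-custom-template --backend=pt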