[Bug] When I enable "<|im_end|>" as stop_str in the qwen2 configuration, the final output is truncated.
🐛 Bug
To Reproduce
Steps to reproduce the behavior:
1. Do not set `<|im_end|>` as a stop string. My fine-tuned qwen2 model does output `<|im_end|>`, but the problem is that it is not emitted as the separate special token (id 151645 is `<|im_end|>`); it comes out as a single token fused with the last part of my expected output (the closing `}` of the JSON). In my case the fused piece is `}<|im_end|>`.
2. Then set `<|im_end|>` as stop_str. The last part of my expected output is cut off together with `<|im_end|>`, so the final output is missing the `}` (see the sketch below).
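To make the failure mode concrete, here is a minimal diagnostic sketch. It is not code from MLC-LLM: it assumes the `transformers` package and the public `Qwen/Qwen2-7B-Instruct` tokenizer, and the `{"answer": 42}` JSON is a made-up stand-in for my model's real output.

```python
# Minimal diagnostic sketch (not from MLC-LLM): checks whether "<|im_end|>"
# is tokenized as its own special token (id 151645) or fused with the
# preceding "}" into a single piece, and shows how piece-level stop handling
# can drop the "}" while character-level trimming keeps it.
# Assumes the `transformers` package and the public Qwen2 tokenizer; the
# JSON payload below is a made-up stand-in for my model's real output.
from transformers import AutoTokenizer

STOP = "<|im_end|>"

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
ids = tok.encode('{"answer": 42}' + STOP, add_special_tokens=False)
pieces = [tok.decode([i]) for i in ids]
print(ids)     # 151645 should appear as its own id if "}" did not fuse with it
print(pieces)

def trim_piecewise(pieces: list[str], stop: str = STOP) -> str:
    """Suspected buggy behavior: drop everything from the first piece that
    contains the stop string, losing any text fused into that piece."""
    out = []
    for p in pieces:
        if stop in p:
            break          # the whole piece is discarded, "}" and all
        out.append(p)
    return "".join(out)

def trim_charwise(pieces: list[str], stop: str = STOP) -> str:
    """Expected behavior: trim at the character position of the stop string,
    keeping any prefix (such as "}") inside the final piece."""
    text = "".join(pieces)
    idx = text.find(stop)
    return text if idx < 0 else text[:idx]

# With a fused final piece, piece-level trimming loses the closing brace:
fused = ['{"answer": 42', "}" + STOP]
print(trim_piecewise(fused))  # -> {"answer": 42
print(trim_charwise(fused))   # -> {"answer": 42}
```

If MLC-LLM's stop handling trims whole detokenized pieces rather than trimming at the character position of the stop string, that would explain why the `}` disappears together with `<|im_end|>`.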
Expected behavior
Only the `<|im_end|>` stop string should be stripped; the complete JSON output, including the final `}`, should be returned.
Environment
- Platform: CUDA
- Operating system: Ubuntu
- Device: PC + Tesla V100
- How you installed MLC-LLM (conda, source): source
- How you installed TVM-Unity (pip, source): source
- Python version (e.g. 3.10): 3.11.8
- GPU driver version (if applicable): 535.54.03
- CUDA/cuDNN version (if applicable): 12.1
- TVM Unity Hash Tag: 69190c360cd5ce1c4a35c0f49501c96993fae416
- Any other relevant information: