mlc-llm icon indicating copy to clipboard operation
mlc-llm copied to clipboard

[Bug] mlc_llm generates randomly corrupted Unicode character when outputting Chinese

Open LuRenJiasWorld opened this issue 1 year ago • 5 comments

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

  1. Install latest mlc-llm and mlc-ai in conda with python 3.12, running on an Apple Silicon (M1 Pro) MacBook Pro with 32 GiB of RAM
  2. Download Qwen-2-7b-MLC Model from https://huggingface.co/mlc-ai/Qwen2-7B-Instruct-q4f16_1-MLC (Other LLMs can also reproduce this issue)
  3. Using mlc_llm serve Qwen2-7B-Instruct-q4f16_1-MLC --host 0.0.0.0 to run the server (mlc_llm chat can also reproduce this issue)
  4. In any application that can produce many outputs (for example immersive translate working with OpenAI compatible API), I can see the following result, which contains many corrupted Chinese character. wecom-temp-210601-c6c28ad642b7a653de81603bb9ae5509 wecom-temp-122688-1b8ddf75ef55a6ac8030475ebdb13170 wecom-temp-112502-31845a08e6749aa4f2d3734cd5a5ad5d

When I use a Linux server with Nvidia L20 GPU, by using the same model, same application, same prompt, I could also reproduce this issue, but not as frequently as MacBook does.

image

Expected behavior

There should not have corrupted Unicode character when outputting Chinese, which is frustrating, makes me frequently guess what the word should be.

LuRenJiasWorld avatar Aug 22 '24 05:08 LuRenJiasWorld