
[Bug] Long response causes Android app to stop responding


🐛 Bug

A long response from llama-2-7b causes the Android app to stop responding. When I ask "what is qualcomm", Llama-2 responds with very long content. After that, asking another question leaves the app unresponsive.

To Reproduce

Steps to reproduce the behavior:

  1. build the mlc-llm project (commit https://github.com/mlc-ai/mlc-llm/commit/02a41e1fe4918b0c313ce24a532adc6eaed6ae02)
  2. set up the TVM Unity environment (mlc-ai-nightly-cu118 0.12.dev1880)
  3. compile llama-2 with q4f16_0 quantization and --max-seq-len 768
  4. build the Android app
  5. ask the question "what is qualcomm" and wait for the long response to finish
  6. ask "what is qualcomm" again; the app stops responding
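The steps above can be sketched as shell commands. This is a hedged reconstruction: the model directory name (`Llama-2-7b-chat-hf`) and the wheel index URL are assumptions based on the mlc-llm build CLI of that era, not taken from the report; adjust to your local setup.

```shell
# Check out the exact commit the bug was reproduced against
git clone https://github.com/mlc-ai/mlc-llm.git && cd mlc-llm
git checkout 02a41e1fe4918b0c313ce24a532adc6eaed6ae02

# Install the matching TVM Unity nightly (wheel index URL is an assumption)
pip install mlc-ai-nightly-cu118==0.12.dev1880 -f https://mlc.ai/wheels

# Compile Llama-2 with q4f16_0 quantization and a 768-token context
# (model path "Llama-2-7b-chat-hf" is a placeholder for the local weights)
python -m mlc_llm.build --model Llama-2-7b-chat-hf \
    --quantization q4f16_0 --max-seq-len 768 --target android

# Then build the Android app and ask "what is qualcomm" twice;
# after the first long response finishes, the second query hangs the app.
```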

The error message is shown below:

[screenshot: error message]

Expected behavior

Environment

  • Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): Android
  • Operating system (e.g. Ubuntu/Windows/MacOS/...):
  • Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...) 8gen2 android phone with 12GB RAM
  • How you installed MLC-LLM (conda, source): source
  • How you installed TVM-Unity (pip, source): pip
  • Python version (e.g. 3.10): 3.8
  • GPU driver version (if applicable): 11.8
  • CUDA/cuDNN version (if applicable):
  • TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models): 189412e9ad52fee4dc3dc46bcf60d820e82422d8
  • Any other relevant information: before chatting with the model I got this error: A/TVM_RUNTIME: /home/chaoqin/aidchat/mlc-llm/cpp/model_metadata.cc:37: InternalError: Check failed: (pf != nullptr) is false: — but the model can still work normally

Additional context

qc903113684 avatar Nov 30 '23 04:11 qc903113684