mlc-llm
[Bug] Long response causes Android app to become unresponsive
🐛 Bug
A long response from llama-2-7b causes the Android app to become unresponsive. When I ask "what is qualcomm", Llama-2 responds with very long content. After that, asking another question gets no response from the app.
To Reproduce
Steps to reproduce the behavior:
- build the mlc-llm project (commit https://github.com/mlc-ai/mlc-llm/commit/02a41e1fe4918b0c313ce24a532adc6eaed6ae02)
- build the TVM Unity environment (mlc-ai-nightly-cu118 0.12.dev1880)
- compile llama-2 with q4f16_0 and --max-seq-len 768 (see the command sketch after this list)
- build the Android app
- ask the question "what is qualcomm" and wait for the long response to finish
- ask "what is qualcomm" again; the app becomes unresponsive
The error message is shown below.
Expected behavior
Environment
- Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): Android
- Operating system (e.g. Ubuntu/Windows/MacOS/...):
- Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): 8gen2 Android phone with 12GB RAM
- How you installed MLC-LLM (`conda`, source): source
- How you installed TVM-Unity (`pip`, source): pip
- Python version (e.g. 3.10): 3.8
- GPU driver version (if applicable): 11.8
- CUDA/cuDNN version (if applicable):
- TVM Unity Hash Tag (`python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"`, applicable if you compile models): 189412e9ad52fee4dc3dc46bcf60d820e82422d8
- Any other relevant information: before chatting with the model I got this error: `A/TVM_RUNTIME: /home/chaoqin/aidchat/mlc-llm/cpp/model_metadata.cc:37: InternalError: Check failed: (pf != nullptr) is false:` but the model can still work normally