mlc-llm
[Bug] Long response causes Android app to become unresponsive
🐛 Bug
A long response from llama-2-7b causes the Android app to become unresponsive. When I ask "what is qualcomm", Llama-2 responds with very long content. After that, asking another question gets no response from the app.
To Reproduce
Steps to reproduce the behavior:
- build the mlc-llm project (commit https://github.com/mlc-ai/mlc-llm/commit/02a41e1fe4918b0c313ce24a532adc6eaed6ae02)
- build the TVM Unity environment (mlc-ai-nightly-cu118 0.12.dev1880)
- compile llama-2 with q4f16_0 and --max-seq-len 768 (see the command sketch after this list)
- build the Android app
- ask the question "what is qualcomm" and wait for the long response to finish
- ask "what is qualcomm" again; the app becomes unresponsive
The error message is shown below.
Expected behavior
Environment
- Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): Android
- Operating system (e.g. Ubuntu/Windows/MacOS/...):
- Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): 8gen2 Android phone with 12GB RAM
- How you installed MLC-LLM (`conda`, source): source
- How you installed TVM-Unity (`pip`, source): pip
- Python version (e.g. 3.10): 3.8
- GPU driver version (if applicable): 11.8
- CUDA/cuDNN version (if applicable):
- TVM Unity Hash Tag (`python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"`, applicable if you compile models): 189412e9ad52fee4dc3dc46bcf60d820e82422d8
- Any other relevant information: before chatting with the model I got this error: `A/TVM_RUNTIME: /home/chaoqin/aidchat/mlc-llm/cpp/model_metadata.cc:37: InternalError: Check failed: (pf != nullptr) is false:` but the model can still work normally