mlc-llm
[Question] While waiting for the model's response on an Android phone, performing other operations may cause the phone to become unresponsive or reboot.
❓ General Questions
While the model is generating a response on an Android phone, doing anything else on the phone can make it freeze or reboot. For example, this happens when I return to the home screen.
I suspect this is caused by the device running out of GPU resources. When I tried to run the model on the CPU only, the app crashed with the following error:
2025-03-04 15:03:37.647 19380-19447/ai.mlc.mlcchat E/AndroidRuntime: FATAL EXCEPTION: Thread-5
Process: ai.mlc.mlcchat, PID: 19380
org.apache.tvm.Base$TVMError: TVMError: Assert fail: T.tvm_struct_get(p_model_embed_tokens_q_weight, 0, 10, "int32") == 4, Argument qwen2_q4f16_1_e396fd42f6a997ca798eafc3bf56647f_fused_dequantize_take1.p_model_embed_tokens_q_weight.device_type has an unsatisfied constraint: 4 == T.tvm_struct_get(p_model_embed_tokens_q_weight, 0, 10, "int32")
at org.apache.tvm.Base.checkCall(Base.java:173)
at org.apache.tvm.Function.invoke(Function.java:130)
at ai.mlc.mlcllm.JSONFFIEngine.runBackgroundLoop(JSONFFIEngine.java:65)
at ai.mlc.mlcllm.MLCEngine$backgroundWorker$1.invoke(MLCEngine.kt:42)
at ai.mlc.mlcllm.MLCEngine$backgroundWorker$1.invoke(MLCEngine.kt:40)
at ai.mlc.mlcllm.BackgroundWorker$start$1.invoke(MLCEngine.kt:19)
at ai.mlc.mlcllm.BackgroundWorker$start$1.invoke(MLCEngine.kt:18)
at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)
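The `4` in the assertion is a DLPack `DLDeviceType` code, and the assertion checks the `device_type` of the weight tensor passed to the compiled kernel. The small Java sketch below decodes a few of these codes using the values from DLPack's `dlpack.h`. The class `DlpackDevice` is a hypothetical helper, not part of MLC LLM or TVM. Code 4 is `kDLOpenCL`, which suggests the model library was compiled for the OpenCL (GPU) target and rejects tensors on other devices. The claim that a CPU-only run hands the kernel a `kDLCPU` (1) tensor is an assumption; the log only shows that the constraint `== 4` failed.

```java
import java.util.Map;

/** Hypothetical helper: decodes a subset of DLPack DLDeviceType codes (values as in dlpack.h). */
public final class DlpackDevice {
    private static final Map<Integer, String> NAMES = Map.ofEntries(
        Map.entry(1, "kDLCPU"),
        Map.entry(2, "kDLCUDA"),
        Map.entry(3, "kDLCUDAHost"),
        Map.entry(4, "kDLOpenCL"),
        Map.entry(7, "kDLVulkan"),
        Map.entry(8, "kDLMetal"),
        Map.entry(10, "kDLROCM"),
        Map.entry(15, "kDLWebGPU"));

    private DlpackDevice() {}

    /** Returns the DLPack enum name for a device_type code, or "unknown(<code>)" if not listed here. */
    public static String name(int code) {
        return NAMES.getOrDefault(code, "unknown(" + code + ")");
    }

    public static void main(String[] args) {
        // The assertion in the log requires device_type == 4.
        int expected = 4;
        // Assumption: a CPU-only run supplies tensors with device_type == 1.
        int cpuOnly = 1;
        System.out.println("kernel expects: " + name(expected));
        System.out.println("CPU-only run supplies: " + name(cpuOnly));
    }
}
```

If this reading is right, running on the CPU would likely require a model library compiled for a CPU target rather than reusing the OpenCL build.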