[Question] DeepSeek R1 Distill Qwen 1.5B converted models have very large VRAM requirements.
I tested several converted DeepSeek R1 Distill Qwen 1.5B models in the MLCChat app on an iPhone 15 Plus and a Google Pixel 8 Pro. All of them request far more GPU memory than the devices provide, so model loading fails on both iOS and Android.
I tried three models:

- https://huggingface.co/mlc-ai/DeepSeek-R1-Distill-Qwen-1.5B-q4f16_1-MLC
- https://huggingface.co/mlc-ai/DeepSeek-R1-Distill-Qwen-1.5B-q0f16-MLC
- https://huggingface.co/mlc-ai/DeepSeek-R1-Distill-Qwen-1.5B-q4f32_1-MLC
Is there a way to make these run on a smartphone?
```
FATAL EXCEPTION: Thread-4
Process: ai.mlc.mlcchat, PID: 14195
org.apache.tvm.Base$TVMError: TVMError: Check failed: (output_res.IsOk()) is false: Insufficient GPU memory error: The available single GPU memory is 4352.000 MB, which is less than the sum of model weight size (1059.693 MB) and temporary buffer size (11891.183 MB).
```
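From the numbers in the log, the weights themselves are small (~1 GB); it is the ~11.9 GB temporary buffer that blows past the 4352 MB budget. That buffer appears to scale with the model's context window and prefill chunk size, so one thing worth trying (a sketch, not a verified fix) is shrinking `context_window_size` and `prefill_chunk_size` in the model's `mlc-chat-config.json` before bundling it into the app. The path below is illustrative, and 4096/1024 are untuned example values:

```python
import json
from pathlib import Path

# Path to the converted model's config; adjust to your local layout (assumption).
cfg_path = Path("dist/DeepSeek-R1-Distill-Qwen-1.5B-q4f16_1-MLC/mlc-chat-config.json")

cfg = json.loads(cfg_path.read_text())

# Shrink the KV cache / temporary buffers by reducing the context window
# and the prefill chunk size. These values are illustrative, not tuned.
cfg["context_window_size"] = 4096
cfg["prefill_chunk_size"] = 1024

cfg_path.write_text(json.dumps(cfg, indent=2))
print("patched", cfg_path)
```

If I read the packaging docs right, the same two fields can also be overridden per model in the app's `mlc-package-config.json`, and `mlc_llm gen_config` exposes matching `--context-window-size` / `--prefill-chunk-size` flags if you regenerate the config from scratch. Which values actually fit inside 4352 MB would need experimentation.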