[Question] DeepSeek R1 Distill Qwen 1.5B converted models have very large VRAM requirements.
I tested several converted DeepSeek R1 Distill Qwen 1.5B models in the MLCChat app on an iPhone 15 Plus and a Google Pixel 8 Pro. All of them request far more GPU memory than the devices provide, so model loading fails on both iOS and Android.
I tried three models:

- https://huggingface.co/mlc-ai/DeepSeek-R1-Distill-Qwen-1.5B-q4f16_1-MLC
- https://huggingface.co/mlc-ai/DeepSeek-R1-Distill-Qwen-1.5B-q0f16-MLC
- https://huggingface.co/mlc-ai/DeepSeek-R1-Distill-Qwen-1.5B-q4f32_1-MLC
Is there a way to make these run on a smartphone?
```
FATAL EXCEPTION: Thread-4
Process: ai.mlc.mlcchat, PID: 14195
org.apache.tvm.Base$TVMError: TVMError: Check failed: (output_res.IsOk()) is false: Insufficient GPU memory error: The available single GPU memory is 4352.000 MB, which is less than the sum of model weight size (1059.693 MB) and temporary buffer size (11891.183 MB).
```
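From the numbers in the log, the weights themselves are small (~1 GB); it is the ~11.9 GB temporary buffer that blows past the 4352 MB budget. That buffer appears to scale with the model's context window and prefill chunk size, so one thing worth trying (a sketch, not a verified fix) is shrinking `context_window_size` and `prefill_chunk_size` in the model's `mlc-chat-config.json` before bundling it into the app. The path below is illustrative, and 4096/1024 are untuned example values:

```python
import json
from pathlib import Path

# Path to the converted model's config; adjust to your local layout (assumption).
cfg_path = Path("dist/DeepSeek-R1-Distill-Qwen-1.5B-q4f16_1-MLC/mlc-chat-config.json")

cfg = json.loads(cfg_path.read_text())

# Shrink the KV cache / temporary buffers by reducing the context window
# and the prefill chunk size. These values are illustrative, not tuned.
cfg["context_window_size"] = 4096
cfg["prefill_chunk_size"] = 1024

cfg_path.write_text(json.dumps(cfg, indent=2))
print("patched", cfg_path)
```

If I read the packaging docs right, the same two fields can also be overridden per model in the app's `mlc-package-config.json`, and `mlc_llm gen_config` exposes matching `--context-window-size` / `--prefill-chunk-size` flags if you regenerate the config from scratch. Which values actually fit inside 4352 MB would need experimentation.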