When will the QNN model for Phi-4-mini-instruct be released?
We need the QNN version of the Phi-4-mini-instruct model. I have tried using ONNX and Qualcomm's examples, but it seems we cannot convert the model to a QNN version ourselves. Can you provide an example of converting the model, or say when Foundry Local will provide it?
What device are you running on @qihui-liu?
@natke Sorry, I forgot to provide the device information.
Processor: Snapdragon (R) X Elite - X1E80100 - Qualcomm (R) Oryon (TM) CPU (3.42 GHz)
RAM: 16 GB
System type: 64-bit operating system, ARM-based processor
Operating System: Windows 11 Home Edition Insider Preview
Experience: Windows Feature Experience Pack 1000.26100.154.0
Thank you. Can you also provide the version of Foundry Local you are running (foundry --version) and what you see when you run foundry model list? @qihui-liu
The Foundry Local version is 0.4.91+269dfd9ed1.
Can you please share the output from foundry model list?
Is this it? @natke
I have the same CPU, and my output is as follows (much more limited than on the Intel platform):
PS C:\Users\woute> foundry model list
Alias                 Device  Task             File Size  License     Model ID
------------------------------------------------------------------------------------------------------------
phi-4                 CPU     chat-completion  10.16 GB   MIT         Phi-4-generic-cpu
phi-3.5-mini          CPU     chat-completion  2.53 GB    MIT         Phi-3.5-mini-instruct-generic-cpu
deepseek-r1-14b       NPU     chat-completion  7.12 GB    MIT         deepseek-r1-distill-qwen-14b-qnn-npu
deepseek-r1-7b        NPU     chat-completion  3.71 GB    MIT         deepseek-r1-distill-qwen-7b-qnn-npu
phi-3-mini-128k       CPU     chat-completion  2.54 GB    MIT         Phi-3-mini-128k-instruct-generic-cpu
phi-3-mini-4k         CPU     chat-completion  2.53 GB    MIT         Phi-3-mini-4k-instruct-generic-cpu
mistral-7b-v0.2       CPU     chat-completion  4.07 GB    apache-2.0  mistralai-Mistral-7B-Instruct-v0-2-generic-cpu
phi-4-mini-reasoning  NPU     chat-completion  2.78 GB    MIT         Phi-4-mini-reasoning-qnn-npu
                      CPU     chat-completion  4.52 GB    MIT         Phi-4-mini-reasoning-generic-cpu
qwen2.5-0.5b          CPU     chat-completion  0.80 GB    apache-2.0  qwen2.5-0.5b-instruct-generic-cpu
qwen2.5-1.5b          CPU     chat-completion  1.78 GB    apache-2.0  qwen2.5-1.5b-instruct-generic-cpu
qwen2.5-coder-0.5b    CPU     chat-completion  0.80 GB    apache-2.0  qwen2.5-coder-0.5b-instruct-generic-cpu
qwen2.5-coder-7b      CPU     chat-completion  6.16 GB    apache-2.0  qwen2.5-coder-7b-instruct-generic-cpu
qwen2.5-coder-1.5b    CPU     chat-completion  1.78 GB    apache-2.0  qwen2.5-coder-1.5b-instruct-generic-cpu
qwen2.5-14b           CPU     chat-completion  11.06 GB   apache-2.0  qwen2.5-14b-instruct-generic-cpu
qwen2.5-7b            CPU     chat-completion  6.16 GB    apache-2.0  qwen2.5-7b-instruct-generic-cpu
qwen2.5-coder-14b     CPU     chat-completion  11.06 GB   apache-2.0  qwen2.5-coder-14b-instruct-generic-cpu

PS C:\Users\woute> foundry --version
0.4.91+269dfd9ed1
PS C:\WINDOWS\system32> foundry model list
🟢 Service is Started on http://localhost:5273, PID 19296!
Alias                 Device  Task             File Size  License     Model ID
------------------------------------------------------------------------------------------------------------
phi-4                 GPU     chat-completion  8.37 GB    MIT         Phi-4-generic-gpu
                      CPU     chat-completion  10.16 GB   MIT         Phi-4-generic-cpu
mistral-7b-v0.2       GPU     chat-completion  4.07 GB    apache-2.0  mistralai-Mistral-7B-Instruct-v0-2-generic-gpu
                      CPU     chat-completion  4.07 GB    apache-2.0  mistralai-Mistral-7B-Instruct-v0-2-generic-cpu
phi-3.5-mini          GPU     chat-completion  2.16 GB    MIT         Phi-3.5-mini-instruct-generic-gpu
                      CPU     chat-completion  2.53 GB    MIT         Phi-3.5-mini-instruct-generic-cpu
phi-3-mini-128k       GPU     chat-completion  2.13 GB    MIT         Phi-3-mini-128k-instruct-generic-gpu
                      CPU     chat-completion  2.54 GB    MIT         Phi-3-mini-128k-instruct-generic-cpu
phi-3-mini-4k         GPU     chat-completion  2.13 GB    MIT         Phi-3-mini-4k-instruct-generic-gpu
                      CPU     chat-completion  2.53 GB    MIT         Phi-3-mini-4k-instruct-generic-cpu
deepseek-r1-14b       GPU     chat-completion  10.27 GB   MIT         deepseek-r1-distill-qwen-14b-generic-gpu
                      CPU     chat-completion  11.51 GB   MIT         deepseek-r1-distill-qwen-14b-generic-cpu
deepseek-r1-7b        GPU     chat-completion  5.58 GB    MIT         deepseek-r1-distill-qwen-7b-generic-gpu
                      CPU     chat-completion  6.43 GB    MIT         deepseek-r1-distill-qwen-7b-generic-cpu
qwen2.5-0.5b          GPU     chat-completion  0.68 GB    apache-2.0  qwen2.5-0.5b-instruct-generic-gpu
                      CPU     chat-completion  0.80 GB    apache-2.0  qwen2.5-0.5b-instruct-generic-cpu
qwen2.5-1.5b          GPU     chat-completion  1.51 GB    apache-2.0  qwen2.5-1.5b-instruct-generic-gpu
                      CPU     chat-completion  1.78 GB    apache-2.0  qwen2.5-1.5b-instruct-generic-cpu
qwen2.5-coder-0.5b    GPU     chat-completion  0.52 GB    apache-2.0  qwen2.5-coder-0.5b-instruct-generic-gpu
                      CPU     chat-completion  0.80 GB    apache-2.0  qwen2.5-coder-0.5b-instruct-generic-cpu
qwen2.5-coder-7b      GPU     chat-completion  4.73 GB    apache-2.0  qwen2.5-coder-7b-instruct-generic-gpu
                      CPU     chat-completion  6.16 GB    apache-2.0  qwen2.5-coder-7b-instruct-generic-cpu
qwen2.5-coder-1.5b    GPU     chat-completion  1.25 GB    apache-2.0  qwen2.5-coder-1.5b-instruct-generic-gpu
                      CPU     chat-completion  1.78 GB    apache-2.0  qwen2.5-coder-1.5b-instruct-generic-cpu
phi-4-mini            GPU     chat-completion  3.72 GB    MIT         Phi-4-mini-instruct-generic-gpu
                      CPU     chat-completion  4.80 GB    MIT         Phi-4-mini-instruct-generic-cpu
phi-4-mini-reasoning  GPU     chat-completion  3.15 GB    MIT         Phi-4-mini-reasoning-generic-gpu
                      CPU     chat-completion  4.52 GB    MIT         Phi-4-mini-reasoning-generic-cpu
qwen2.5-14b           CPU     chat-completion  11.06 GB   apache-2.0  qwen2.5-14b-instruct-generic-cpu
qwen2.5-7b            GPU     chat-completion  5.20 GB    apache-2.0  qwen2.5-7b-instruct-generic-gpu
                      CPU     chat-completion  6.16 GB    apache-2.0  qwen2.5-7b-instruct-generic-cpu
qwen2.5-coder-14b     GPU     chat-completion  8.79 GB    apache-2.0  qwen2.5-coder-14b-instruct-generic-gpu
                      CPU     chat-completion  11.06 GB   apache-2.0  qwen2.5-coder-14b-instruct-generic-cpu

PS C:\WINDOWS\system32> foundry --version
0.4.91+269dfd9ed1
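For anyone who needs to call the model from code while waiting for a QNN build: the service line in the output above (http://localhost:5273) is the local endpoint Foundry Local starts, and it exposes an OpenAI-compatible chat-completions API under /v1. A minimal sketch, assuming the service is running; the port is copied from the log above and will differ per machine, and the model ID must match one from your own foundry model list output:

```python
import json
import urllib.request

# Port taken from the service log above; check your own "Service is Started" line.
BASE_URL = "http://localhost:5273/v1"


def build_chat_request(model_id: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload for the local service."""
    return {
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(model_id: str, prompt: str) -> str:
    """POST the payload to the local endpoint and return the reply text."""
    payload = build_chat_request(model_id, prompt)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires the local service to be running; model ID copied from the
# list output above, using the CPU build until a QNN variant ships):
#   print(chat("Phi-4-mini-instruct-generic-cpu", "Hello"))
```

Until a Phi-4-mini-instruct-qnn-npu entry appears in the catalog, only the generic CPU/GPU IDs shown above can be used here.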
Hi @wldevries @natke, I am still waiting for your reply. I really need this model; it is related to my work.
I would like it as well, though not as badly as you. But I don't work at Microsoft and can't help you.