Foundry-Local

When will the QNN model for Phi-4-mini-instruct be released?

Open nnbw-liu opened this issue 5 months ago • 10 comments

We need to use the QNN version of the Phi-4-mini-instruct model. I have tried using ONNX and Qualcomm's examples, but it seems that we cannot convert it to a QNN version ourselves. Can you provide an example of converting the model? When will Foundry Local provide it?
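
(For reference, a minimal sketch of the kind of check this involves on the Snapdragon device, assuming the onnxruntime-qnn package is installed; the model path is a placeholder and this is not an official conversion recipe:)

import onnxruntime as ort

# The QNN execution provider has to be visible before any QNN-optimized model can run on the NPU.
print(ort.get_available_providers())  # expect "QNNExecutionProvider" in this list

# Try to create a session against the Qualcomm HTP backend; a generic CPU export of
# Phi-4-mini-instruct will still load, but it is not the NPU-optimized QNN build.
session = ort.InferenceSession(
    "phi-4-mini-instruct/model.onnx",  # placeholder path to an existing ONNX export
    providers=[
        ("QNNExecutionProvider", {"backend_path": "QnnHtp.dll"}),
        "CPUExecutionProvider",
    ],
)
print(session.get_providers())  # shows which providers were actually registered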

nnbw-liu · Jul 04 '25 02:07

What device are you running on @qihui-liu?

natke · Jul 07 '25 22:07

@natke Sorry, I forgot to provide the device information.

Processor: Snapdragon (R) X Elite - X1E80100 - Qualcomm (R) Oryon (TM) CPU (3.42 GHz)
RAM: 16 GB
System type: 64-bit operating system, ARM-based processor
Operating system: Windows 11 Home Edition Insider Preview
Experience: Windows Feature Experience Pack 1000.26100.154.0

nnbw-liu · Jul 08 '25 01:07

Thank you. Can you please provide the version of Foundry Local you are running (foundry --version) and what you see when you run foundry model list? @qihui-liu
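
(That is, from a terminal:)

foundry --version
foundry model list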

natke · Jul 08 '25 16:07

The Foundry Local version is 0.4.91+269dfd9ed1.

[Three screenshots attached]

nnbw-liu · Jul 09 '25 06:07

Can you please share the output from

foundry model list

natke · Jul 09 '25 16:07

Is this it? @natke [Screenshot attached]

nnbw-liu · Jul 10 '25 10:07

I have the same CPU and my output is as follows (much more limited than on the Intel platform):

PS C:\Users\woute> foundry model list
Alias                          Device     Task               File Size    License      Model ID
-----------------------------------------------------------------------------------------------
phi-4                          CPU        chat-completion    10.16 GB     MIT          Phi-4-generic-cpu
--------------------------------------------------------------------------------------------------------
phi-3.5-mini                   CPU        chat-completion    2.53 GB      MIT          Phi-3.5-mini-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------
deepseek-r1-14b                NPU        chat-completion    7.12 GB      MIT          deepseek-r1-distill-qwen-14b-qnn-npu
---------------------------------------------------------------------------------------------------------------------------
deepseek-r1-7b                 NPU        chat-completion    3.71 GB      MIT          deepseek-r1-distill-qwen-7b-qnn-npu
--------------------------------------------------------------------------------------------------------------------------
phi-3-mini-128k                CPU        chat-completion    2.54 GB      MIT          Phi-3-mini-128k-instruct-generic-cpu
---------------------------------------------------------------------------------------------------------------------------
phi-3-mini-4k                  CPU        chat-completion    2.53 GB      MIT          Phi-3-mini-4k-instruct-generic-cpu
-------------------------------------------------------------------------------------------------------------------------
mistral-7b-v0.2                CPU        chat-completion    4.07 GB      apache-2.0   mistralai-Mistral-7B-Instruct-v0-2-generic-cpu
-------------------------------------------------------------------------------------------------------------------------------------
phi-4-mini-reasoning           NPU        chat-completion    2.78 GB      MIT          Phi-4-mini-reasoning-qnn-npu
                               CPU        chat-completion    4.52 GB      MIT          Phi-4-mini-reasoning-generic-cpu
-----------------------------------------------------------------------------------------------------------------------
qwen2.5-0.5b                   CPU        chat-completion    0.80 GB      apache-2.0   qwen2.5-0.5b-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------
qwen2.5-1.5b                   CPU        chat-completion    1.78 GB      apache-2.0   qwen2.5-1.5b-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-0.5b             CPU        chat-completion    0.80 GB      apache-2.0   qwen2.5-coder-0.5b-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-7b               CPU        chat-completion    6.16 GB      apache-2.0   qwen2.5-coder-7b-instruct-generic-cpu
----------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-1.5b             CPU        chat-completion    1.78 GB      apache-2.0   qwen2.5-coder-1.5b-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------------
qwen2.5-14b                    CPU        chat-completion    11.06 GB     apache-2.0   qwen2.5-14b-instruct-generic-cpu
-----------------------------------------------------------------------------------------------------------------------
qwen2.5-7b                     CPU        chat-completion    6.16 GB      apache-2.0   qwen2.5-7b-instruct-generic-cpu
----------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-14b              CPU        chat-completion    11.06 GB     apache-2.0   qwen2.5-coder-14b-instruct-generic-cpu
PS C:\Users\woute> foundry --version
0.4.91+269dfd9ed1
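
(A quick way to pull out just the NPU-backed rows from a listing like this, assuming PowerShell, is to filter on the QNN model-ID suffix; in the list above that matches only the two deepseek-r1 distills and phi-4-mini-reasoning:)

foundry model list | Select-String -Pattern "qnn-npu"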

wldevries · Jul 22 '25 21:07

PS C:\WINDOWS\system32> foundry model list
🟢 Service is Started on http://localhost:5273, PID 19296!
Alias                          Device     Task               File Size    License      Model ID
-----------------------------------------------------------------------------------------------
phi-4                          GPU        chat-completion    8.37 GB      MIT          Phi-4-generic-gpu
                               CPU        chat-completion    10.16 GB     MIT          Phi-4-generic-cpu

mistral-7b-v0.2                GPU        chat-completion    4.07 GB      apache-2.0   mistralai-Mistral-7B-Instruct-v0-2-generic-gpu
                               CPU        chat-completion    4.07 GB      apache-2.0   mistralai-Mistral-7B-Instruct-v0-2-generic-cpu

phi-3.5-mini                   GPU        chat-completion    2.16 GB      MIT          Phi-3.5-mini-instruct-generic-gpu
                               CPU        chat-completion    2.53 GB      MIT          Phi-3.5-mini-instruct-generic-cpu

phi-3-mini-128k                GPU        chat-completion    2.13 GB      MIT          Phi-3-mini-128k-instruct-generic-gpu
                               CPU        chat-completion    2.54 GB      MIT          Phi-3-mini-128k-instruct-generic-cpu

phi-3-mini-4k                  GPU        chat-completion    2.13 GB      MIT          Phi-3-mini-4k-instruct-generic-gpu
                               CPU        chat-completion    2.53 GB      MIT          Phi-3-mini-4k-instruct-generic-cpu

deepseek-r1-14b                GPU        chat-completion    10.27 GB     MIT          deepseek-r1-distill-qwen-14b-generic-gpu
                               CPU        chat-completion    11.51 GB     MIT          deepseek-r1-distill-qwen-14b-generic-cpu

deepseek-r1-7b                 GPU        chat-completion    5.58 GB      MIT          deepseek-r1-distill-qwen-7b-generic-gpu
                               CPU        chat-completion    6.43 GB      MIT          deepseek-r1-distill-qwen-7b-generic-cpu

qwen2.5-0.5b                   GPU        chat-completion    0.68 GB      apache-2.0   qwen2.5-0.5b-instruct-generic-gpu
                               CPU        chat-completion    0.80 GB      apache-2.0   qwen2.5-0.5b-instruct-generic-cpu

qwen2.5-1.5b                   GPU        chat-completion    1.51 GB      apache-2.0   qwen2.5-1.5b-instruct-generic-gpu
                               CPU        chat-completion    1.78 GB      apache-2.0   qwen2.5-1.5b-instruct-generic-cpu

qwen2.5-coder-0.5b             GPU        chat-completion    0.52 GB      apache-2.0   qwen2.5-coder-0.5b-instruct-generic-gpu
                               CPU        chat-completion    0.80 GB      apache-2.0   qwen2.5-coder-0.5b-instruct-generic-cpu

qwen2.5-coder-7b               GPU        chat-completion    4.73 GB      apache-2.0   qwen2.5-coder-7b-instruct-generic-gpu
                               CPU        chat-completion    6.16 GB      apache-2.0   qwen2.5-coder-7b-instruct-generic-cpu

qwen2.5-coder-1.5b             GPU        chat-completion    1.25 GB      apache-2.0   qwen2.5-coder-1.5b-instruct-generic-gpu
                               CPU        chat-completion    1.78 GB      apache-2.0   qwen2.5-coder-1.5b-instruct-generic-cpu

phi-4-mini                     GPU        chat-completion    3.72 GB      MIT          Phi-4-mini-instruct-generic-gpu
                               CPU        chat-completion    4.80 GB      MIT          Phi-4-mini-instruct-generic-cpu

phi-4-mini-reasoning           GPU        chat-completion    3.15 GB      MIT          Phi-4-mini-reasoning-generic-gpu
                               CPU        chat-completion    4.52 GB      MIT          Phi-4-mini-reasoning-generic-cpu

qwen2.5-14b                    CPU        chat-completion    11.06 GB     apache-2.0   qwen2.5-14b-instruct-generic-cpu

qwen2.5-7b                     GPU        chat-completion    5.20 GB      apache-2.0   qwen2.5-7b-instruct-generic-gpu
                               CPU        chat-completion    6.16 GB      apache-2.0   qwen2.5-7b-instruct-generic-cpu

qwen2.5-coder-14b              GPU        chat-completion    8.79 GB      apache-2.0   qwen2.5-coder-14b-instruct-generic-gpu
                               CPU        chat-completion    11.06 GB     apache-2.0   qwen2.5-coder-14b-instruct-generic-cpu

PS C:\WINDOWS\system32> foundry --version
0.4.91+269dfd9ed1
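
(Given that the service is already listening on http://localhost:5273, here is a minimal sketch of calling the CPU build of phi-4-mini through Foundry Local's OpenAI-compatible endpoint while no QNN build is available; the /v1 path and the placeholder API key follow the usual Foundry Local setup and are assumptions, as is the model having been downloaded beforehand, e.g. with foundry model download phi-4-mini:)

# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5273/v1",  # port taken from the service log above
    api_key="unused",                     # placeholder; typically not validated by the local endpoint
)

response = client.chat.completions.create(
    model="Phi-4-mini-instruct-generic-cpu",  # model ID from the listing above (CPU build, not QNN)
    messages=[{"role": "user", "content": "Hello from Foundry Local"}],
)
print(response.choices[0].message.content)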

nnbw-liu · Jul 23 '25 02:07

Hi @wldevries @natke, I am still waiting for your reply. I really need this model; it's related to my work.

nnbw-liu · Aug 07 '25 08:08

I would like it as well, though not as badly as you. But I don't work at Microsoft and can't help you.

wldevries · Aug 07 '25 08:08