# Issues with Intel NPU and the OpenVINO EP in Foundry Local
## Background

I posted this in a discussion as well. I also asked an AI about it; it suggested this is a bug rather than something I did wrong, so I am filing it as an issue.

Hi there! I'm trying to use an Intel NPU with Foundry Local. The NPU driver is already installed successfully.
I tried to run the model, but it failed:
```
C:\Users\(hidden)\.foundry>foundry model run DeepSeek-R1-Distill-Qwen-7B-openvino-npu:1
Exception: ❌ Model <DeepSeek-R1-Distill-Qwen-7B-openvino-npu:1> was not found in the catalog or local cache.
🔍 Did you mean: `foundry model run deepseek-r1-distill-qwen-7b-generic-gpu:3`?
👉 Try `foundry model list` to see available models.
👉 Or check for typos in the model name.
[12:37:58 ERR] LogException
Microsoft.AI.Foundry.Local.Common.FLException: ❌ Model <DeepSeek-R1-Distill-Qwen-7B-openvino-npu:1> was not found in the catalog or local cache.
🔍 Did you mean: `foundry model run deepseek-r1-distill-qwen-7b-generic-gpu:3`?
👉 Try `foundry model list` to see available models.
👉 Or check for typos in the model name.
   at Microsoft.AI.Foundry.Local.Catalog.AzureFoundryCatalog.<SuggestFuzzyMatch>d__11.MoveNext() + 0xd1c
--- End of stack trace from previous location ---
   at Microsoft.AI.Foundry.Local.Common.ModelManagement.<DownloadModelAsync>d__9.MoveNext() + 0x383
--- End of stack trace from previous location ---
   at Microsoft.AI.Foundry.Local.Commands.ModelRunCommand.<<Create>b__1_0>d.MoveNext() + 0x5d6
--- End of stack trace from previous location ---
   at Microsoft.AI.Foundry.Local.Common.CommandActionFactory.<>c__DisplayClass0_0`1.<<Create>b__0>d.MoveNext() + 0x1e7
--- End of stack trace from previous location ---
   at System.CommandLine.NamingConventionBinder.CommandHandler.<GetExitCodeAsync>d__66.MoveNext() + 0x50
--- End of stack trace from previous location ---
   at System.CommandLine.NamingConventionBinder.ModelBindingCommandHandler.<InvokeAsync>d__11.MoveNext() + 0x61
--- End of stack trace from previous location ---
   at System.CommandLine.Invocation.InvocationPipeline.<InvokeAsync>d__0.MoveNext() + 0x1cd
--- End of stack trace from previous location ---
   at Microsoft.AI.Foundry.Local.Program.<Main>d__1.MoveNext() + 0x4e4

C:\Users\(Hidden)\.foundry>
```
Too bad!
Another problem (maybe the same root cause?): I tried to list all the NPU-supported models, but:
```
C:\Users\(hidden)\.foundry>foundry model list --filter device=NPU
🕚 Downloading complete!...
Failed to download or register the following EPs: OpenVINOExecutionProvider. Will try installing again later.
Valid EPs: CPUExecutionProvider, WebGpuExecutionProvider
Alias  Device  Task  File Size  License  Model ID

C:\Users\(Hidden)\.foundry>
```
The problem is the line `Failed to download or register the following EPs`. I don't know whether this is a bug or something I did wrong, which is why I originally posted it as a discussion instead of an issue. If you know what I can do, please comment. Thank you very much!
## Environment

- CPU: Intel Core Ultra 7 155H
- Memory: 32 GB
- NPU: Intel AI Boost, driver 32.0.100.4404 (dated 2025/10/26)
- GPU: Intel Arc Graphics
- OS: Windows 11 24H2
- Region: Hong Kong, China
I downloaded the OpenVINO toolkit from Intel and added its lib, release, and debug folders to PATH.

Python is available on this machine, and OpenVINO under Python 3.12.8 detects the NPU correctly.
After downgrading Foundry Local to 0.7.120, everything works correctly and I can use the NPU.
Thank you for reporting this. Could you please share the output of `foundry model list` from the version of Foundry Local that is not working? Also, are you on the latest public Windows release, or a Windows Insider build?
Reply
About `foundry model list`:

```
C:\Users\(hidden)>foundry model list
Alias                 Device  Task             File Size  License     Model ID
------------------------------------------------------------------------------------------------------
phi-4                 GPU     chat-completion  8.37 GB    MIT         Phi-4-generic-gpu:1
                      CPU     chat-completion  10.16 GB   MIT         Phi-4-generic-cpu:1
phi-3.5-mini          GPU     chat-completion  2.16 GB    MIT         Phi-3.5-mini-instruct-generic-gpu:1
                      CPU     chat-completion  2.53 GB    MIT         Phi-3.5-mini-instruct-generic-cpu:1
phi-3-mini-128k       GPU     chat-completion  2.13 GB    MIT         Phi-3-mini-128k-instruct-generic-gpu:1
                      CPU     chat-completion  2.54 GB    MIT         Phi-3-mini-128k-instruct-generic-cpu:2
phi-3-mini-4k         GPU     chat-completion  2.13 GB    MIT         Phi-3-mini-4k-instruct-generic-gpu:1
                      CPU     chat-completion  2.53 GB    MIT         Phi-3-mini-4k-instruct-generic-cpu:2
mistral-7b-v0.2       GPU     chat-completion  4.07 GB    apache-2.0  mistralai-Mistral-7B-Instruct-v0-2-generic-gpu:1
                      CPU     chat-completion  4.07 GB    apache-2.0  mistralai-Mistral-7B-Instruct-v0-2-generic-cpu:2
deepseek-r1-14b       GPU     chat-completion  10.27 GB   MIT         deepseek-r1-distill-qwen-14b-generic-gpu:3
                      CPU     chat-completion  11.51 GB   MIT         deepseek-r1-distill-qwen-14b-generic-cpu:3
deepseek-r1-7b        GPU     chat-completion  5.58 GB    MIT         deepseek-r1-distill-qwen-7b-generic-gpu:3
                      CPU     chat-completion  6.43 GB    MIT         deepseek-r1-distill-qwen-7b-generic-cpu:3
qwen2.5-coder-0.5b    GPU     chat-completion  0.52 GB    apache-2.0  qwen2.5-coder-0.5b-instruct-generic-gpu:4
                      CPU     chat-completion  0.80 GB    apache-2.0  qwen2.5-coder-0.5b-instruct-generic-cpu:4
phi-4-mini-reasoning  GPU     chat-completion  3.15 GB    MIT         Phi-4-mini-reasoning-generic-gpu:3
                      CPU     chat-completion  4.52 GB    MIT         Phi-4-mini-reasoning-generic-cpu:3
qwen2.5-0.5b          GPU     chat-completion  0.68 GB    apache-2.0  qwen2.5-0.5b-instruct-generic-gpu:4
                      CPU     chat-completion  0.80 GB    apache-2.0  qwen2.5-0.5b-instruct-generic-cpu:4
qwen2.5-1.5b          GPU     chat-completion  1.51 GB    apache-2.0  qwen2.5-1.5b-instruct-generic-gpu:4
                      CPU     chat-completion  1.78 GB    apache-2.0  qwen2.5-1.5b-instruct-generic-cpu:4
qwen2.5-coder-1.5b    GPU     chat-completion  1.25 GB    apache-2.0  qwen2.5-coder-1.5b-instruct-generic-gpu:4
                      CPU     chat-completion  1.78 GB    apache-2.0  qwen2.5-coder-1.5b-instruct-generic-cpu:4
phi-4-mini            GPU     chat-completion  3.72 GB    MIT         Phi-4-mini-instruct-generic-gpu:5
                      CPU     chat-completion  4.80 GB    MIT         Phi-4-mini-instruct-generic-cpu:5
qwen2.5-14b           GPU     chat-completion  9.30 GB    apache-2.0  qwen2.5-14b-instruct-generic-gpu:4
                      CPU     chat-completion  11.06 GB   apache-2.0  qwen2.5-14b-instruct-generic-cpu:4
qwen2.5-coder-14b     GPU     chat-completion  8.79 GB    apache-2.0  qwen2.5-coder-14b-instruct-generic-gpu:4
                      CPU     chat-completion  11.06 GB   apache-2.0  qwen2.5-coder-14b-instruct-generic-cpu:4
qwen2.5-coder-7b      GPU     chat-completion  4.73 GB    apache-2.0  qwen2.5-coder-7b-instruct-generic-gpu:4
                      CPU     chat-completion  6.16 GB    apache-2.0  qwen2.5-coder-7b-instruct-generic-cpu:4
qwen2.5-7b            GPU     chat-completion  5.20 GB    apache-2.0  qwen2.5-7b-instruct-generic-gpu:4
                      CPU     chat-completion  6.16 GB    apache-2.0  qwen2.5-7b-instruct-generic-cpu:4
```

Note that no NPU/OpenVINO models appear in the list at all.
## My system

I'm not using the Windows Insider Program.

- CPU: Intel Core Ultra 7 155H
- Memory: 32 GB
- NPU: Intel AI Boost, driver 32.0.100.4404 (dated 2025/10/26)
- GPU: Intel Arc Graphics
- OS: **Windows 11 24H2**