Phi-4-reasoning/mini NPU not working
Error generated when loading into the Playground:
2025-05-17 15:28:50.198 [error] Failed loading model Phi-4-reasoning-plus-14.7b-qnn. Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_11067426494884051979_9_0'
2025-05-17 15:39:32.909 [info] Error: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1402] 2025-05-17T15:39:32.9087466+01:00 Failed loading model:Phi-4-mini-reasoning-3.8b-qnn error: [Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_16507318232818193198_9_0', at Microsoft.ML.OnnxRuntimeGenAI.Result.VerifySuccess(IntPtr) + 0x58 at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx.LoadModelAsync(String, String, CancellationToken) + 0x380 at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderBase`1.<EnsureModelLoadedAsync>d__42.MoveNext() + 0x544]
I'm running a Lenovo Slim 7x, a Snapdragon X Elite device.
AI Toolkit v0.12.2 (15th May) & 0.13.2025051506 (pre-release) have the same issue.
Hi @pkbullock , could you please try the latest version 0.14.2 of AITK to see if the issue is resolved? Thanks
Same error:
I deleted and re-downloaded the NPU model, but it's still an issue.
Are there system dependencies or assumptions that I could check?
This seems similar to #151, which looks closed without a resolution.
Hi @pkbullock , could you help share the content of C:\Users\<user>\.aitk\models\Microsoft\Phi-4-mini-reasoning-3.8b-qnn\genai_config.json that's causing the EPContext error?
Could you try removing the two lines of "backend_path": "QnnHtp.dll", in genai_config.json and loading the model again?
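For anyone applying this edit by hand, a minimal Python sketch of the change (back up genai_config.json first; the nested config shape below is illustrative, loosely based on onnxruntime-genai QNN configs, not a guaranteed schema):

```python
import json

def remove_backend_path(obj):
    """Recursively drop every 'backend_path' key from a genai_config structure."""
    if isinstance(obj, dict):
        obj.pop("backend_path", None)
        for value in obj.values():
            remove_backend_path(value)
    elif isinstance(obj, list):
        for item in obj:
            remove_backend_path(item)
    return obj

# Illustrative config shape only -- inspect your actual genai_config.json.
config = {
    "model": {
        "decoder": {
            "session_options": {
                "provider_options": [
                    {"qnn": {"backend_path": "QnnHtp.dll"}}
                ]
            }
        }
    }
}
remove_backend_path(config)
print(json.dumps(config))
```

To apply it to the real file, load the JSON with `json.loads(path.read_text())`, call `remove_backend_path`, and write the result back.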
@vortex-captain - no luck, tried removing that, the provider entirely and downloading the Qualcomm AI SDK and pointing the QnnHtp.dll file to that location instead. It didn't work sadly.
I have also noticed that FoundryLocal shows the same error. I have updated to the latest AI Toolkit, and it doesn't make a difference. My NPU drivers were also updated, but that didn't resolve the issue either.
@pkbullock could you help provide more info as follows? Thanks!
- In Task Manager, end all tasks named Inference.Service.Agent, if any
- Upgrade AI Toolkit to version 0.14.3
- Delete the folder C:\Users\<user>\.aitk\models\Microsoft\Phi-4-mini-reasoning-3.8b-qnn
- Open AI Toolkit and re-download Phi 4 Reasoning 3.8B (NPU Optimized, QNN)
- Load Phi 4 Reasoning 3.8B (NPU Optimized, QNN), and wait for the EPContext(1) error to appear
- Download and open Process Explorer from https://learn.microsoft.com/en-us/sysinternals/downloads/process-explorer
- Search for Inference.Service.Agent and select the process in Process Explorer
- Select "View" -> "Lower Pane View" -> "DLLs" in the top menu
- Share the list
- In the list of DLLs, find QnnHtp.dll, QnnHtpV73Stub.dll, QnnSystem.dll, and libcdsprpc.dll and share their paths (please make sure not to leave any personal information in the paths)
- QnnHtp.dll - not running
- QnnHtpV73Stub.dll - not running; there is a similarly named one, QnnHtpV73StubDrv.dll
- QnnSystem.dll - C:\Users\<user>\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-0.14.3-win32-arm64\bin
- libcdsprpc.dll - C:\Windows\System32\DriverStore\FileRepository\qcnspmcdm8380.inf_arm64_709a025a458a890b
Running: AI Toolkit v0.14.3
Thanks @pkbullock ! Could you try the following?
- Share the file list of C:\Windows\System32\DriverStore\FileRepository\qcnspmcdm8380.inf_arm64_709a025a458a890b and C:\Windows\System32\DriverStore\FileRepository\qcnspmcdm8380.inf_arm64_709a025a458a890b\HTP
- Uninstall the manually installed NPU drivers and Qualcomm AI SDK, then try loading the model again in AITK. I tested this model on a brand-new QNN machine and it works without installing these two components manually.
Here are the images, the drivers were delivered through a LENOVO system update via Windows Update.
C:\Windows\System32\DriverStore\FileRepository\qcnspmcdm8380.inf_arm64_709a025a458a890b
C:\Windows\System32\DriverStore\FileRepository\qcnspmcdm8380.inf_arm64_709a025a458a890b\HTP
The driver files look right. Could you try replacing 2 lines of
"backend_path": "QnnHtp.dll",
with
"backend_path": "C:/Users/<user>/.vscode/extensions/ms-windows-ai-studio.windows-ai-studio-0.14.3-win32-arm64/bin/QnnHtp.dll",
in genai_config.json and testing again after ending process Inference.Service.Agent and restarting VS Code? To state the obvious, please make sure to replace <user> and use / instead of \ in backend_path.
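Scripting the replacement avoids slash and placeholder mistakes; a sketch under the same assumptions as above (the extension path is an example, and the config shape is illustrative rather than a guaranteed schema):

```python
def set_backend_path(obj, new_path):
    """Recursively point every 'backend_path' entry at new_path."""
    if isinstance(obj, dict):
        if "backend_path" in obj:
            obj["backend_path"] = new_path
        for value in obj.values():
            set_backend_path(value, new_path)
    elif isinstance(obj, list):
        for item in obj:
            set_backend_path(item, new_path)
    return obj

# Forward slashes, as noted above; <user> is a placeholder to replace.
new_path = ("C:/Users/<user>/.vscode/extensions/"
            "ms-windows-ai-studio.windows-ai-studio-0.14.3-win32-arm64/bin/QnnHtp.dll")

# Illustrative fragment of a genai_config.json provider section.
config = {"provider_options": [{"qnn": {"backend_path": "QnnHtp.dll"}}]}
set_backend_path(config, new_path)
```

As with the removal variant, load the real genai_config.json with `json`, transform it, and write it back before restarting VS Code.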
Besides, please also help share the file list of C:\Users\<user>\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-0.14.3-win32-arm64\bin
If this doesn't work, does it yield the same EPContext(1) error?
I’m running into the same issue as others have described. I tried updating both backend_path entries in my genai_config.json to:
"backend_path": "C:/Users/dago/.vscode/extensions/ms-windows-ai-studio.windows-ai-studio-0.14.4-win32-arm64/bin/QnnHtp.dll"
I then stopped Inference.Service.Agent, restarted VS Code, and tried again. Unfortunately, I’m still seeing the same error.
Relevant log output:
2025-06-22 20:24:22.357 [info] CPU: Qualcomm Technologies Inc - Snapdragon(R) X Elite - X1E78100 - Qualcomm(R) Oryon(TM) CPU
2025-06-22 20:24:22.357 [info] Graphics: Qualcomm Incorporated
2025-06-22 20:24:22.357 [info] Supported: QNN,CPU
2025-06-22 20:24:22.466 [info] Command registration.
2025-06-22 20:24:25.266 [info] telemetry event:activate_extension sent
2025-06-22 20:24:40.334 [info] Loading View: modelPlayground
2025-06-22 20:24:40.734 [info] Information: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1400] 2025-06-22T20:24:40.7332317+02:00 Loading model:Phi-4-mini-reasoning-3.8b-qnn
2025-06-22 20:24:44.013 [info] Error: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1402] 2025-06-22T20:24:44.0123256+02:00 Failed loading model:Phi-4-mini-reasoning-3.8b-qnn error: [Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_16507318232818193198_9_0', at Microsoft.ML.OnnxRuntimeGenAI.Result.VerifySuccess(IntPtr) + 0x58
at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx.LoadModelAsync(String, String, CancellationToken) + 0x380
at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderBase`1.<EnsureModelLoadedAsync>d__42.MoveNext() + 0x544]
2025-06-22 20:24:44.014 [info] Information: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1401] 2025-06-22T20:24:44.0131031+02:00 Finish loading model:Phi-4-mini-reasoning-3.8b-qnn elapsed time:00:00:03.2798831
2025-06-22 20:24:44.021 [error] Failed loading model Phi-4-mini-reasoning-3.8b-qnn. Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_16507318232818193198_9_0'
What I’ve also tried:
- Checked if the Qualcomm AI SDK was installed (it was not).
- Attempted to uninstall the NPU driver via Device Manager, but Windows automatically reinstalled it after removal.
Directory contents of C:\Users\dago\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-0.14.4-win32-arm64\bin:
Mode LastWriteTime Length Name
---- ------------- ------ ----
d---- 22.06.2025 19:29 scripts
-a--- 22.06.2025 19:29 79 appsettings-agent.Development.json
-a--- 22.06.2025 19:29 306 appsettings-agent.json
-a--- 22.06.2025 19:29 79 appsettings.Development.json
-a--- 22.06.2025 19:29 401 appsettings.json
-a--- 22.06.2025 19:29 26308192 Inference.Service.Agent.exe
-a--- 22.06.2025 19:29 70 Inference.Service.Agent.staticwebassets.endpoints.json
-a--- 22.06.2025 19:29 8482768 libQnnHtpV68Skel.so
-a--- 22.06.2025 19:29 12142 libqnnhtpv73.cat
-a--- 22.06.2025 19:29 8502100 libQnnHtpV73Skel.so
-a--- 22.06.2025 19:29 1520680 onnxruntime_providers_qnn.dll
-a--- 22.06.2025 19:29 21024 onnxruntime_providers_shared.dll
-a--- 22.06.2025 19:29 1837600 onnxruntime-genai.dll
-a--- 22.06.2025 19:29 13624864 onnxruntime.dll
-a--- 22.06.2025 19:29 3757648 QnnCpu.dll
-a--- 22.06.2025 19:29 1844272 QnnHtp.dll
-a--- 22.06.2025 19:29 54050352 QnnHtpPrepare.dll
-a--- 22.06.2025 19:29 155728 QnnHtpV68Stub.dll
-a--- 22.06.2025 19:29 278624 QnnHtpV73Stub.dll
-a--- 22.06.2025 19:29 549472 QnnSaver.dll
-a--- 22.06.2025 19:29 106040 QnnSystem.dll
-a--- 22.06.2025 19:29 15680064 WorkspaceAutomation.Agent.exe
Question(s):
- Did I miss an additional configuration step for QNN/HTP on Snapdragon X Elite?
- Is there another dependency or DLL that needs to be referenced for the QNNExecutionProvider?
- Has anyone managed to get the Phi-4-mini-reasoning-3.8b-qnn model running on this hardware (e.g. Lenovo Yoga Slim 7x Gen 9 (14" Snapdragon))?
Any suggestions or troubleshooting steps would be greatly appreciated!
Hi @DanielGoehler, could you try using the pre-release version of AITK and test the model again? Thank you.
@timenick With prerelease version 0.15.2025062307, I am still encountering the same error message as before:
2025-06-24 06:44:47.809 [error] Failed loading model Phi-4-mini-reasoning-3.8b-qnn. Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_16507318232818193198_9_0'
Full log excerpt:
2025-06-24 06:43:33.685 [info] CPU: Qualcomm Technologies Inc - Snapdragon(R) X Elite - X1E78100 - Qualcomm(R) Oryon(TM) CPU
2025-06-24 06:43:33.685 [info] Graphics: Qualcomm Incorporated
2025-06-24 06:43:33.685 [info] Supported: QNN,CPU
2025-06-24 06:43:34.019 [info] Command registration.
2025-06-24 06:43:34.336 [info] Connected to agent:Inference.Service.Agent.WinML pipe after retries:0
2025-06-24 06:43:34.336 [info] Agent startup completed...
2025-06-24 06:43:34.337 [info] Agent unlocked
2025-06-24 06:43:34.348 [info] Information: Microsoft.Hosting.Lifetime [14] 2025-06-24T06:43:34.3440232+02:00 Now listening on: http://localhost:5272
2025-06-24 06:43:34.349 [info] Information: Microsoft.Hosting.Lifetime [0] 2025-06-24T06:43:34.3461234+02:00 Application started. Press Ctrl+C to shut down.
2025-06-24 06:43:34.349 [info] Information: Microsoft.Hosting.Lifetime [0] 2025-06-24T06:43:34.346136+02:00 Hosting environment: Production
2025-06-24 06:43:34.350 [info] Information: Microsoft.Hosting.Lifetime [0] 2025-06-24T06:43:34.3461426+02:00 Content root path: c:\Users\dago\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-0.15.2025062307-win32-arm64\bin\
2025-06-24 06:43:35.913 [info] Loading View: catalogModels
2025-06-24 06:43:36.678 [info] telemetry event:activate_extension sent
2025-06-24 06:44:29.362 [info] Loading View: modelPlayground
2025-06-24 06:44:45.965 [info] Information: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1400] 2025-06-24T06:44:45.9648626+02:00 Loading model:Phi-4-mini-reasoning-3.8b-qnn
2025-06-24 06:44:47.803 [info] Error: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1402] 2025-06-24T06:44:47.8028628+02:00 Failed loading model:Phi-4-mini-reasoning-3.8b-qnn error: [Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_16507318232818193198_9_0', at Microsoft.ML.OnnxRuntimeGenAI.Result.VerifySuccess(IntPtr) + 0x58
at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx.LoadModelAsync(String, String, String, CancellationToken) + 0x6fc
at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderBase`1.<EnsureModelLoadedAsync>d__44.MoveNext() + 0x54c]
2025-06-24 06:44:47.804 [info] Information: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1401] 2025-06-24T06:44:47.803494+02:00 Finish loading model:Phi-4-mini-reasoning-3.8b-qnn elapsed time:00:00:01.8386215
2025-06-24 06:44:47.809 [error] Failed loading model Phi-4-mini-reasoning-3.8b-qnn. Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_16507318232818193198_9_0'
Additionally, there seems to be an issue with the model catalog in the prerelease version. I don’t see any models listed with NPU support.
Screenshots for comparison:
- Prerelease 0.15.2025062307:
- Release 0.14.4:
I'm getting the same errors with both the GA and Pre Release version on my Lenovo Yoga Slim 7 running latest updates on Windows 11. Was working fine on earlier releases of AI Toolkit.
OS Name Microsoft Windows 11 Home Version 10.0.26120 Build 26120 System SKU LENOVO_MT_83ED_BU_idea_FM_Yoga Slim 7 14Q8X9 Processor Snapdragon® X Elite - X1E78100 - Qualcomm® Oryon™ CPU, 3417 Mhz, 12 Core(s), 12 Logical Processor(s).
For the model catalog issue, click on "View All" to see all available models.
For QNN EP issue, it appears that all issues occurred on Snapdragon® X Elite - X1E78100, we are investigating on it
@timenick Thanks. View All works.
Completely reinstalled Windows fresh. Same error message.
Failed loading model Phi-4-reasoning-14.7b-qnn. Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_11067426494884051979_9_0' Refer to the Output Panel for more details.
OS Name Microsoft Windows 11 Home Version 10.0.26120 Build 26120 System SKU LENOVO_MT_83ED_BU_idea_FM_Yoga Slim 7 14Q8X9 Processor Snapdragon® X Elite - X1E78100 - Qualcomm® Oryon™ CPU, 3417 Mhz, 12 Core(s), 12 Logical Processor(s).
NPU models worked a few weeks ago but appear to be broken now. My honest observation is that these Snapdragon Copilot+ PCs are not very reliable, mostly because of operating-system issues.
Posting "me too" just to show this is affecting more people. First use of AI Toolkit and no NPU models work with same error as previous poster.
Lenovo Yoga Slim 7 14Q8X9, Windows with latest updates, VS Code and AI Toolkit updated to latest versions.
Decided to try again ... same error.
I get the following errors in Process Monitor ...
6:14:58.4178748 PM mc-fw-host.exe 5432 QueryInformationVolume C:\Users\nsteb\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-0.18.0-win32-arm64\bin\QnnSystem.dll BUFFER OVERFLOW VolumeCreationTime: 11/3/2024 4:26:51 PM, VolumeSerialNumber: EA19-5111, SupportsObjects: True, VolumeLabel: Win
6:14:58.4178770 PM mc-fw-host.exe 5432 QueryAllInformationFile C:\Users\nsteb\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-0.18.0-win32-arm64\bin\QnnSystem.dll BUFFER OVERFLOW CreationTime: 8/2/2025 2:53:57 PM, LastAccessTime: 8/2/2025 6:14:53 PM, LastWriteTime: 8/2/2025 2:53:57 PM, ChangeTime: 8/2/2025 2:57:17 PM, FileAttributes: ANCI, AllocationSize: 4,497,408, EndOfFile: 4,493,400
A couple of hours ago I applied a Lenovo update (something low level), and after the laptop booted back up I opened VS Code and noticed an update to the AI Toolkit. I applied the update, tried a model, and... it works! The NPU is showing activity in Task Manager.
I opened AnythingLLM - the NPU-optimized models work! (Previously they would not.)
Could it be that a low-level update from Lenovo was required? For reference, this is the update (from history): LENOVO - System Hardware Update - 8/1/2025
Yes. It appears the latest update to AI Toolkit solved the problem. Qualcomm has an excellent YouTube video on using AnythingLLM that makes it easy to leverage local LLMs running on the NPU - worth watching. The engineer uses meta-llama-3.2 running on the NPU to build a local chatbot. Very cool.
Mine too. I had my motherboard replaced and some of my AI capabilities started working again; with new drivers and Windows updates, ALL my AI capabilities are now working. Closing this issue now.