
Phi-4-reasoning/mini NPU not working

Open pkbullock opened this issue 7 months ago • 10 comments

Error generated when loading into the Playground:

2025-05-17 15:28:50.198 [error] Failed loading model Phi-4-reasoning-plus-14.7b-qnn. Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_11067426494884051979_9_0'

2025-05-17 15:39:32.909 [info] Error: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1402] 2025-05-17T15:39:32.9087466+01:00 Failed loading model:Phi-4-mini-reasoning-3.8b-qnn error: [Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_16507318232818193198_9_0', at Microsoft.ML.OnnxRuntimeGenAI.Result.VerifySuccess(IntPtr) + 0x58 at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx.LoadModelAsync(String, String, CancellationToken) + 0x380 at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderBase`1.<EnsureModelLoadedAsync>d__42.MoveNext() + 0x544]

I'm running a Lenovo Slim 7x with a Snapdragon X Elite.

AI Toolkit v0.12.2 (15th May) & 0.13.2025051506 (pre-release) have the same issue.

Image

pkbullock avatar May 17 '25 14:05 pkbullock

Hi @pkbullock , could you please try the latest version 0.14.2 of AITK to see if the issue is resolved? Thanks

timenick avatar May 22 '25 02:05 timenick

Same error:

Image

Image

Image

pkbullock avatar May 22 '25 17:05 pkbullock

I deleted and re-downloaded the NPU model, but it's still an issue.

pkbullock avatar May 22 '25 17:05 pkbullock

Are there any system dependencies or assumptions that I could check?

pkbullock avatar May 23 '25 09:05 pkbullock

This seems similar to #151, which appears to have been closed without a resolution.

pkbullock avatar May 24 '25 07:05 pkbullock

Hi @pkbullock , could you help share the content of C:\Users\<user>\.aitk\models\Microsoft\Phi-4-mini-reasoning-3.8b-qnn\genai_config.json that's causing the EPContext error?

vortex-captain avatar May 28 '25 07:05 vortex-captain

Hi @vortex-captain - here is the file.

genai_config.json

pkbullock avatar May 28 '25 08:05 pkbullock

Could you try removing the two "backend_path": "QnnHtp.dll", lines in genai_config.json and loading the model again?
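For context, backend_path lives under the QNN execution-provider options in genai_config.json. Below is a simplified sketch of what such a section typically looks like in onnxruntime-genai configs; the exact nesting and surrounding fields in this model's file will differ (it contains two such backend_path lines, presumably one per session):

```json
{
  "model": {
    "decoder": {
      "session_options": {
        "provider_options": [
          {
            "qnn": {
              "backend_path": "QnnHtp.dll"
            }
          }
        ]
      }
    }
  }
}
```

The idea behind removing the lines is presumably to let the QNN execution provider resolve QnnHtp.dll through its default DLL search path rather than the relative name written in the config.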

vortex-captain avatar May 29 '25 04:05 vortex-captain

@vortex-captain - no luck. I tried removing those lines, then removing the provider entry entirely, and finally downloading the Qualcomm AI SDK and pointing backend_path at its QnnHtp.dll location instead. Sadly, none of it worked.

pkbullock avatar May 29 '25 15:05 pkbullock

I have also noticed that FoundryLocal shows the same error. I have updated to the latest AI Toolkit, and it doesn't make a difference. My NPU drivers were also updated, but that didn't resolve the issue either.

pkbullock avatar May 29 '25 15:05 pkbullock

@pkbullock could you help provide more info as follows? Thanks!

  • In task manager, end all tasks named Inference.Service.Agent, if any
  • Upgrade AI Toolkit to specific version 0.14.3
  • Delete folder C:\Users\<user>\.aitk\models\Microsoft\Phi-4-mini-reasoning-3.8b-qnn
  • Open AI Toolkit, re-download Phi 4 Reasoning 3.8B (NPU Optimized, QNN)
  • Load Phi 4 Reasoning 3.8B (NPU Optimized, QNN), and wait for EPContext(1) error to appear
  • Download and open Process Explorer from https://learn.microsoft.com/en-us/sysinternals/downloads/process-explorer
  • Search Inference.Service.Agent and select the process in Process Explorer
  • Select "View" -> "Lower Pane View" -> "DLLs" in top menu
  • Share the list
  • In the list of DLLs, find QnnHtp.dll, QnnHtpV73Stub.dll, QnnSystem.dll, libcdsprpc.dll and share their paths (please make sure not to leave any personal information in the paths)
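As a scriptable alternative to Process Explorer's DLL view, a rough sketch like the one below (assuming Python with the third-party psutil package is available; run it while the model load and the EPContext error are being reproduced) can dump the paths of the QNN-related modules loaded by the agent:

```python
# Rough sketch: print QNN-related DLL paths loaded by Inference.Service.Agent.
# Assumes Python with the third-party psutil package installed; run while the
# model load (and the EPContext error) is in progress. May need an elevated prompt.
import psutil

TARGETS = ("qnnhtp.dll", "qnnhtpv73stub.dll", "qnnsystem.dll", "libcdsprpc.dll")

for proc in psutil.process_iter(["name"]):
    name = (proc.info["name"] or "").lower()
    if name.startswith("inference.service.agent"):
        try:
            # memory_maps() lists the files mapped into the process,
            # which includes every loaded DLL with its full path.
            for mod in proc.memory_maps():
                if (mod.path or "").lower().endswith(TARGETS):
                    print(f"{proc.pid}: {mod.path}")
        except (psutil.AccessDenied, psutil.NoSuchProcess):
            print(f"{proc.pid}: access denied or process exited")
```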

vortex-captain avatar Jun 03 '25 05:06 vortex-captain

Image

  • QnnHtp.dll - not running
  • QnnHtpV73Stub.dll - not running; there is a similarly named one, QnnHtpV73StubDrv.dll
  • QnnSystem.dll - C:\Users\<user>\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-0.14.3-win32-arm64\bin
  • libcdsprpc.dll - C:\Windows\System32\DriverStore\FileRepository\qcnspmcdm8380.inf_arm64_709a025a458a890b

Image

Running: AI Toolkit v0.14.3

pkbullock avatar Jun 03 '25 06:06 pkbullock

Thanks @pkbullock ! Could you try the following?

  • share the file list of C:\Windows\System32\DriverStore\FileRepository\qcnspmcdm8380.inf_arm64_709a025a458a890b and C:\Windows\System32\DriverStore\FileRepository\qcnspmcdm8380.inf_arm64_709a025a458a890b\HTP
  • Uninstall manually installed NPU drivers and Qualcomm AI SDK and try loading the model again in AITK. I tested this model on a brand new QNN machine and it works without installing these 2 components manually

vortex-captain avatar Jun 03 '25 08:06 vortex-captain

Here are the images; the drivers were delivered through a Lenovo system update via Windows Update.

C:\Windows\System32\DriverStore\FileRepository\qcnspmcdm8380.inf_arm64_709a025a458a890b Image

C:\Windows\System32\DriverStore\FileRepository\qcnspmcdm8380.inf_arm64_709a025a458a890b\HTP Image

pkbullock avatar Jun 03 '25 18:06 pkbullock

The driver files look right. Could you try replacing 2 lines of

"backend_path": "QnnHtp.dll",

with

"backend_path": "C:/Users/<user>/.vscode/extensions/ms-windows-ai-studio.windows-ai-studio-0.14.3-win32-arm64/bin/QnnHtp.dll",

in genai_config.json and testing again after ending process Inference.Service.Agent and restarting VS Code? To state the obvious, please make sure to replace <user> and use / instead of \ in backend_path.
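For clarity, a rough sketch of what one of the edited QNN provider entries would then look like (the surrounding fields in your file will differ):

```json
{
  "qnn": {
    "backend_path": "C:/Users/<user>/.vscode/extensions/ms-windows-ai-studio.windows-ai-studio-0.14.3-win32-arm64/bin/QnnHtp.dll"
  }
}
```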

Besides, please also help share the file list of C:\Users\<user>\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-0.14.3-win32-arm64\bin

In case this doesn't work, does it yield the same EPContext(1) error?
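If the error persists, one way to narrow down whether it comes from the extension or from the ONNX Runtime GenAI / QNN layer (which the agent appears to use, judging by the Microsoft.ML.OnnxRuntimeGenAI frames in the stack trace) would be to try loading the model folder directly from Python. This is only a rough, untested sketch and assumes an onnxruntime-genai Python build with QNN support is installed, which may not be straightforward to obtain on ARM64 Windows:

```python
# Rough sketch: try loading the QNN model outside AI Toolkit to see whether
# the same EPContext(1) error reproduces with onnxruntime-genai alone.
# Assumes a Python onnxruntime-genai package with the QNN execution provider.
import onnxruntime_genai as og

MODEL_DIR = r"C:\Users\<user>\.aitk\models\Microsoft\Phi-4-mini-reasoning-3.8b-qnn"

try:
    model = og.Model(MODEL_DIR)   # reads genai_config.json and creates the ORT sessions
    og.Tokenizer(model)
    print("Model loaded OK")
except Exception as exc:          # an EPContext failure would surface here
    print(f"Load failed: {exc}")
```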

vortex-captain avatar Jun 04 '25 01:06 vortex-captain

I’m running into the same issue as others have described. I tried updating both backend_path entries in my genai_config.json to:

"backend_path": "C:/Users/dago/.vscode/extensions/ms-windows-ai-studio.windows-ai-studio-0.14.4-win32-arm64/bin/QnnHtp.dll"

I then stopped Inference.Service.Agent, restarted VS Code, and tried again. Unfortunately, I’m still seeing the same error.

Relevant log output:

2025-06-22 20:24:22.357 [info] CPU: Qualcomm Technologies Inc - Snapdragon(R) X Elite - X1E78100 - Qualcomm(R) Oryon(TM) CPU
2025-06-22 20:24:22.357 [info] Graphics: Qualcomm Incorporated
2025-06-22 20:24:22.357 [info] Supported: QNN,CPU
2025-06-22 20:24:22.466 [info] Command registration.
2025-06-22 20:24:25.266 [info] telemetry event:activate_extension sent
2025-06-22 20:24:40.334 [info] Loading View: modelPlayground
2025-06-22 20:24:40.734 [info] Information: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1400]  2025-06-22T20:24:40.7332317+02:00 Loading model:Phi-4-mini-reasoning-3.8b-qnn
2025-06-22 20:24:44.013 [info] Error: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1402]  2025-06-22T20:24:44.0123256+02:00 Failed loading model:Phi-4-mini-reasoning-3.8b-qnn error: [Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_16507318232818193198_9_0',    at Microsoft.ML.OnnxRuntimeGenAI.Result.VerifySuccess(IntPtr) + 0x58
   at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx.LoadModelAsync(String, String, CancellationToken) + 0x380
   at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderBase`1.<EnsureModelLoadedAsync>d__42.MoveNext() + 0x544]
2025-06-22 20:24:44.014 [info] Information: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1401]  2025-06-22T20:24:44.0131031+02:00 Finish loading model:Phi-4-mini-reasoning-3.8b-qnn elapsed time:00:00:03.2798831
2025-06-22 20:24:44.021 [error] Failed loading model Phi-4-mini-reasoning-3.8b-qnn. Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_16507318232818193198_9_0' 

What I’ve also tried:

  • Checked if the Qualcomm AI SDK was installed (it was not).
  • Attempted to uninstall the NPU driver via Device Manager, but Windows automatically reinstalled it after removal. Image

Directory contents of C:\Users\dago\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-0.14.4-win32-arm64\bin:

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----          22.06.2025    19:29                scripts
-a---          22.06.2025    19:29             79 appsettings-agent.Development.json
-a---          22.06.2025    19:29            306 appsettings-agent.json
-a---          22.06.2025    19:29             79 appsettings.Development.json
-a---          22.06.2025    19:29            401 appsettings.json
-a---          22.06.2025    19:29       26308192 Inference.Service.Agent.exe
-a---          22.06.2025    19:29             70 Inference.Service.Agent.staticwebassets.endpoints.json
-a---          22.06.2025    19:29        8482768 libQnnHtpV68Skel.so
-a---          22.06.2025    19:29          12142 libqnnhtpv73.cat
-a---          22.06.2025    19:29        8502100 libQnnHtpV73Skel.so
-a---          22.06.2025    19:29        1520680 onnxruntime_providers_qnn.dll
-a---          22.06.2025    19:29          21024 onnxruntime_providers_shared.dll
-a---          22.06.2025    19:29        1837600 onnxruntime-genai.dll
-a---          22.06.2025    19:29       13624864 onnxruntime.dll
-a---          22.06.2025    19:29        3757648 QnnCpu.dll
-a---          22.06.2025    19:29        1844272 QnnHtp.dll
-a---          22.06.2025    19:29       54050352 QnnHtpPrepare.dll
-a---          22.06.2025    19:29         155728 QnnHtpV68Stub.dll
-a---          22.06.2025    19:29         278624 QnnHtpV73Stub.dll
-a---          22.06.2025    19:29         549472 QnnSaver.dll
-a---          22.06.2025    19:29         106040 QnnSystem.dll
-a---          22.06.2025    19:29       15680064 WorkspaceAutomation.Agent.exe

Question(s):

  • Did I miss an additional configuration step for QNN/HTP on Snapdragon X Elite?
  • Is there another dependency or DLL that needs to be referenced for the QNNExecutionProvider?
  • Has anyone managed to get the Phi-4-mini-reasoning-3.8b-qnn model running on this hardware (e.g. Lenovo Yoga Slim 7x Gen 9 (14" Snapdragon))?

Any suggestions or troubleshooting steps would be greatly appreciated!

DanielGoehler avatar Jun 22 '25 19:06 DanielGoehler

Hi @DanielGoehler, could you try using the pre-release version of AITK and test the model again? Thank you.

Image

timenick avatar Jun 24 '25 03:06 timenick

@timenick With prerelease version 0.15.2025062307, I am still encountering the same error message as before:

2025-06-24 06:44:47.809 [error] Failed loading model Phi-4-mini-reasoning-3.8b-qnn. Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_16507318232818193198_9_0'

Full log excerpt:

2025-06-24 06:43:33.685 [info] CPU: Qualcomm Technologies Inc - Snapdragon(R) X Elite - X1E78100 - Qualcomm(R) Oryon(TM) CPU
2025-06-24 06:43:33.685 [info] Graphics: Qualcomm Incorporated
2025-06-24 06:43:33.685 [info] Supported: QNN,CPU
2025-06-24 06:43:34.019 [info] Command registration.
2025-06-24 06:43:34.336 [info] Connected to agent:Inference.Service.Agent.WinML pipe after retries:0
2025-06-24 06:43:34.336 [info] Agent startup completed...
2025-06-24 06:43:34.337 [info] Agent unlocked
2025-06-24 06:43:34.348 [info] Information: Microsoft.Hosting.Lifetime [14]  2025-06-24T06:43:34.3440232+02:00 Now listening on: http://localhost:5272
2025-06-24 06:43:34.349 [info] Information: Microsoft.Hosting.Lifetime [0]  2025-06-24T06:43:34.3461234+02:00 Application started. Press Ctrl+C to shut down.
2025-06-24 06:43:34.349 [info] Information: Microsoft.Hosting.Lifetime [0]  2025-06-24T06:43:34.346136+02:00 Hosting environment: Production
2025-06-24 06:43:34.350 [info] Information: Microsoft.Hosting.Lifetime [0]  2025-06-24T06:43:34.3461426+02:00 Content root path: c:\Users\dago\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-0.15.2025062307-win32-arm64\bin\
2025-06-24 06:43:35.913 [info] Loading View: catalogModels
2025-06-24 06:43:36.678 [info] telemetry event:activate_extension sent
2025-06-24 06:44:29.362 [info] Loading View: modelPlayground
2025-06-24 06:44:45.965 [info] Information: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1400]  2025-06-24T06:44:45.9648626+02:00 Loading model:Phi-4-mini-reasoning-3.8b-qnn
2025-06-24 06:44:47.803 [info] Error: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1402]  2025-06-24T06:44:47.8028628+02:00 Failed loading model:Phi-4-mini-reasoning-3.8b-qnn error: [Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_16507318232818193198_9_0',    at Microsoft.ML.OnnxRuntimeGenAI.Result.VerifySuccess(IntPtr) + 0x58
   at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx.LoadModelAsync(String, String, String, CancellationToken) + 0x6fc
   at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderBase`1.<EnsureModelLoadedAsync>d__44.MoveNext() + 0x54c]
2025-06-24 06:44:47.804 [info] Information: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1401]  2025-06-24T06:44:47.803494+02:00 Finish loading model:Phi-4-mini-reasoning-3.8b-qnn elapsed time:00:00:01.8386215
2025-06-24 06:44:47.809 [error] Failed loading model Phi-4-mini-reasoning-3.8b-qnn. Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_16507318232818193198_9_0' 

Additionally, there seems to be an issue with the model catalog in the prerelease version. I don’t see any models listed with NPU support.

Screenshots for comparison:

  • Prerelease 0.15.2025062307:
Image
  • Release 0.14.4:
Image

DanielGoehler avatar Jun 24 '25 04:06 DanielGoehler

I'm getting the same errors with both the GA and pre-release versions on my Lenovo Yoga Slim 7 running the latest updates on Windows 11. It was working fine on earlier releases of AI Toolkit.

OS Name: Microsoft Windows 11 Home
Version: 10.0.26120 Build 26120
System SKU: LENOVO_MT_83ED_BU_idea_FM_Yoga Slim 7 14Q8X9
Processor: Snapdragon® X Elite - X1E78100 - Qualcomm® Oryon™ CPU, 3417 MHz, 12 Core(s), 12 Logical Processor(s).

nsteblay avatar Jun 25 '25 02:06 nsteblay

@DanielGoehler wrote: "With prerelease version 0.15.2025062307, I am still encountering the same error message as before: Failed loading model Phi-4-mini-reasoning-3.8b-qnn. Could not find an implementation for EPContext(1) node ... Additionally, there seems to be an issue with the model catalog in the prerelease version. I don't see any models listed with NPU support."

For the model catalog issue, click on "View All" to see all available models.

Image

For the QNN EP issue, it appears that all reports involve the Snapdragon® X Elite - X1E78100; we are investigating it.

timenick avatar Jun 25 '25 03:06 timenick

@timenick Thanks. View All works.

DanielGoehler avatar Jun 25 '25 05:06 DanielGoehler

Completely reinstalled Windows fresh. Same error message.

Failed loading model Phi-4-reasoning-14.7b-qnn. Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_11067426494884051979_9_0' Refer to the Output Panel for more details.

OS Name: Microsoft Windows 11 Home
Version: 10.0.26120 Build 26120
System SKU: LENOVO_MT_83ED_BU_idea_FM_Yoga Slim 7 14Q8X9
Processor: Snapdragon® X Elite - X1E78100 - Qualcomm® Oryon™ CPU, 3417 MHz, 12 Core(s), 12 Logical Processor(s).

NPU models worked a few weeks ago but appear to be broken now. My honest observation is that these Snapdragon Copilot+ PCs are not very reliable, mostly because of operating system issues.

nsteblay avatar Jul 09 '25 03:07 nsteblay

Posting "me too" just to show this is affecting more people. First use of AI Toolkit and no NPU models work with same error as previous poster.

Lenovo Yoga Slim 7 14Q8X9, Windows with the latest updates, VS Code and AI Toolkit updated to the latest versions.

haiduc32 avatar Jul 09 '25 04:07 haiduc32

Decided to try again ... same error.

I get the following errors in Process Monitor ...

6:14:58.4178748 PM mc-fw-host.exe 5432 QueryInformationVolume C:\Users\nsteb\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-0.18.0-win32-arm64\bin\QnnSystem.dll BUFFER OVERFLOW VolumeCreationTime: 11/3/2024 4:26:51 PM, VolumeSerialNumber: EA19-5111, SupportsObjects: True, VolumeLabel: Win

6:14:58.4178770 PM mc-fw-host.exe 5432 QueryAllInformationFile C:\Users\nsteb\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-0.18.0-win32-arm64\bin\QnnSystem.dll BUFFER OVERFLOW CreationTime: 8/2/2025 2:53:57 PM, LastAccessTime: 8/2/2025 6:14:53 PM, LastWriteTime: 8/2/2025 2:53:57 PM, ChangeTime: 8/2/2025 2:57:17 PM, FileAttributes: ANCI, AllocationSize: 4,497,408, EndOfFile: 4,493,400

nsteblay avatar Aug 02 '25 23:08 nsteblay

A couple of hours ago I applied a Lenovo update (something low level), and after the laptop booted back up I opened VS Code and noticed an update to the AI Toolkit. I applied the update, tried a model, and... it works! The NPU is showing activity in Task Manager.

I opened Anything LLM - the NPU-optimized models work! (Previously they would not.)

Could it be that a low-level update from Lenovo was required? For reference, this is the update (from the update history): LENOVO - System Hardware Update - 8/1/2025

haiduc32 avatar Aug 06 '25 18:08 haiduc32

Yes. It appears the latest update to AI Toolkit solved the problem. Qualcomm has an excellent YouTube video on using AnythingLLM that makes it easy to run local LLMs on the NPU - worth watching. The engineer uses meta-llama-3.2 running on the NPU to build a local chatbot. Very cool.

nsteblay avatar Aug 06 '25 20:08 nsteblay

Mine too. I had my motherboard replaced and some of my AI capabilities started working again, plus new drivers and Windows updates. ALL my AI capabilities are now working. Closing this issue now.

pkbullock avatar Aug 07 '25 05:08 pkbullock