
Phi-4-reasoning/mini NPU not working

Open pkbullock opened this issue 7 months ago • 10 comments

Error generated when loading into the Playground:

2025-05-17 15:28:50.198 [error] Failed loading model Phi-4-reasoning-plus-14.7b-qnn. Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_11067426494884051979_9_0'

2025-05-17 15:39:32.909 [info] Error: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1402] 2025-05-17T15:39:32.9087466+01:00 Failed loading model:Phi-4-mini-reasoning-3.8b-qnn error: [Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_16507318232818193198_9_0', at Microsoft.ML.OnnxRuntimeGenAI.Result.VerifySuccess(IntPtr) + 0x58 at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx.LoadModelAsync(String, String, CancellationToken) + 0x380 at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderBase`1.<EnsureModelLoadedAsync>d__42.MoveNext() + 0x544]

I'm running a Lenovo Slim 7x with a Snapdragon X Elite.

AI Toolkit v0.12.2 (15th May) & 0.13.2025051506 (pre-release) have the same issue.

Image

pkbullock avatar May 17 '25 14:05 pkbullock

Hi @pkbullock , could you please try the latest version 0.14.2 of AITK to see if the issue is resolved? Thanks

timenick avatar May 22 '25 02:05 timenick

Same error:

Image

Image

Image

pkbullock avatar May 22 '25 17:05 pkbullock

I deleted and re-downloaded the NPU model, but it's still an issue.

pkbullock avatar May 22 '25 17:05 pkbullock

Are there any system dependencies or assumptions that I could check?

pkbullock avatar May 23 '25 09:05 pkbullock

This seems similar to #151, which appears to have been closed without a resolution.

pkbullock avatar May 24 '25 07:05 pkbullock

Hi @pkbullock , could you help share the content of C:\Users\<user>\.aitk\models\Microsoft\Phi-4-mini-reasoning-3.8b-qnn\genai_config.json that's causing the EPContext error?

vortex-captain avatar May 28 '25 07:05 vortex-captain

Hi @vortex-captain - here is the file.

genai_config.json

pkbullock avatar May 28 '25 08:05 pkbullock

Could you try removing the two "backend_path": "QnnHtp.dll", lines in genai_config.json and loading the model again?
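For context, backend_path lives under the QNN execution-provider options in genai_config.json. Below is a simplified sketch of what such a section typically looks like in onnxruntime-genai configs; the exact nesting and surrounding fields in this model's file will differ (it contains two such backend_path lines, presumably one per session):

```json
{
  "model": {
    "decoder": {
      "session_options": {
        "provider_options": [
          {
            "qnn": {
              "backend_path": "QnnHtp.dll"
            }
          }
        ]
      }
    }
  }
}
```

The idea behind removing the lines is presumably to let the QNN execution provider resolve QnnHtp.dll through its default DLL search path rather than the relative name written in the config.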

vortex-captain avatar May 29 '25 04:05 vortex-captain

@vortex-captain - no luck. I tried removing those lines, then removing the provider entry entirely, and finally downloading the Qualcomm AI SDK and pointing backend_path at its QnnHtp.dll location instead. Sadly, none of it worked.

pkbullock avatar May 29 '25 15:05 pkbullock

I have also noticed that FoundryLocal shows the same error. I have updated to the latest AI Toolkit, and it doesn't make a difference. My NPU drivers were also updated, but that didn't resolve the issue either.

pkbullock avatar May 29 '25 15:05 pkbullock

@pkbullock could you help provide more info as follows? Thanks!

  • In task manager, end all tasks named Inference.Service.Agent, if any
  • Upgrade AI Toolkit to specific version 0.14.3
  • Delete folder C:\Users\<user>\.aitk\models\Microsoft\Phi-4-mini-reasoning-3.8b-qnn
  • Open AI Toolkit, re-download Phi 4 Reasoning 3.8B (NPU Optimized, QNN)
  • Load Phi 4 Reasoning 3.8B (NPU Optimized, QNN), and wait for EPContext(1) error to appear
  • Download and open Process Explorer from https://learn.microsoft.com/en-us/sysinternals/downloads/process-explorer
  • Search Inference.Service.Agent and select the process in Process Explorer
  • Select "View" -> "Lower Pane View" -> "DLLs" in top menu
  • Share the list
  • In the list of DLLs, find QnnHtp.dll, QnnHtpV73Stub.dll, QnnSystem.dll, libcdsprpc.dll and share their paths (please make sure not to leave any personal information in the paths)
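As a scriptable alternative to Process Explorer's DLL view, a rough sketch like the one below (assuming Python with the third-party psutil package is available; run it while the model load and the EPContext error are being reproduced) can dump the paths of the QNN-related modules loaded by the agent:

```python
# Rough sketch: print QNN-related DLL paths loaded by Inference.Service.Agent.
# Assumes Python with the third-party psutil package installed; run while the
# model load (and the EPContext error) is in progress. May need an elevated prompt.
import psutil

TARGETS = ("qnnhtp.dll", "qnnhtpv73stub.dll", "qnnsystem.dll", "libcdsprpc.dll")

for proc in psutil.process_iter(["name"]):
    name = (proc.info["name"] or "").lower()
    if name.startswith("inference.service.agent"):
        try:
            # memory_maps() lists the files mapped into the process,
            # which includes every loaded DLL with its full path.
            for mod in proc.memory_maps():
                if (mod.path or "").lower().endswith(TARGETS):
                    print(f"{proc.pid}: {mod.path}")
        except (psutil.AccessDenied, psutil.NoSuchProcess):
            print(f"{proc.pid}: access denied or process exited")
```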

vortex-captain avatar Jun 03 '25 05:06 vortex-captain

Image

  • QnnHtp.dll - not running
  • QnnHtpV73Stub.dll - not running; there is a similarly named one, QnnHtpV73StubDrv.dll
  • QnnSystem.dll - C:\Users\<user>\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-0.14.3-win32-arm64\bin
  • libcdsprpc.dll - C:\Windows\System32\DriverStore\FileRepository\qcnspmcdm8380.inf_arm64_709a025a458a890b

Image

Running: AI Toolkit v0.14.3

pkbullock avatar Jun 03 '25 06:06 pkbullock

Thanks @pkbullock ! Could you try the following?

  • share the file list of C:\Windows\System32\DriverStore\FileRepository\qcnspmcdm8380.inf_arm64_709a025a458a890b and C:\Windows\System32\DriverStore\FileRepository\qcnspmcdm8380.inf_arm64_709a025a458a890b\HTP
  • Uninstall manually installed NPU drivers and Qualcomm AI SDK and try loading the model again in AITK. I tested this model on a brand new QNN machine and it works without installing these 2 components manually

vortex-captain avatar Jun 03 '25 08:06 vortex-captain

Here are the images; the drivers were delivered through a Lenovo system update via Windows Update.

C:\Windows\System32\DriverStore\FileRepository\qcnspmcdm8380.inf_arm64_709a025a458a890b Image

C:\Windows\System32\DriverStore\FileRepository\qcnspmcdm8380.inf_arm64_709a025a458a890b\HTP Image

pkbullock avatar Jun 03 '25 18:06 pkbullock

The driver files look right. Could you try replacing 2 lines of

"backend_path": "QnnHtp.dll",

with

"backend_path": "C:/Users/<user>/.vscode/extensions/ms-windows-ai-studio.windows-ai-studio-0.14.3-win32-arm64/bin/QnnHtp.dll",

in genai_config.json and testing again after ending process Inference.Service.Agent and restarting VS Code? To state the obvious, please make sure to replace <user> and use / instead of \ in backend_path.
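For clarity, a rough sketch of what one of the edited QNN provider entries would then look like (the surrounding fields in your file will differ):

```json
{
  "qnn": {
    "backend_path": "C:/Users/<user>/.vscode/extensions/ms-windows-ai-studio.windows-ai-studio-0.14.3-win32-arm64/bin/QnnHtp.dll"
  }
}
```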

Besides, please also help share the file list of C:\Users\<user>\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-0.14.3-win32-arm64\bin

In case this doesn't work, does it yield the same EPContext(1) error?
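If the error persists, one way to narrow down whether it comes from the extension or from the ONNX Runtime GenAI / QNN layer (which the agent appears to use, judging by the Microsoft.ML.OnnxRuntimeGenAI frames in the stack trace) would be to try loading the model folder directly from Python. This is only a rough, untested sketch and assumes an onnxruntime-genai Python build with QNN support is installed, which may not be straightforward to obtain on ARM64 Windows:

```python
# Rough sketch: try loading the QNN model outside AI Toolkit to see whether
# the same EPContext(1) error reproduces with onnxruntime-genai alone.
# Assumes a Python onnxruntime-genai package with the QNN execution provider.
import onnxruntime_genai as og

MODEL_DIR = r"C:\Users\<user>\.aitk\models\Microsoft\Phi-4-mini-reasoning-3.8b-qnn"

try:
    model = og.Model(MODEL_DIR)   # reads genai_config.json and creates the ORT sessions
    og.Tokenizer(model)
    print("Model loaded OK")
except Exception as exc:          # an EPContext failure would surface here
    print(f"Load failed: {exc}")
```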

vortex-captain avatar Jun 04 '25 01:06 vortex-captain

I’m running into the same issue as others have described. I tried updating both backend_path entries in my genai_config.json to:

"backend_path": "C:/Users/dago/.vscode/extensions/ms-windows-ai-studio.windows-ai-studio-0.14.4-win32-arm64/bin/QnnHtp.dll"

I then stopped Inference.Service.Agent, restarted VS Code, and tried again. Unfortunately, I’m still seeing the same error.

Relevant log output:

2025-06-22 20:24:22.357 [info] CPU: Qualcomm Technologies Inc - Snapdragon(R) X Elite - X1E78100 - Qualcomm(R) Oryon(TM) CPU
2025-06-22 20:24:22.357 [info] Graphics: Qualcomm Incorporated
2025-06-22 20:24:22.357 [info] Supported: QNN,CPU
2025-06-22 20:24:22.466 [info] Command registration.
2025-06-22 20:24:25.266 [info] telemetry event:activate_extension sent
2025-06-22 20:24:40.334 [info] Loading View: modelPlayground
2025-06-22 20:24:40.734 [info] Information: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1400]  2025-06-22T20:24:40.7332317+02:00 Loading model:Phi-4-mini-reasoning-3.8b-qnn
2025-06-22 20:24:44.013 [info] Error: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1402]  2025-06-22T20:24:44.0123256+02:00 Failed loading model:Phi-4-mini-reasoning-3.8b-qnn error: [Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_16507318232818193198_9_0',    at Microsoft.ML.OnnxRuntimeGenAI.Result.VerifySuccess(IntPtr) + 0x58
   at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx.LoadModelAsync(String, String, CancellationToken) + 0x380
   at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderBase`1.<EnsureModelLoadedAsync>d__42.MoveNext() + 0x544]
2025-06-22 20:24:44.014 [info] Information: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1401]  2025-06-22T20:24:44.0131031+02:00 Finish loading model:Phi-4-mini-reasoning-3.8b-qnn elapsed time:00:00:03.2798831
2025-06-22 20:24:44.021 [error] Failed loading model Phi-4-mini-reasoning-3.8b-qnn. Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_16507318232818193198_9_0' 

What I’ve also tried:

  • Checked if the Qualcomm AI SDK was installed (it was not).
  • Attempted to uninstall the NPU driver via Device Manager, but Windows automatically reinstalled it after removal. Image

Directory contents of C:\Users\dago\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-0.14.4-win32-arm64\bin:

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----          22.06.2025    19:29                scripts
-a---          22.06.2025    19:29             79 appsettings-agent.Development.json
-a---          22.06.2025    19:29            306 appsettings-agent.json
-a---          22.06.2025    19:29             79 appsettings.Development.json
-a---          22.06.2025    19:29            401 appsettings.json
-a---          22.06.2025    19:29       26308192 Inference.Service.Agent.exe
-a---          22.06.2025    19:29             70 Inference.Service.Agent.staticwebassets.endpoints.json
-a---          22.06.2025    19:29        8482768 libQnnHtpV68Skel.so
-a---          22.06.2025    19:29          12142 libqnnhtpv73.cat
-a---          22.06.2025    19:29        8502100 libQnnHtpV73Skel.so
-a---          22.06.2025    19:29        1520680 onnxruntime_providers_qnn.dll
-a---          22.06.2025    19:29          21024 onnxruntime_providers_shared.dll
-a---          22.06.2025    19:29        1837600 onnxruntime-genai.dll
-a---          22.06.2025    19:29       13624864 onnxruntime.dll
-a---          22.06.2025    19:29        3757648 QnnCpu.dll
-a---          22.06.2025    19:29        1844272 QnnHtp.dll
-a---          22.06.2025    19:29       54050352 QnnHtpPrepare.dll
-a---          22.06.2025    19:29         155728 QnnHtpV68Stub.dll
-a---          22.06.2025    19:29         278624 QnnHtpV73Stub.dll
-a---          22.06.2025    19:29         549472 QnnSaver.dll
-a---          22.06.2025    19:29         106040 QnnSystem.dll
-a---          22.06.2025    19:29       15680064 WorkspaceAutomation.Agent.exe

Question(s):

  • Did I miss an additional configuration step for QNN/HTP on Snapdragon X Elite?
  • Is there another dependency or DLL that needs to be referenced for the QNNExecutionProvider?
  • Has anyone managed to get the Phi-4-mini-reasoning-3.8b-qnn model running on this hardware (e.g. Lenovo Yoga Slim 7x Gen 9 (14" Snapdragon))?

Any suggestions or troubleshooting steps would be greatly appreciated!

DanielGoehler avatar Jun 22 '25 19:06 DanielGoehler

Hi @DanielGoehler, could you try using the pre-release version of AITK and test the model again? Thank you.

Image

timenick avatar Jun 24 '25 03:06 timenick

@timenick With prerelease version 0.15.2025062307, I am still encountering the same error message as before:

2025-06-24 06:44:47.809 [error] Failed loading model Phi-4-mini-reasoning-3.8b-qnn. Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_16507318232818193198_9_0'

Full log excerpt:

2025-06-24 06:43:33.685 [info] CPU: Qualcomm Technologies Inc - Snapdragon(R) X Elite - X1E78100 - Qualcomm(R) Oryon(TM) CPU
2025-06-24 06:43:33.685 [info] Graphics: Qualcomm Incorporated
2025-06-24 06:43:33.685 [info] Supported: QNN,CPU
2025-06-24 06:43:34.019 [info] Command registration.
2025-06-24 06:43:34.336 [info] Connected to agent:Inference.Service.Agent.WinML pipe after retries:0
2025-06-24 06:43:34.336 [info] Agent startup completed...
2025-06-24 06:43:34.337 [info] Agent unlocked
2025-06-24 06:43:34.348 [info] Information: Microsoft.Hosting.Lifetime [14]  2025-06-24T06:43:34.3440232+02:00 Now listening on: http://localhost:5272
2025-06-24 06:43:34.349 [info] Information: Microsoft.Hosting.Lifetime [0]  2025-06-24T06:43:34.3461234+02:00 Application started. Press Ctrl+C to shut down.
2025-06-24 06:43:34.349 [info] Information: Microsoft.Hosting.Lifetime [0]  2025-06-24T06:43:34.346136+02:00 Hosting environment: Production
2025-06-24 06:43:34.350 [info] Information: Microsoft.Hosting.Lifetime [0]  2025-06-24T06:43:34.3461426+02:00 Content root path: c:\Users\dago\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-0.15.2025062307-win32-arm64\bin\
2025-06-24 06:43:35.913 [info] Loading View: catalogModels
2025-06-24 06:43:36.678 [info] telemetry event:activate_extension sent
2025-06-24 06:44:29.362 [info] Loading View: modelPlayground
2025-06-24 06:44:45.965 [info] Information: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1400]  2025-06-24T06:44:45.9648626+02:00 Loading model:Phi-4-mini-reasoning-3.8b-qnn
2025-06-24 06:44:47.803 [info] Error: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1402]  2025-06-24T06:44:47.8028628+02:00 Failed loading model:Phi-4-mini-reasoning-3.8b-qnn error: [Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_16507318232818193198_9_0',    at Microsoft.ML.OnnxRuntimeGenAI.Result.VerifySuccess(IntPtr) + 0x58
   at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx.LoadModelAsync(String, String, String, CancellationToken) + 0x6fc
   at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderBase`1.<EnsureModelLoadedAsync>d__44.MoveNext() + 0x54c]
2025-06-24 06:44:47.804 [info] Information: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1401]  2025-06-24T06:44:47.803494+02:00 Finish loading model:Phi-4-mini-reasoning-3.8b-qnn elapsed time:00:00:01.8386215
2025-06-24 06:44:47.809 [error] Failed loading model Phi-4-mini-reasoning-3.8b-qnn. Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_16507318232818193198_9_0' 

Additionally, there seems to be an issue with the model catalog in the prerelease version. I don’t see any models listed with NPU support.

Screenshots for comparison:

  • Prerelease 0.15.2025062307:
Image
  • Release 0.14.4:
Image

DanielGoehler avatar Jun 24 '25 04:06 DanielGoehler

I'm getting the same errors with both the GA and pre-release versions on my Lenovo Yoga Slim 7 running the latest updates on Windows 11. It was working fine on earlier releases of AI Toolkit.

OS Name: Microsoft Windows 11 Home
Version: 10.0.26120 Build 26120
System SKU: LENOVO_MT_83ED_BU_idea_FM_Yoga Slim 7 14Q8X9
Processor: Snapdragon® X Elite - X1E78100 - Qualcomm® Oryon™ CPU, 3417 MHz, 12 Core(s), 12 Logical Processor(s).

nsteblay avatar Jun 25 '25 02:06 nsteblay

@DanielGoehler wrote: "With prerelease version 0.15.2025062307, I am still encountering the same error message as before: Failed loading model Phi-4-mini-reasoning-3.8b-qnn. Could not find an implementation for EPContext(1) node ... Additionally, there seems to be an issue with the model catalog in the prerelease version. I don't see any models listed with NPU support."

For the model catalog issue, click on "View All" to see all available models.

Image

For the QNN EP issue, it appears that all reports involve the Snapdragon® X Elite - X1E78100; we are investigating it.

timenick avatar Jun 25 '25 03:06 timenick

@timenick Thanks. View All works.

DanielGoehler avatar Jun 25 '25 05:06 DanielGoehler

Completely reinstalled Windows fresh. Same error message.

Failed loading model Phi-4-reasoning-14.7b-qnn. Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_11067426494884051979_9_0' Refer to the Output Panel for more details.

OS Name: Microsoft Windows 11 Home
Version: 10.0.26120 Build 26120
System SKU: LENOVO_MT_83ED_BU_idea_FM_Yoga Slim 7 14Q8X9
Processor: Snapdragon® X Elite - X1E78100 - Qualcomm® Oryon™ CPU, 3417 MHz, 12 Core(s), 12 Logical Processor(s).

NPU models worked a few weeks ago but appear to be broken now. My honest observation is that these Snapdragon Copilot+ PCs are not very reliable, mostly because of operating system issues.

nsteblay avatar Jul 09 '25 03:07 nsteblay

Posting "me too" just to show this is affecting more people. First use of AI Toolkit and no NPU models work with same error as previous poster.

Lenovo Yoga Slim 7 14Q8X9, Windows with the latest updates, VS Code and AI Toolkit updated to the latest versions.

haiduc32 avatar Jul 09 '25 04:07 haiduc32

Decided to try again ... same error.

I get the following errors in Process Monitor ...

6:14:58.4178748 PM mc-fw-host.exe 5432 QueryInformationVolume C:\Users\nsteb\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-0.18.0-win32-arm64\bin\QnnSystem.dll BUFFER OVERFLOW VolumeCreationTime: 11/3/2024 4:26:51 PM, VolumeSerialNumber: EA19-5111, SupportsObjects: True, VolumeLabel: Win

6:14:58.4178770 PM mc-fw-host.exe 5432 QueryAllInformationFile C:\Users\nsteb\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-0.18.0-win32-arm64\bin\QnnSystem.dll BUFFER OVERFLOW CreationTime: 8/2/2025 2:53:57 PM, LastAccessTime: 8/2/2025 6:14:53 PM, LastWriteTime: 8/2/2025 2:53:57 PM, ChangeTime: 8/2/2025 2:57:17 PM, FileAttributes: ANCI, AllocationSize: 4,497,408, EndOfFile: 4,493,400

nsteblay avatar Aug 02 '25 23:08 nsteblay

A couple of hours ago I applied a Lenovo update (something low level), and after the laptop booted back up I opened VS Code and noticed an update to the AI Toolkit. I applied the update, tried a model, and... it works! The NPU is showing activity in Task Manager.

I opened Anything LLM - the NPU-optimized models work! (Previously they would not.)

Could it be that a low-level update from Lenovo was required? For reference, this is the update (from the update history): LENOVO - System Hardware Update - 8/1/2025

haiduc32 avatar Aug 06 '25 18:08 haiduc32

Yes. It appears the latest update to AI Toolkit solved the problem. Qualcomm has an excellent YouTube video on using AnythingLLM that makes it easy to run local LLMs on the NPU - worth watching. The engineer uses meta-llama-3.2 running on the NPU to build a local chatbot. Very cool.

nsteblay avatar Aug 06 '25 20:08 nsteblay

Mine too. I had my motherboard replaced and some of my AI capabilities started working again, plus new drivers and Windows updates. ALL my AI capabilities are now working. Closing this issue now.

pkbullock avatar Aug 07 '25 05:08 pkbullock