vscode-ai-toolkit
Support for Copilot+ PCs
It would be great if AI Toolkit could leverage the NPU in Copilot+ PCs. Currently it uses the CPU; it's nice and quick on the Snapdragon processors, but it isn't using the AI processor when running models.
I wonder if this is related to onnxruntime-genai still awaiting QNN support.
The docs list this as supporting Copilot+ PCs, but it doesn't; my NPU activity is 0%. So how do I use this?
I don't see any reference yet to Copilot+ PCs in the AI Toolkit docs, at least not here. Because it relies on onnxruntime-genai, I believe QNN support must land there first before AI Toolkit can take full advantage of it. You might be able to take some advantage of the NPU now, indirectly, by using DirectML with a model like Phi-3-mini-4k-directml-int4-awq-block-128-onnx, which is optimized for that. I have been using DirectML on my non-Copilot+ Qualcomm-based WDK23 to speed up training.
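For context, ONNX Runtime exposes each backend (CPU, DirectML, QNN for the Qualcomm NPU) as an "execution provider", and a session falls back through the providers you list in order. The provider names below are real ONNX Runtime identifiers, but the helper itself is just an illustrative sketch, not part of AI Toolkit or onnxruntime-genai:

```python
# Sketch: prefer the Qualcomm NPU (QNN), then DirectML, then CPU.
# The provider names are real ONNX Runtime identifiers; this helper
# is hypothetical, for illustration only.

PREFERRED = ("QNNExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider")

def pick_provider(available):
    """Return the most capable provider present in `available`, else CPU."""
    for ep in PREFERRED:
        if ep in available:
            return ep
    return "CPUExecutionProvider"

# With onnxruntime installed, you would feed the result into a session, e.g.:
#   import onnxruntime as ort
#   ep = pick_provider(ort.get_available_providers())
#   sess = ort.InferenceSession("model.onnx", providers=[ep])
```

Printing `ort.get_available_providers()` is also a quick way to confirm whether your installed ONNX Runtime build actually includes the QNN or DirectML provider at all.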
Hi @sirredbeard - I saw it mentioned as supported in the release notes when installing the VS Code extension. But I agree, it seems many frameworks are dependent on the QNN runtimes/SDKs being released.
It seems like DirectML models don't show up in the model catalog on my PC that has a Qualcomm NPU.
Me neither - what is the course of action to enable models to show up on Snapdragon machines?
https://blogs.windows.com/windowsdeveloper/2025/01/29/running-distilled-deepseek-r1-models-locally-on-copilot-pcs-powered-by-windows-copilot-runtime/
Apparently, this feature is coming soon.
Looks like the latest update states that the feature has already been released, but I didn't find the model in the catalog.
Which device? I can see and use that NPU model with an X Elite.
Surface Pro with a Snapdragon CPU.
Not sure why I don't have the local NPU option.
At first I had installed the extension in the remote SSH session. After installing it in the local window, it works.
I don't think the update they mentioned is out yet, as it is impossible to download many of the models.
Snapdragon X Elite PC with AI Toolkit v0.8.6
works lovely using the NPU:
@rockcat - what device are you using? I have seen the cool updates and can download the model, but I cannot run it:
I am running Windows Beta, and I have tried both release and pre-release versions, and deleted and redownloaded the models. Lenovo Yoga Slim 7x.
Debug: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [0] 2025-02-09T09:18:47.8111417+00:00 LoadModel model:DeepSeek-R1-Distilled-NPU-Optimized
Information: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1400] 2025-02-09T09:18:47.811353+00:00 Loading model:DeepSeek-R1-Distilled-NPU-Optimized
Error: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1402] 2025-02-09T09:18:51.4987663+00:00 Failed loading model:DeepSeek-R1-Distilled-NPU-Optimized error: [Failed to load from EpContext model. qnn_backend_manager.cc:676 onnxruntime::qnn::QnnBackendManager::LoadCachedQnnContextFromBuffer Failed to create context from binary.,
   at Microsoft.ML.OnnxRuntimeGenAI.Result.VerifySuccess(IntPtr) + 0x58
   at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx.LoadModelAsync(String, CancellationToken) + 0x110
   at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderBase1.<EnsureModelLoadedAsync>d__41.MoveNext() + 0x3bc
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x24
   at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderBase1.<EnsureModelLoadedAsync>d__41.MoveNext() + 0x6c8
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x24
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0x100
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x68
   at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderBase1.<LoadModelAsync>d__27.MoveNext() + 0x130
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x24
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0x100
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x68
   at Microsoft.Neutron.OpenAI.WebApplicationFactory.<>c.<<Create>b__0_6>d.MoveNext() + 0x114]
Information: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1401] 2025-02-09T09:18:51.4993977+00:00 Finish loading model:DeepSeek-R1-Distilled-NPU-Optimized elapsed time:00:00:03.6880328
[2025-02-09T09:18:51.506Z] [ERROR] Failed loading model DeepSeek-R1-Distilled-NPU-Optimized. Failed to load from EpContext model. qnn_backend_manager.cc:676 onnxruntime::qnn::QnnBackendManager::LoadCachedQnnContextFromBuffer Failed to create context from binary.
Debug: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [0] 2025-02-09T09:18:55.4010164+00:00 GetLoadedModels
Debug: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceLlamaSharp [0] 2025-02-09T09:18:55.4013778+00:00 GetLoadedModels
Debug: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [0] 2025-02-09T09:18:55.4030492+00:00 GetModels
Debug: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [0] 2025-02-09T09:18:55.4060139+00:00 LoadModel model:DeepSeek-R1-Distilled-NPU-Optimized
Information: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1400] 2025-02-09T09:18:55.4063+00:00 Loading model:DeepSeek-R1-Distilled-NPU-Optimized
Error: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1402] 2025-02-09T09:18:58.8581716+00:00 Failed loading model:DeepSeek-R1-Distilled-NPU-Optimized error: [Failed to load from EpContext model. qnn_backend_manager.cc:676 onnxruntime::qnn::QnnBackendManager::LoadCachedQnnContextFromBuffer Failed to create context from binary.,
   at Microsoft.ML.OnnxRuntimeGenAI.Result.VerifySuccess(IntPtr) + 0x58
   at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx.LoadModelAsync(String, CancellationToken) + 0x110
   at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderBase1.<EnsureModelLoadedAsync>d__41.MoveNext() + 0x3bc
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x24
   at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderBase1.<EnsureModelLoadedAsync>d__41.MoveNext() + 0x6c8
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x24
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0x100
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x68
   at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderBase1.<LoadModelAsync>d__27.MoveNext() + 0x130
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x24
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0x100
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x68]
Information: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1401] 2025-02-09T09:18:58.8587846+00:00 Finish loading model:DeepSeek-R1-Distilled-NPU-Optimized elapsed time:00:00:03.4524796
[2025-02-09T09:18:58.866Z] [ERROR] Failed loading model DeepSeek-R1-Distilled-NPU-Optimized. Failed to load from EpContext model. qnn_backend_manager.cc:676 onnxruntime::qnn::QnnBackendManager::LoadCachedQnnContextFromBuffer Failed to create context from binary.
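Both attempts above fail the same way: the QNN execution provider rejects the model's pre-compiled EPContext binary (the cached, device-specific NPU graph) while creating a context from it. If you are comparing logs from several machines, a small filter like this can pull out which models failed and the leading error sentence. This is purely an illustrative script written against the log format shown above; it is not part of AI Toolkit:

```python
import re

# Illustrative helper: scan AI Toolkit log text for "Failed loading model"
# entries and report (model name, first sentence of the error reason).
# The pattern is derived from the log excerpt above, not from any
# documented AI Toolkit log specification.
FAIL_RE = re.compile(
    r"Failed loading model:(?P<model>\S+) error: \[(?P<reason>[^.]+\.)"
)

def failed_loads(log_text):
    """Return (model, reason) pairs for each failed model load in the log."""
    return [
        (m.group("model"), m.group("reason").strip())
        for m in FAIL_RE.finditer(log_text)
    ]
```

Running it over the excerpt above would report DeepSeek-R1-Distilled-NPU-Optimized twice with the reason "Failed to load from EpContext model.", which at least confirms the failure is consistent rather than intermittent.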
Lenovo Yoga with Snapdragon X elite