NPU models no longer listed in 0.7.117
I just upgraded to v0.7.117 on a Surface Pro 11.
All the NPU models have disappeared; foundry model list now shows only CPU models:
PS C:\Users\filip> foundry model list
Alias Device Task File Size License Model ID
-----------------------------------------------------------------------------------------------
phi-4 CPU chat-completion 10.16 GB MIT Phi-4-generic-cpu:1
----------------------------------------------------------------------------------------------------------
phi-3.5-mini CPU chat-completion 2.53 GB MIT Phi-3.5-mini-instruct-generic-cpu:1
--------------------------------------------------------------------------------------------------------------------------
phi-3-mini-128k CPU chat-completion 2.54 GB MIT Phi-3-mini-128k-instruct-generic-cpu:2
-----------------------------------------------------------------------------------------------------------------------------
phi-3-mini-4k CPU chat-completion 2.53 GB MIT Phi-3-mini-4k-instruct-generic-cpu:2
---------------------------------------------------------------------------------------------------------------------------
mistral-7b-v0.2 CPU chat-completion 4.07 GB apache-2.0 mistralai-Mistral-7B-Instruct-v0-2-generic-cpu:2
---------------------------------------------------------------------------------------------------------------------------------------
deepseek-r1-14b CPU chat-completion 11.51 GB MIT deepseek-r1-distill-qwen-14b-generic-cpu:3
---------------------------------------------------------------------------------------------------------------------------------
deepseek-r1-7b CPU chat-completion 6.43 GB MIT deepseek-r1-distill-qwen-7b-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
qwen2.5-0.5b CPU chat-completion 0.80 GB apache-2.0 qwen2.5-0.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------
qwen2.5-1.5b CPU chat-completion 1.78 GB apache-2.0 qwen2.5-1.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-0.5b CPU chat-completion 0.80 GB apache-2.0 qwen2.5-coder-0.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-7b CPU chat-completion 6.16 GB apache-2.0 qwen2.5-coder-7b-instruct-generic-cpu:3
------------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-1.5b CPU chat-completion 1.78 GB apache-2.0 qwen2.5-coder-1.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
phi-4-mini CPU chat-completion 4.80 GB MIT Phi-4-mini-instruct-generic-cpu:4
------------------------------------------------------------------------------------------------------------------------
phi-4-mini-reasoning CPU chat-completion 4.52 GB MIT Phi-4-mini-reasoning-generic-cpu:2
-------------------------------------------------------------------------------------------------------------------------
qwen2.5-14b CPU chat-completion 11.06 GB apache-2.0 qwen2.5-14b-instruct-generic-cpu:3
-------------------------------------------------------------------------------------------------------------------------
qwen2.5-7b CPU chat-completion 6.16 GB apache-2.0 qwen2.5-7b-instruct-generic-cpu:3
------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-14b CPU chat-completion 11.06 GB apache-2.0 qwen2.5-coder-14b-instruct-generic-cpu:3
I am using Qualcomm® Hexagon™ NPU Driver version 1.0.0.11 (Sept 4th release).
At initial startup after installation, the QNN execution provider appears to have been downloaded successfully:
🟢 Service is Started on http://localhost:5272/, PID 13080!
[ ] 0.00 % [Time remaining: about 3m20s] Downloading QNNExecutionProvider [ ]
... lots of progress....
QNNExecutionProvider [####################################] 100.00 % [Time remaining: about 0s] Downloading complete!
System Info
Processor: Snapdragon(R) X 12-core X1E80100 @ 3.40 GHz, 3417 Mhz, 12 Core(s), 12 Logical Processor(s)
OS Name: Microsoft Windows 11 Home
Version: 10.0.27950 Build 27950
@filipw Thank you for reporting this. Can you please run the following command in a PowerShell window:
Get-AppxPackage -AllUsers "*.EP.*" | Select-Object -ExpandProperty PackageFullName
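If that returns nothing, a broader query can confirm whether any execution-provider package is present at all. A diagnostic sketch (the name patterns here are guesses, not official package names):

# List any package whose name looks execution-provider-related, with its status
Get-AppxPackage -AllUsers |
    Where-Object { $_.Name -match 'EP|ExecutionProvider' } |
    Select-Object Name, PackageFullName, Status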
it returns nothing:
PS C:\Users\filip> Get-AppxPackage -AllUsers "*.EP.*" | Select-Object -ExpandProperty PackageFullName
PS C:\Users\filip>
Foundry itself is there:
PS C:\Users\filip> Get-AppxPackage -AllUsers "*Foundry*" | Select-Object -ExpandProperty PackageFullName
Microsoft.FoundryLocal_0.7.117.26375_arm64__8wekyb3d8bbwe
PS C:\Users\filip>
I assume this implies the execution provider did not install successfully? Can I force-install it?
Same issue on Intel and Qualcomm NPU devices
Devices running Windows 25H2: a Surface Laptop 6 (Intel NPU) and a Lenovo 14S ARM device (Snapdragon X Elite).
The Foundry logs clearly show that only CPU or GPU models are being downloaded; see:
Arm Device
2025-09-25 08:54:30.435 +08:00 [INF] Starting Foundry Local CLI with 'model list'
2025-09-25 08:54:31.486 +08:00 [INF] Command:ServiceAutoRegister Status:Success Direct:False Time:3ms
2025-09-25 08:54:31.487 +08:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:AutoRegisterCertifiedEps Status:Success Direct:True Time:0ms
2025-09-25 08:54:32.543 +08:00 [INF] Model Phi-4-reasoning-generic-cpu:1 does not have a valid prompt template.
2025-09-25 08:54:32.544 +08:00 [INF] Total models fetched across all pages: 18
2025-09-25 08:54:32.545 +08:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:ModelList Status:Success Direct:True Time:2069ms
2025-09-25 08:54:32.547 +08:00 [INF] Command:ModelList Status:Success Direct:True Time:2105ms
2025-09-25 08:54:32.547 +08:00 [INF] Stream disconnected
2025-09-25 08:54:46.858 +08:00 [INF] Starting Foundry Local CLI with 'cache ls'
2025-09-25 08:54:46.867 +08:00 [INF] Command:ServiceStart Status:Skipped Direct:False Time:2ms
2025-09-25 08:54:46.892 +08:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:ListDownloadedModels Status:Success Direct:True Time:2ms
2025-09-25 08:54:47.803 +08:00 [INF] Model Phi-4-reasoning-generic-cpu:1 does not have a valid prompt template.
2025-09-25 08:54:47.804 +08:00 [INF] Total models fetched across all pages: 18
2025-09-25 08:54:47.804 +08:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:ModelList Status:Success Direct:True Time:907ms
2025-09-25 08:54:47.814 +08:00 [INF] Command:CacheList Status:Success Direct:True Time:949ms
2025-09-25 08:54:47.815 +08:00 [INF] Stream disconnected
2025-09-25 09:02:05.584 +08:00 [INF] Starting Foundry Local CLI with 'service restart'
2025-09-25 09:02:05.633 +08:00 [INF] Stopped Inference.Service.Agent PID 7020
2025-09-25 09:02:08.158 +08:00 [INF] Starting service <C:\Program Files\WindowsApps\Microsoft.FoundryLocal_0.7.117.26375_arm64__8wekyb3d8bbwe\Inference.Service.Agent.exe --urls="http://127.0.0.1:0/" --OpenAIServiceSettings:ModelDirPath="C:\Users*****--JsonRpcServer:Run=true --JsonRpcServer:PipeName="inference_agent" --Logging:LogLevel:Default="Information">
2025-09-25 09:02:08.382 +08:00 [INF] Service is started on http://127.0.0.1:53323/, PID 21528!
2025-09-25 09:02:08.382 +08:00 [INF] Command:ServiceRestart Status:Success Direct:True Time:2756ms
2025-09-25 09:02:08.383 +08:00 [INF] Stream disconnected
2025-09-25 09:02:17.984 +08:00 [INF] Starting Foundry Local CLI with 'model list'
2025-09-25 09:02:18.037 +08:00 [INF] Loaded cached model info for 18 models. SavedAt:9/25/2025 8:32:55 AM
2025-09-25 09:02:19.048 +08:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:AutoRegisterCertifiedEps Status:Success Direct:True Time:0ms
2025-09-25 09:03:17.226 +08:00 [INF] Registering provider 1/1: QNNExecutionProvider
2025-09-25 09:03:17.243 +08:00 [INF] Successfully autoregistered QNNExecutionProvider
2025-09-25 09:03:17.243 +08:00 [INF] Finished attempt to autoregister certified EPs at 9/25/2025 9:03:17 AM; finished in 00:01:08.8798872
2025-09-25 09:03:17.253 +08:00 [INF] Successfully downloaded and registered the following EPs: QNNExecutionProvider. Valid EPs: CPUExecutionProvider, QNNExecutionProvider
2025-09-25 09:03:17.253 +08:00 [INF] Command:ServiceAutoRegister Status:Success Direct:False Time:58210ms
2025-09-25 09:03:17.263 +08:00 [INF] Valid devices: CPU
2025-09-25 09:03:17.263 +08:00 [INF] Valid EPs: CPUExecutionProvider, QNNExecutionProvider
2025-09-25 09:03:19.232 +08:00 [INF] Model Phi-4-reasoning-generic-cpu:1 does not have a valid prompt template.
2025-09-25 09:03:19.234 +08:00 [INF] Total models fetched across all pages: 18
2025-09-25 09:03:19.234 +08:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:ModelList Status:Success Direct:True Time:1978ms
2025-09-25 09:03:19.239 +08:00 [INF] Command:ModelList Status:Success Direct:True Time:61249ms
2025-09-25 09:03:19.240 +08:00 [INF] Stream disconnected
Intel Device
2025-09-25 00:37:57.380 +01:00 [INF] Starting Foundry Local CLI with '--help'
2025-09-25 00:38:04.183 +01:00 [INF] Starting Foundry Local CLI with 'model list'
2025-09-25 00:38:04.444 +01:00 [INF] Timeout connecting to service
System.TimeoutException: The operation has timed out.
at System.IO.Pipes.NamedPipeClientStream.ConnectInternal(Int32, CancellationToken, Int32) + 0xd5
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread, ExecutionContext, ContextCallback, Object) + 0x3d
--- End of stack trace from previous location ---
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread, ExecutionContext, ContextCallback, Object) + 0x70
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task&, Thread) + 0x6a
--- End of stack trace from previous location ---
at Microsoft.Neutron.Rpc.Client.RpcSessionExtensions.<CreatePipeRpcSessionAsync>d__2`1.MoveNext() + 0x6f
--- End of stack trace from previous location ---
at Microsoft.AI.Foundry.Local.Common.ServiceManagement.<ConnectClientAsync>d__8.MoveNext() + 0x6f
--- End of stack trace from previous location ---
at Microsoft.AI.Foundry.Local.Common.ServiceManagement.<CheckIsRunning>d__9.MoveNext() + 0x5f
2025-09-25 00:38:04.445 +01:00 [INF] Starting service <C:\Program Files\WindowsApps\Microsoft.FoundryLocal_0.7.117.26375_x64__8wekyb3d8bbwe\Inference.Service.Agent.exe --urls="http://127.0.0.1:0/" --OpenAIServiceSettings:ModelDirPath="C:\Users***" --JsonRpcServer:Run=true --JsonRpcServer:PipeName="inference_agent" --Logging:LogLevel:Default="Information">
2025-09-25 00:38:05.035 +01:00 [INF] Service endpoints are not yet bound, waiting to retry...
2025-09-25 00:38:05.039 +01:00 [INF] Now listening on: http://127.0.0.1:51354
2025-09-25 00:38:05.040 +01:00 [INF] Application started. Press Ctrl+C to shut down.
2025-09-25 00:38:05.040 +01:00 [INF] Hosting environment: Production
2025-09-25 00:38:05.040 +01:00 [INF] Content root path: C:\Program Files\WindowsApps\Microsoft.FoundryLocal_0.7.117.26375_x64__8wekyb3d8bbwe
2025-09-25 00:38:05.069 +01:00 [INF] Downloading provider 1/1: OpenVINOExecutionProvider
2025-09-25 00:38:05.546 +01:00 [INF] Service is started on http://127.0.0.1:51354/, PID 28088!
2025-09-25 00:38:06.585 +01:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:AutoRegisterCertifiedEps Status:Success Direct:True Time:0ms
2025-09-25 00:38:19.380 +01:00 [INF] Registering provider 1/1: OpenVINOExecutionProvider
2025-09-25 00:38:19.405 +01:00 [INF] Successfully autoregistered OpenVINOExecutionProvider
2025-09-25 00:38:19.406 +01:00 [INF] Finished attempt to autoregister certified EPs at 25/09/2025 00:38:19; finished in 00:00:14.3948860
2025-09-25 00:38:19.432 +01:00 [INF] Successfully downloaded and registered the following EPs: OpenVINOExecutionProvider.
Valid EPs: CPUExecutionProvider, OpenVINOExecutionProvider, WebGpuExecutionProvider
2025-09-25 00:38:19.432 +01:00 [INF] Command:ServiceAutoRegister Status:Success Direct:False Time:12852ms
2025-09-25 00:38:19.455 +01:00 [INF] Valid devices: CPU, GPU
2025-09-25 00:38:19.455 +01:00 [INF] Valid EPs: CPUExecutionProvider, OpenVINOExecutionProvider, WebGpuExecutionProvider
2025-09-25 00:38:20.304 +01:00 [INF] Model Phi-4-reasoning-generic-cpu:1 does not have a valid prompt template.
2025-09-25 00:38:20.305 +01:00 [INF] Model Phi-4-reasoning-generic-gpu:1 does not have a valid prompt template.
2025-09-25 00:38:20.305 +01:00 [INF] Total models fetched across all pages: 34
2025-09-25 00:38:20.305 +01:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:ModelList Status:Success Direct:True Time:871ms
2025-09-25 00:38:20.312 +01:00 [INF] Command:ModelList Status:Success Direct:True Time:16123ms
2025-09-25 00:38:20.313 +01:00 [INF] Stream disconnected
Downgrading to https://github.com/microsoft/Foundry-Local/releases/tag/v0.6.87 confirms the NPU models are available and can be used successfully.
@filipw What Windows Insider build (Dev/Canary) is that device on?
@leestott The Windows 25H2 issue is fixed. Could you please try version 0.7.117 again to see if the EP downloads successfully? Thank you.
I am on Canary, so it updates every few days, currently on latest 27954.1 from September 25th.
Yes, this is now working on 25H2.
Confirmed NPU models are now visible.
However, we now have a QNN driver issue with the Phi-4 models; see #262.
This is also resolved on Intel Windows devices.
The Phi-4 models also respond well on Intel devices, which indicates a driver issue on ARM devices, as per the old issue #136.
@filipw Are you able to list NPU models on your Qualcomm device now?
This is what I get right now (no models returned, using 0.7.117.26375):
PS C:\Users\filip> foundry model list
Exception: No models were returned from the Azure Foundry catalog.
OK, what if I downgrade?
PS C:\Users\filip> winget uninstall Microsoft.FoundryLocal
Found Foundry Local Model Server [Microsoft.FoundryLocal]
Starting package uninstall...
██████████████████████████████ 100%
Successfully uninstalled
PS C:\Users\filip> winget install --id=Microsoft.FoundryLocal -v "0.6.87.59034" -e
Found Foundry Local [Microsoft.FoundryLocal] Version 0.6.87.59034
This application is licensed to you by its owner.
Microsoft is not responsible for, nor does it grant any licenses to, third-party packages.
This package requires the following dependencies:
- Packages
Microsoft.VCLibs.Desktop.14 [>= 14.0.33728.0]
Successfully verified installer hash
Starting package install...
██████████████████████████████ 100%
Successfully installed
On 0.6.87.59034 it works:
PS C:\Users\filip> foundry model list
🟢 Service is Started on http://127.0.0.1:55651/, PID 21152!
Alias Device Task File Size License Model ID
-----------------------------------------------------------------------------------------------
phi-4 CPU chat-completion 10.16 GB MIT Phi-4-generic-cpu
--------------------------------------------------------------------------------------------------------
phi-3.5-mini CPU chat-completion 2.53 GB MIT Phi-3.5-mini-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------
deepseek-r1-14b NPU chat-completion 7.12 GB MIT deepseek-r1-distill-qwen-14b-qnn-npu
---------------------------------------------------------------------------------------------------------------------------
deepseek-r1-7b NPU chat-completion 3.71 GB MIT deepseek-r1-distill-qwen-7b-qnn-npu
--------------------------------------------------------------------------------------------------------------------------
phi-3-mini-128k CPU chat-completion 2.54 GB MIT Phi-3-mini-128k-instruct-generic-cpu
---------------------------------------------------------------------------------------------------------------------------
phi-3-mini-4k CPU chat-completion 2.53 GB MIT Phi-3-mini-4k-instruct-generic-cpu
-------------------------------------------------------------------------------------------------------------------------
mistral-7b-v0.2 CPU chat-completion 4.07 GB apache-2.0 mistralai-Mistral-7B-Instruct-v0-2-generic-cpu
-------------------------------------------------------------------------------------------------------------------------------------
phi-4-mini-reasoning NPU chat-completion 2.78 GB MIT Phi-4-mini-reasoning-qnn-npu
CPU chat-completion 4.52 GB MIT Phi-4-mini-reasoning-generic-cpu
-----------------------------------------------------------------------------------------------------------------------
qwen2.5-0.5b CPU chat-completion 0.80 GB apache-2.0 qwen2.5-0.5b-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------
qwen2.5-1.5b CPU chat-completion 1.78 GB apache-2.0 qwen2.5-1.5b-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-0.5b CPU chat-completion 0.80 GB apache-2.0 qwen2.5-coder-0.5b-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-7b CPU chat-completion 6.16 GB apache-2.0 qwen2.5-coder-7b-instruct-generic-cpu
----------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-1.5b CPU chat-completion 1.78 GB apache-2.0 qwen2.5-coder-1.5b-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------------
qwen2.5-14b CPU chat-completion 11.06 GB apache-2.0 qwen2.5-14b-instruct-generic-cpu
-----------------------------------------------------------------------------------------------------------------------
qwen2.5-7b CPU chat-completion 6.16 GB apache-2.0 qwen2.5-7b-instruct-generic-cpu
----------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-14b CPU chat-completion 11.06 GB apache-2.0 qwen2.5-coder-14b-instruct-generic-cpu
OK, so let's upgrade to 0.7.117.26375 again:
PS C:\Users\filip> winget install Microsoft.FoundryLocal
Found an existing package already installed. Trying to upgrade the installed package...
Found Foundry Local [Microsoft.FoundryLocal] Version 0.7.117.26375
This application is licensed to you by its owner.
Microsoft is not responsible for, nor does it grant any licenses to, third-party packages.
This package requires the following dependencies:
- Packages
Microsoft.VCLibs.Desktop.14 [>= 14.0.33728.0]
Successfully verified installer hash
Starting package install...
█████████████████████████████ 95%
Successfully installed. Restart the application to complete the upgrade.
Now model listing seems to work, but it's a bit suspicious: the list is identical to the one from before, and yesterday, after installing 0.7.117.26375, I would get a message that the EP was being downloaded.
PS C:\Users\filip> foundry model list
Alias Device Task File Size License Model ID
-----------------------------------------------------------------------------------------------
phi-4 CPU chat-completion 10.16 GB MIT Phi-4-generic-cpu
--------------------------------------------------------------------------------------------------------
phi-3.5-mini CPU chat-completion 2.53 GB MIT Phi-3.5-mini-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------
deepseek-r1-14b NPU chat-completion 7.12 GB MIT deepseek-r1-distill-qwen-14b-qnn-npu
---------------------------------------------------------------------------------------------------------------------------
deepseek-r1-7b NPU chat-completion 3.71 GB MIT deepseek-r1-distill-qwen-7b-qnn-npu
--------------------------------------------------------------------------------------------------------------------------
phi-3-mini-128k CPU chat-completion 2.54 GB MIT Phi-3-mini-128k-instruct-generic-cpu
---------------------------------------------------------------------------------------------------------------------------
phi-3-mini-4k CPU chat-completion 2.53 GB MIT Phi-3-mini-4k-instruct-generic-cpu
-------------------------------------------------------------------------------------------------------------------------
mistral-7b-v0.2 CPU chat-completion 4.07 GB apache-2.0 mistralai-Mistral-7B-Instruct-v0-2-generic-cpu
-------------------------------------------------------------------------------------------------------------------------------------
phi-4-mini-reasoning NPU chat-completion 2.78 GB MIT Phi-4-mini-reasoning-qnn-npu
CPU chat-completion 4.52 GB MIT Phi-4-mini-reasoning-generic-cpu
-----------------------------------------------------------------------------------------------------------------------
qwen2.5-0.5b CPU chat-completion 0.80 GB apache-2.0 qwen2.5-0.5b-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------
qwen2.5-1.5b CPU chat-completion 1.78 GB apache-2.0 qwen2.5-1.5b-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-0.5b CPU chat-completion 0.80 GB apache-2.0 qwen2.5-coder-0.5b-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-7b CPU chat-completion 6.16 GB apache-2.0 qwen2.5-coder-7b-instruct-generic-cpu
----------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-1.5b CPU chat-completion 1.78 GB apache-2.0 qwen2.5-coder-1.5b-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------------
qwen2.5-14b CPU chat-completion 11.06 GB apache-2.0 qwen2.5-14b-instruct-generic-cpu
-----------------------------------------------------------------------------------------------------------------------
qwen2.5-7b CPU chat-completion 6.16 GB apache-2.0 qwen2.5-7b-instruct-generic-cpu
----------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-14b CPU chat-completion 11.06 GB apache-2.0 qwen2.5-coder-14b-instruct-generic-cpu
Let's try to run a model. Unfortunately, it fails:
PS C:\Users\filip> foundry model run deepseek-r1-7b
Downloading deepseek-r1-distill-qwen-7b-qnn-npu...
[####################################] 100.00 % [Time remaining: about 0s] 22.8 MB/s
Loading model... [17:24:43 ERR] Failed loading model:deepseek-r1-distill-qwen-7b-qnn-npu
Exception: Failed: Loading model deepseek-r1-distill-qwen-7b-qnn-npu from http://127.0.0.1:55651/openai/load/deepseek-r1-distill-qwen-7b-qnn-npu?ttl=600
Internal Server Error
Failed loading model deepseek-r1-distill-qwen-7b-qnn-npu
Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_5289039109093371165_8_0'
PS C:\Users\filip>
This is in the logs:
2025-09-26 17:29:57.773 +02:00 [INF] Starting Foundry Local CLI with 'model run deepseek-r1-7b'
2025-09-26 17:29:57.783 +02:00 [INF] Command:ServiceStart Status:Skipped Direct:False Time:2ms
2025-09-26 17:29:58.748 +02:00 [INF] Model Phi-4-reasoning-generic-cpu does not have a valid prompt template.
2025-09-26 17:29:58.750 +02:00 [INF] Model deepseek-r1-distill-qwen-14b-generic-cpu is not supported on Arm64 currently.
2025-09-26 17:29:58.750 +02:00 [INF] Model deepseek-r1-distill-qwen-7b-generic-cpu is not supported on Arm64 currently.
2025-09-26 17:29:58.751 +02:00 [INF] Model Phi-4-mini-instruct-generic-cpu is not supported on Arm64 currently.
2025-09-26 17:29:58.751 +02:00 [INF] Total models fetched across all pages: 17
2025-09-26 17:29:58.781 +02:00 [INF] Command:ModelDownload Status:Skipped Direct:False Time:997ms
2025-09-26 17:29:58.782 +02:00 [INF] Command:ServiceList Status:Success Direct:False Time:0ms
2025-09-26 17:29:58.782 +02:00 [INF] Loading model: http://127.0.0.1:55651/openai/load/deepseek-r1-distill-qwen-7b-qnn-npu?ttl=600
2025-09-26 17:29:58.783 +02:00 [INF] Loading model:deepseek-r1-distill-qwen-7b-qnn-npu
2025-09-26 17:31:29.265 +02:00 [ERR] Failed loading model:deepseek-r1-distill-qwen-7b-qnn-npu
2025-09-26 17:31:29.265 +02:00 [INF] Command:ModelLoad Status:Failure Direct:False Time:90483ms
2025-09-26 17:31:29.265 +02:00 [INF] Command:ModelRun Status:Failure Direct:True Time:91484ms
2025-09-26 17:31:29.266 +02:00 [INF] Stream disconnected
2025-09-26 17:31:29.273 +02:00 [INF] LogException
Microsoft.AI.Foundry.Local.Common.FLException: Failed: Loading model deepseek-r1-distill-qwen-7b-qnn-npu from http://127.0.0.1:55651/openai/load/deepseek-r1-distill-qwen-7b-qnn-npu?ttl=600
---> System.Net.Http.HttpRequestException: Internal Server Error
Failed loading model deepseek-r1-distill-qwen-7b-qnn-npu
Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_5289039109093371165_8_0'
at Microsoft.AI.Foundry.Local.Common.Utils.EnsureSuccessStatusCode(HttpResponseMessage, String, Func`2) + 0xe4
--- End of inner exception stack trace ---
at Microsoft.AI.Foundry.Local.Common.Utils.EnsureSuccessStatusCode(HttpResponseMessage, String, Func`2) + 0x140
at Microsoft.AI.Foundry.Local.Common.ModelManagement.<LoadModelAsync>d__10.MoveNext() + 0x8b4
--- End of stack trace from previous location ---
at Microsoft.AI.Foundry.Local.Commands.ModelRunCommand.<<Create>b__1_0>d.MoveNext() + 0x1000
--- End of stack trace from previous location ---
at Microsoft.AI.Foundry.Local.Common.CommandActionFactory.<>c__DisplayClass0_0`1.<<Create>b__0>d.MoveNext() + 0x238
--- End of stack trace from previous location ---
at System.CommandLine.NamingConventionBinder.CommandHandler.<GetExitCodeAsync>d__66.MoveNext() + 0x5c
--- End of stack trace from previous location ---
at System.CommandLine.NamingConventionBinder.ModelBindingCommandHandler.<InvokeAsync>d__11.MoveNext() + 0x6c
--- End of stack trace from previous location ---
at System.CommandLine.Invocation.InvocationPipeline.<InvokeAsync>d__0.MoveNext() + 0x1f4
--- End of stack trace from previous location ---
at Microsoft.AI.Foundry.Local.Program.<Main>d__1.MoveNext() + 0x52c
@filipw
Seems to work for me.
Processor: Snapdragon(R) X Elite - X1E78100 - Qualcomm(R) Oryon(TM) CPU, 3417 Mhz, 12 Core(s), 12 Logical Processor(s)
Installed Physical Memory (RAM): 32.0 GB
Loading personal and system profiles took 4233ms.
foundry --version
0.7.117+67073234e7
foundry model run deepseek-r1-7b
Model deepseek-r1-distill-qwen-7b-qnn-npu:1 was found in the local cache.
Loading model...
🟢 Model deepseek-r1-distill-qwen-7b-qnn-npu:1 loaded successfully
Interactive Chat. Enter /? or /help for help. Press Ctrl+C to cancel generation. Type /exit to leave the chat.
Interactive mode, please enter your prompt
what is the capital of spain
🧠 Thinking...
Okay, so I need to figure out the capital of Spain. Hmm, I remember Spain is a country in Europe, right? I think it's in the Iberian Peninsula. I've heard of Madrid before, but I'm not 100% sure if that's the capital or just a big city there. Maybe it's Madrid? I think Barcelona is also pretty famous, but I don't think that's the capital. Let me try to recall any other capitals I know. For example, France's capital is Paris, Italy's is Rome, Germany's is Berlin. So Spain must have a capital too. Since I'm not sure about Madrid, maybe I should think about other Spanish cities. Oh, I think there's a city called Madrid that's a major city and a center of government. Yeah, that makes sense because I've heard it mentioned a lot, especially in news or travel. So I'm going to go with Madrid as the capital of Spain.
The capital of Spain is Madrid.
@filipw, can you please try the following steps:
Get-AppxPackage -AllUsers "*.EP.*" | Select-Object -ExpandProperty PackageFullName
If this shows an EP package is installed, remove it.
Now run this command (cleans up metadata associated with the downloaded package):
remove-appxpackage uup://product/Windows.Workload.ExecutionProvider.QNN.arm64
Run
foundry model list
again, to see the NPU models.
Run one of the models to make sure it is working.
Thanks.
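For convenience, here is the whole sequence above as one PowerShell sketch, to be run from an elevated prompt. The package wildcard and the uup product ID are taken verbatim from the steps above; the Remove-AppxPackage call is my consolidation of "remove it" and simply does nothing if the wildcard matches no package:

# Remove any installed execution-provider package (no-op if none is found)
Get-AppxPackage -AllUsers "*.EP.*" | Remove-AppxPackage -AllUsers
# Clean up the metadata associated with the downloaded package
remove-appxpackage uup://product/Windows.Workload.ExecutionProvider.QNN.arm64
# Re-list the models; this should re-trigger the EP download and show NPU models
foundry model list
# Run one of the models to make sure it is working
foundry model run deepseek-r1-7b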
Running remove-appxpackage uup://product/Windows.Workload.ExecutionProvider.QNN.arm64 indeed appears to force a redownload of the NPU execution provider, but this does not seem to help; the NPU models are still not there:
PS C:\Users\filip> foundry model list
🟢 Service is Started on http://127.0.0.1:60599/, PID 29020!
[####################################] 100.00 % [Time remaining: about 0s] Downloading complete!
Alias Device Task File Size License Model ID
-----------------------------------------------------------------------------------------------
phi-4 CPU chat-completion 10.16 GB MIT Phi-4-generic-cpu:1
----------------------------------------------------------------------------------------------------------
phi-3.5-mini CPU chat-completion 2.53 GB MIT Phi-3.5-mini-instruct-generic-cpu:1
--------------------------------------------------------------------------------------------------------------------------
phi-3-mini-128k CPU chat-completion 2.54 GB MIT Phi-3-mini-128k-instruct-generic-cpu:2
-----------------------------------------------------------------------------------------------------------------------------
phi-3-mini-4k CPU chat-completion 2.53 GB MIT Phi-3-mini-4k-instruct-generic-cpu:2
---------------------------------------------------------------------------------------------------------------------------
mistral-7b-v0.2 CPU chat-completion 4.07 GB apache-2.0 mistralai-Mistral-7B-Instruct-v0-2-generic-cpu:2
---------------------------------------------------------------------------------------------------------------------------------------
deepseek-r1-14b CPU chat-completion 11.51 GB MIT deepseek-r1-distill-qwen-14b-generic-cpu:3
---------------------------------------------------------------------------------------------------------------------------------
deepseek-r1-7b CPU chat-completion 6.43 GB MIT deepseek-r1-distill-qwen-7b-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
qwen2.5-0.5b CPU chat-completion 0.80 GB apache-2.0 qwen2.5-0.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------
qwen2.5-1.5b CPU chat-completion 1.78 GB apache-2.0 qwen2.5-1.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-0.5b CPU chat-completion 0.80 GB apache-2.0 qwen2.5-coder-0.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-7b CPU chat-completion 6.16 GB apache-2.0 qwen2.5-coder-7b-instruct-generic-cpu:3
------------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-1.5b CPU chat-completion 1.78 GB apache-2.0 qwen2.5-coder-1.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
phi-4-mini CPU chat-completion 4.80 GB MIT Phi-4-mini-instruct-generic-cpu:4
------------------------------------------------------------------------------------------------------------------------
phi-4-mini-reasoning CPU chat-completion 4.52 GB MIT Phi-4-mini-reasoning-generic-cpu:2
-------------------------------------------------------------------------------------------------------------------------
qwen2.5-14b CPU chat-completion 11.06 GB apache-2.0 qwen2.5-14b-instruct-generic-cpu:3
-------------------------------------------------------------------------------------------------------------------------
qwen2.5-7b CPU chat-completion 6.16 GB apache-2.0 qwen2.5-7b-instruct-generic-cpu:3
------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-14b CPU chat-completion 11.06 GB apache-2.0 qwen2.5-coder-14b-instruct-generic-cpu:3
And Get-AppxPackage -AllUsers "*.EP.*" | Select-Object -ExpandProperty PackageFullName still returns an empty list:
PS C:\Users\filip> Get-AppxPackage -AllUsers "*.EP.*" | Select-Object -ExpandProperty PackageFullName
PS C:\Users\filip>
These are the logs:
2025-09-28 09:32:53.901 +02:00 [INF] Starting Foundry Local CLI with 'model list'
2025-09-28 09:32:54.172 +02:00 [INF] Timeout connecting to service
System.TimeoutException: The operation has timed out.
at System.IO.Pipes.NamedPipeClientStream.ConnectInternal(Int32, CancellationToken, Int32) + 0xf8
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread, ExecutionContext, ContextCallback, Object) + 0x44
--- End of stack trace from previous location ---
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread, ExecutionContext, ContextCallback, Object) + 0x78
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task&, Thread) + 0x6c
--- End of stack trace from previous location ---
at Microsoft.Neutron.Rpc.Client.RpcSessionExtensions.<CreatePipeRpcSessionAsync>d__2`1.MoveNext() + 0x74
--- End of stack trace from previous location ---
at Microsoft.AI.Foundry.Local.Common.ServiceManagement.<ConnectClientAsync>d__8.MoveNext() + 0x7c
--- End of stack trace from previous location ---
at Microsoft.AI.Foundry.Local.Common.ServiceManagement.<CheckIsRunning>d__9.MoveNext() + 0x64
2025-09-28 09:32:54.172 +02:00 [INF] Starting service <C:\Program Files\WindowsApps\Microsoft.FoundryLocal_0.7.117.26375_arm64__8wekyb3d8bbwe\Inference.Service.Agent.exe --urls="http://127.0.0.1:0/" --OpenAIServiceSettings:ModelDirPath="C:\Users\filip\.foundry\cache\models" --JsonRpcServer:Run=true --JsonRpcServer:PipeName="inference_agent" --Logging:LogLevel:Default="Information">
2025-09-28 09:32:54.948 +02:00 [INF] Now listening on: http://127.0.0.1:60599
2025-09-28 09:32:54.948 +02:00 [INF] Application started. Press Ctrl+C to shut down.
2025-09-28 09:32:54.948 +02:00 [INF] Hosting environment: Production
2025-09-28 09:32:54.949 +02:00 [INF] Content root path: C:\Program Files\WindowsApps\Microsoft.FoundryLocal_0.7.117.26375_arm64__8wekyb3d8bbwe\
2025-09-28 09:32:54.949 +02:00 [INF] Service is started on http://127.0.0.1:60599/, PID 29020!
2025-09-28 09:32:54.955 +02:00 [INF] Downloading provider 1/1: QNNExecutionProvider
2025-09-28 09:32:54.983 +02:00 [INF] Loaded cached model info for 17 models. SavedAt:26.09.2025 17:19:25
2025-09-28 09:32:55.991 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:AutoRegisterCertifiedEps Status:Success Direct:True Time:0ms
2025-09-28 09:34:10.871 +02:00 [INF] Registering provider 1/1: QNNExecutionProvider
2025-09-28 09:34:10.889 +02:00 [INF] Successfully autoregistered QNNExecutionProvider
2025-09-28 09:34:10.890 +02:00 [INF] Finished attempt to autoregister certified EPs at 28.09.2025 09:34:10; finished in 00:01:15.9651778
2025-09-28 09:34:10.892 +02:00 [INF] Successfully downloaded and registered the following EPs: QNNExecutionProvider.
Valid EPs: CPUExecutionProvider, QNNExecutionProvider
2025-09-28 09:34:10.892 +02:00 [INF] Command:ServiceAutoRegister Status:Success Direct:False Time:74914ms
2025-09-28 09:34:10.897 +02:00 [INF] Valid devices: CPU
2025-09-28 09:34:10.901 +02:00 [INF] Valid EPs: CPUExecutionProvider, QNNExecutionProvider
2025-09-28 09:34:11.801 +02:00 [INF] Model Phi-4-reasoning-generic-cpu:1 does not have a valid prompt template.
2025-09-28 09:34:11.801 +02:00 [INF] Total models fetched across all pages: 18
2025-09-28 09:34:11.802 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:ModelList Status:Success Direct:True Time:909ms
2025-09-28 09:34:11.809 +02:00 [INF] Command:ModelList Status:Success Direct:True Time:77902ms
2025-09-28 09:34:11.809 +02:00 [INF] Stream disconnected
2025-09-28 09:35:23.172 +02:00 [INF] Starting Foundry Local CLI with 'model list'
2025-09-28 09:35:23.735 +02:00 [INF] Command:ModelList Status:Success Direct:True Time:555ms
I tried this both from a user and an admin PowerShell; there is no difference.
Thank you for trying this. Can you try one more thing, which is to:
- stop the foundry local service: foundry service stop
- uninstall Foundry Local via Add or Remove Programs
- repeat the above steps for removing the QNN EP and metadata
- install Foundry Local again
- try foundry model list again (a consolidated sketch of these steps follows below)
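A sketch of that full clean-reinstall sequence in one go, substituting winget for Add or Remove Programs (an assumption that the two are equivalent here) and reusing the EP-removal commands from the earlier steps:

# Stop the Foundry Local service
foundry service stop
# Uninstall Foundry Local
winget uninstall Microsoft.FoundryLocal
# Remove the QNN EP package (if present) and its metadata, as before
Get-AppxPackage -AllUsers "*.EP.*" | Remove-AppxPackage -AllUsers
remove-appxpackage uup://product/Windows.Workload.ExecutionProvider.QNN.arm64
# Reinstall Foundry Local and list the models again
winget install Microsoft.FoundryLocal
foundry model list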
Thanks, I went through those exact steps, and the result is the same. It appears to download the QNN EP but then does not discover any NPU models.
PS C:\Users\filip> winget install Microsoft.FoundryLocal
Found Foundry Local [Microsoft.FoundryLocal] Version 0.7.117.26375
This application is licensed to you by its owner.
Microsoft is not responsible for, nor does it grant any licenses to, third-party packages.
This package requires the following dependencies:
- Packages
Microsoft.VCLibs.Desktop.14 [>= 14.0.33728.0]
Successfully verified installer hash
Starting package install...
██████████████████████████████ 100%
Successfully installed
PS C:\Users\filip> foundry model list
🟢 Service is Started on http://127.0.0.1:54430/, PID 22064!
[####################################] 100.00 % [Time remaining: about 0s] Downloading complete!
Alias Device Task File Size License Model ID
-----------------------------------------------------------------------------------------------
phi-4 CPU chat-completion 10.16 GB MIT Phi-4-generic-cpu:1
----------------------------------------------------------------------------------------------------------
phi-3.5-mini CPU chat-completion 2.53 GB MIT Phi-3.5-mini-instruct-generic-cpu:1
--------------------------------------------------------------------------------------------------------------------------
phi-3-mini-128k CPU chat-completion 2.54 GB MIT Phi-3-mini-128k-instruct-generic-cpu:2
-----------------------------------------------------------------------------------------------------------------------------
phi-3-mini-4k CPU chat-completion 2.53 GB MIT Phi-3-mini-4k-instruct-generic-cpu:2
---------------------------------------------------------------------------------------------------------------------------
mistral-7b-v0.2 CPU chat-completion 4.07 GB apache-2.0 mistralai-Mistral-7B-Instruct-v0-2-generic-cpu:2
---------------------------------------------------------------------------------------------------------------------------------------
deepseek-r1-14b CPU chat-completion 11.51 GB MIT deepseek-r1-distill-qwen-14b-generic-cpu:3
---------------------------------------------------------------------------------------------------------------------------------
deepseek-r1-7b CPU chat-completion 6.43 GB MIT deepseek-r1-distill-qwen-7b-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
qwen2.5-0.5b CPU chat-completion 0.80 GB apache-2.0 qwen2.5-0.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------
qwen2.5-1.5b CPU chat-completion 1.78 GB apache-2.0 qwen2.5-1.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-0.5b CPU chat-completion 0.80 GB apache-2.0 qwen2.5-coder-0.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-7b CPU chat-completion 6.16 GB apache-2.0 qwen2.5-coder-7b-instruct-generic-cpu:3
------------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-1.5b CPU chat-completion 1.78 GB apache-2.0 qwen2.5-coder-1.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
phi-4-mini CPU chat-completion 4.80 GB MIT Phi-4-mini-instruct-generic-cpu:4
------------------------------------------------------------------------------------------------------------------------
phi-4-mini-reasoning CPU chat-completion 4.52 GB MIT Phi-4-mini-reasoning-generic-cpu:2
-------------------------------------------------------------------------------------------------------------------------
qwen2.5-14b CPU chat-completion 11.06 GB apache-2.0 qwen2.5-14b-instruct-generic-cpu:3
-------------------------------------------------------------------------------------------------------------------------
qwen2.5-7b CPU chat-completion 6.16 GB apache-2.0 qwen2.5-7b-instruct-generic-cpu:3
------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-14b CPU chat-completion 11.06 GB apache-2.0 qwen2.5-coder-14b-instruct-generic-cpu:3
PS C:\Users\filip>
Logs:
2025-09-29 17:05:28.111 +02:00 [INF] Starting Foundry Local CLI with 'model list'
2025-09-29 17:05:28.371 +02:00 [INF] Timeout connecting to service
System.TimeoutException: The operation has timed out.
at System.IO.Pipes.NamedPipeClientStream.ConnectInternal(Int32, CancellationToken, Int32) + 0xf8
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread, ExecutionContext, ContextCallback, Object) + 0x44
--- End of stack trace from previous location ---
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread, ExecutionContext, ContextCallback, Object) + 0x78
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task&, Thread) + 0x6c
--- End of stack trace from previous location ---
at Microsoft.Neutron.Rpc.Client.RpcSessionExtensions.<CreatePipeRpcSessionAsync>d__2`1.MoveNext() + 0x74
--- End of stack trace from previous location ---
at Microsoft.AI.Foundry.Local.Common.ServiceManagement.<ConnectClientAsync>d__8.MoveNext() + 0x7c
--- End of stack trace from previous location ---
at Microsoft.AI.Foundry.Local.Common.ServiceManagement.<CheckIsRunning>d__9.MoveNext() + 0x64
2025-09-29 17:05:28.372 +02:00 [INF] Starting service <C:\Program Files\WindowsApps\Microsoft.FoundryLocal_0.7.117.26375_arm64__8wekyb3d8bbwe\Inference.Service.Agent.exe --urls="http://127.0.0.1:0/" --OpenAIServiceSettings:ModelDirPath="C:\Users\filip\.foundry\cache\models" --JsonRpcServer:Run=true --JsonRpcServer:PipeName="inference_agent" --Logging:LogLevel:Default="Information">
2025-09-29 17:05:28.706 +02:00 [INF] Service is started on http://127.0.0.1:54430/, PID 22064!
2025-09-29 17:05:28.745 +02:00 [INF] Loaded cached model info for 18 models. SavedAt:28.09.2025 09:34:11
2025-09-29 17:05:29.776 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:AutoRegisterCertifiedEps Status:Success Direct:True Time:23ms
2025-09-29 17:06:14.725 +02:00 [INF] Registering provider 1/1: QNNExecutionProvider
2025-09-29 17:06:14.740 +02:00 [INF] Successfully autoregistered QNNExecutionProvider
2025-09-29 17:06:14.741 +02:00 [INF] Finished attempt to autoregister certified EPs at 29.09.2025 17:06:14; finished in 00:00:46.0722644
2025-09-29 17:06:14.757 +02:00 [INF] Successfully downloaded and registered the following EPs: QNNExecutionProvider.
Valid EPs: CPUExecutionProvider, QNNExecutionProvider
2025-09-29 17:06:14.760 +02:00 [INF] Command:ServiceAutoRegister Status:Success Direct:False Time:45032ms
2025-09-29 17:06:14.766 +02:00 [INF] Valid devices: CPU
2025-09-29 17:06:14.767 +02:00 [INF] Valid EPs: CPUExecutionProvider, QNNExecutionProvider
2025-09-29 17:06:15.937 +02:00 [INF] Model Phi-4-reasoning-generic-cpu:1 does not have a valid prompt template.
2025-09-29 17:06:15.942 +02:00 [INF] Total models fetched across all pages: 18
2025-09-29 17:06:15.944 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:ModelList Status:Success Direct:True Time:1178ms
2025-09-29 17:06:15.951 +02:00 [INF] Command:ModelList Status:Success Direct:True Time:47835ms
2025-09-29 17:06:15.954 +02:00 [INF] Stream disconnected
2025-09-29 17:06:31.227 +02:00 [INF] Starting Foundry Local CLI with 'model list'
2025-09-29 17:06:31.928 +02:00 [INF] Model Phi-4-reasoning-generic-cpu:1 does not have a valid prompt template.
2025-09-29 17:06:31.929 +02:00 [INF] Total models fetched across all pages: 18
2025-09-29 17:06:31.929 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:ModelList Status:Success Direct:True Time:663ms
2025-09-29 17:06:31.935 +02:00 [INF] Command:ModelList Status:Success Direct:True Time:702ms
2025-09-29 17:06:31.937 +02:00 [INF] Stream disconnected
2025-09-29 17:06:32.969 +02:00 [INF] Starting Foundry Local CLI with 'model list'
2025-09-29 17:06:33.373 +02:00 [INF] Model Phi-4-reasoning-generic-cpu:1 does not have a valid prompt template.
2025-09-29 17:06:33.374 +02:00 [INF] Total models fetched across all pages: 18
2025-09-29 17:06:33.374 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:ModelList Status:Success Direct:True Time:365ms
2025-09-29 17:06:33.381 +02:00 [INF] Command:ModelList Status:Success Direct:True Time:406ms
2025-09-29 17:06:33.385 +02:00 [INF] Stream disconnected
2025-09-29 17:06:34.383 +02:00 [INF] Starting Foundry Local CLI with 'model list'
2025-09-29 17:06:34.785 +02:00 [INF] Model Phi-4-reasoning-generic-cpu:1 does not have a valid prompt template.
2025-09-29 17:06:34.786 +02:00 [INF] Total models fetched across all pages: 18
2025-09-29 17:06:34.786 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:ModelList Status:Success Direct:True Time:364ms
2025-09-29 17:06:34.790 +02:00 [INF] Command:ModelList Status:Success Direct:True Time:401ms
2025-09-29 17:06:34.791 +02:00 [INF] Stream disconnected
2025-09-29 17:06:37.338 +02:00 [INF] Starting Foundry Local CLI with 'model list'
2025-09-29 17:06:38.402 +02:00 [INF] Command:ServiceAutoRegister Status:Success Direct:False Time:8ms
2025-09-29 17:06:38.404 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:AutoRegisterCertifiedEps Status:Success Direct:True Time:0ms
2025-09-29 17:06:39.210 +02:00 [INF] Model Phi-4-reasoning-generic-cpu:1 does not have a valid prompt template.
2025-09-29 17:06:39.211 +02:00 [INF] Total models fetched across all pages: 18
2025-09-29 17:06:39.211 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:ModelList Status:Success Direct:True Time:1829ms
2025-09-29 17:06:39.218 +02:00 [INF] Command:ModelList Status:Success Direct:True Time:1873ms
2025-09-29 17:06:39.219 +02:00 [INF] Stream disconnected
Get-AppxPackage -AllUsers "*.EP.*" | Select-Object -ExpandProperty PackageFullName still finds nothing.
Thank you for trying this! We have been working on improving the EP and device discovery and should have a patch out today.
Hi, I tried 0.7.120 and it still fails with the NPU models, though it's a little more verbose, saying that the QNN EP cannot be registered:
PS C:\Users\filip> foundry --version
0.7.120+3b92ed4014
PS C:\Users\filip> foundry model list
🟢 Service is Started on http://127.0.0.1:54613/, PID 23884!
Downloading complete!...
Failed to download or register the following EPs: QNNExecutionProvider. Will try installing again later.
Valid EPs: CPUExecutionProvider
Alias Device Task File Size License Model ID
-----------------------------------------------------------------------------------------------
phi-4 CPU chat-completion 10.16 GB MIT Phi-4-generic-cpu:1
----------------------------------------------------------------------------------------------------------
phi-3.5-mini CPU chat-completion 2.53 GB MIT Phi-3.5-mini-instruct-generic-cpu:1
--------------------------------------------------------------------------------------------------------------------------
phi-3-mini-128k CPU chat-completion 2.54 GB MIT Phi-3-mini-128k-instruct-generic-cpu:2
-----------------------------------------------------------------------------------------------------------------------------
phi-3-mini-4k CPU chat-completion 2.53 GB MIT Phi-3-mini-4k-instruct-generic-cpu:2
---------------------------------------------------------------------------------------------------------------------------
mistral-7b-v0.2 CPU chat-completion 4.07 GB apache-2.0 mistralai-Mistral-7B-Instruct-v0-2-generic-cpu:2
---------------------------------------------------------------------------------------------------------------------------------------
deepseek-r1-14b CPU chat-completion 11.51 GB MIT deepseek-r1-distill-qwen-14b-generic-cpu:3
---------------------------------------------------------------------------------------------------------------------------------
deepseek-r1-7b CPU chat-completion 6.43 GB MIT deepseek-r1-distill-qwen-7b-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
qwen2.5-0.5b CPU chat-completion 0.80 GB apache-2.0 qwen2.5-0.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------
qwen2.5-1.5b CPU chat-completion 1.78 GB apache-2.0 qwen2.5-1.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-0.5b CPU chat-completion 0.80 GB apache-2.0 qwen2.5-coder-0.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-7b CPU chat-completion 6.16 GB apache-2.0 qwen2.5-coder-7b-instruct-generic-cpu:3
------------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-1.5b CPU chat-completion 1.78 GB apache-2.0 qwen2.5-coder-1.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
phi-4-mini CPU chat-completion 4.80 GB MIT Phi-4-mini-instruct-generic-cpu:4
------------------------------------------------------------------------------------------------------------------------
phi-4-mini-reasoning CPU chat-completion 4.52 GB MIT Phi-4-mini-reasoning-generic-cpu:2
-------------------------------------------------------------------------------------------------------------------------
qwen2.5-14b CPU chat-completion 11.06 GB apache-2.0 qwen2.5-14b-instruct-generic-cpu:3
-------------------------------------------------------------------------------------------------------------------------
qwen2.5-7b CPU chat-completion 6.16 GB apache-2.0 qwen2.5-7b-instruct-generic-cpu:3
------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-14b CPU chat-completion 11.06 GB apache-2.0 qwen2.5-coder-14b-instruct-generic-cpu:3
PS C:\Users\filip>
The logs:
2025-10-06 07:29:00.960 +02:00 [INF] Now listening on: http://127.0.0.1:54613
2025-10-06 07:29:00.961 +02:00 [INF] Application started. Press Ctrl+C to shut down.
2025-10-06 07:29:00.961 +02:00 [INF] Hosting environment: Production
2025-10-06 07:29:00.963 +02:00 [INF] Content root path: C:\Program Files\WindowsApps\Microsoft.FoundryLocal_0.7.120.15250_arm64__8wekyb3d8bbwe\
2025-10-06 07:29:00.964 +02:00 [INF] Provider 1/1: QNNExecutionProvider is in NotPresent state.Downloading and ensuring provider is ready.
2025-10-06 07:29:00.964 +02:00 [INF] Found service endpoints: http://127.0.0.1:54613
2025-10-06 07:29:00.964 +02:00 [INF] Service is started on http://127.0.0.1:54613/, PID 23884!
2025-10-06 07:29:00.964 +02:00 [INF] Command:ModelInit Status:Success Direct:True Time:1255ms
2025-10-06 07:29:00.965 +02:00 [INF] Command:ServiceStart Status:Failure Direct:False Time:1256ms
2025-10-06 07:29:00.965 +02:00 [INF] Checking EP autoregistration status...
2025-10-06 07:29:01.031 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.120+3b92ed4014 Command:ServiceStatus Status:Success Direct:True Time:2ms
2025-10-06 07:29:01.033 +02:00 [INF] Processing EP autoregistration status...
2025-10-06 07:29:01.033 +02:00 [INF] Command:ServiceStatus Status:Success Direct:False Time:68ms
2025-10-06 07:29:01.040 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.120+3b92ed4014 Command:AutoRegisterCertifiedEps Status:Success Direct:True Time:1ms
2025-10-06 07:29:01.040 +02:00 [INF] Reporting progress of running EP download
2025-10-06 07:29:33.305 +02:00 [INF] Provider QNNExecutionProvider download and ensuring ready attempt: Failure
2025-10-06 07:29:33.315 +02:00 [INF] Download attempt for QNNExecutionProvider unsuccessful. Skipping all EP downloads.
2025-10-06 07:29:33.316 +02:00 [INF] Finished attempt to autoregister certified EPs; finished in 32443ms
2025-10-06 07:29:33.321 +02:00 [INF] Failed to download or register the following EPs: QNNExecutionProvider. Will try installing again later.
Valid EPs: CPUExecutionProvider
2025-10-06 07:29:33.371 +02:00 [INF] Command:ServiceAutoRegister Status:Failure Direct:False Time:32337ms
2025-10-06 07:29:33.412 +02:00 [INF] Loaded cached model info for 18 models. SavedAt:06.10.2025 07:26:26
2025-10-06 07:29:33.419 +02:00 [INF] Creating new task to ensure and autoregister certified execution providers
2025-10-06 07:29:33.421 +02:00 [INF] Created task to ensure and autoregister certified execution providers
2025-10-06 07:29:33.421 +02:00 [INF] Attempt 2: Autoregistration of certified execution providers in progress.
2025-10-06 07:29:33.421 +02:00 [INF] Started autoregistering certified EPs
2025-10-06 07:29:33.427 +02:00 [INF] Valid devices: CPU
2025-10-06 07:29:33.432 +02:00 [INF] Valid EPs: CPUExecutionProvider, QNNExecutionProvider
2025-10-06 07:29:33.442 +02:00 [INF] Provider 1/1: QNNExecutionProvider is in NotPresent state.Downloading and ensuring provider is ready.
2025-10-06 07:29:34.432 +02:00 [INF] Model Phi-4-reasoning-generic-cpu:1 does not have a valid prompt template.
2025-10-06 07:29:34.432 +02:00 [INF] Total models fetched across all pages: 18
2025-10-06 07:29:34.433 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.120+3b92ed4014 Command:ModelList Status:Success Direct:True Time:1015ms
2025-10-06 07:29:34.438 +02:00 [INF] Command:ModelInit Status:Success Direct:True Time:1255ms
2025-10-06 07:29:34.440 +02:00 [INF] Command:ModelList Status:Success Direct:True Time:34736ms
2025-10-06 07:29:34.442 +02:00 [INF] Stream disconnected
I traced the requests with Fiddler and this is what I see:
Request:
POST https://ai.azure.com/api/eastus/ux/v1.0/entities/crossRegion HTTP/1.1
Host: ai.azure.com
User-Agent: AzureAiStudio
traceparent: 00-ad4eb3d0ebf3896227d0a0c178755093-3eea31b2997522ca-00
Content-Type: application/json; charset=utf-8
Content-Length: 1173
{
"resourceIds": [
{
"resourceId": "azureml",
"entityContainerType": "Registry"
}
],
"indexEntitiesRequest": {
"filters": [
{
"field": "type",
"operator": "eq",
"values": [
"models"
]
},
{
"field": "kind",
"operator": "eq",
"values": [
"Versioned"
]
},
{
"field": "labels",
"operator": "eq",
"values": [
"latest"
]
},
{
"field": "annotations/tags/foundryLocal",
"operator": "eq",
"values": [
"",
"test"
]
},
{
"field": "properties/variantInfo/variantMetadata/device",
"operator": "eq",
"values": [
"CPU"
]
},
{
"field": "properties/variantInfo/variantMetadata/executionProvider",
"operator": "eq",
"values": [
"CPUExecutionProvider",
"QNNExecutionProvider"
]
}
],
"pageSize": 50,
"skip": null,
"continuationToken": null
}
}
Response:
HTTP/1.1 200 OK
Date: Tue, 07 Oct 2025 15:18:24 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 84801
Connection: keep-alive
Vary: Accept-Encoding
Request-Context: appId=cid-v1:2d2e8e63-272e-4b3c-8598-4ee570a0e70d
x-ms-response-type: standard
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
X-Content-Type-Options: nosniff
azureml-served-by-cluster: vienna-eastus-02
x-request-time: 0.117
x-azure-ref: 20251007T151824Z-164558f69d6cmmnwhC1ZRHr0ps00000004r00000000099ks
X-Cache: CONFIG_NOCACHE
Accept-Ranges: bytes
{"indexEntitiesResponse":{"totalCount":null,"value":[{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/Phi-4-generic-cpu/version/1","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"phi-4","author":"Microsoft","directoryPath":"cpu-int4-rtn-block-32-acc-level-4","foundryLocal":"","inputModalities":"text","license":"MIT","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/microsoft/phi-4/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<|system|>\\n{Content}<|im_end|>\", \"user\": \"<|user|>\\n{Content}<|im_end|>\", \"assistant\": \"<|assistant|>\\n{Content}<|im_end|>\", \"prompt\": \"<|user|>\\n{Content}<|im_end|>\\n<|assistant|>\"}","task":"chat-completion"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"MIT","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"Phi-4-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Phi-4 to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** MIT\n- **Model Description:** This is a conversion of the Phi-4 for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. 
Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Phi-4](https://huggingface.co/microsoft/Phi-4) for details.","labels":["default","latest","invisibleLatest"],"name":"Phi-4-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-05-07T17:43:28.8799+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"Phi-4-generic-cpu:1","name":"Phi-4-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":1,"alphanumericVersion":"1","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-05-07T17:43:28.4189955Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/Phi-4/versions/7"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":10909216931,"vRamFootprintBytes":10909502914}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":20,"type":"models","version":"1","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"Phi-4-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/Phi-4-generic-cpu/versions/1","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/Phi-3.5-mini-instruct-generic-cpu/version/1","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"phi-3.5-mini","author":"Microsoft","foundryLocal":"","inputModalities":"text","license":"MIT","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/microsoft/Phi-3.5-mini-instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","task":"chat-completion"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"MIT","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"Phi-3.5-mini-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Phi-3.5-mini-instruct to enable local inference on CPUs. 
This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** MIT\n- **Model Description:** This is a conversion of the Phi-3.5-mini-instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) for details.","labels":["default","latest","invisibleLatest"],"name":"Phi-3.5-mini-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-05-07T19:23:10.1295265+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"Phi-3.5-mini-instruct-generic-cpu:1","name":"Phi-3.5-mini-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":1,"alphanumericVersion":"1","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-05-07T19:23:09.7140407Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/Phi-3.5-mini-instruct/versions/6"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":2716566814,"vRamFootprintBytes":2768702013}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"1","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"Phi-3.5-mini-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/Phi-3.5-mini-instruct-generic-cpu/versions/1","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/Phi-4-reasoning-generic-cpu/version/1","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"phi-4-reasoning","author":"Microsoft","directoryPath":"v1","foundryLocal":"","inputModalities":"text","license":"MIT","licenseDescription":"This model is provided under the License Terms available at 
<https://huggingface.co/microsoft/phi-4/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","task":"chat-completion"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"MIT","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"Phi-4-reasoning-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Phi-4-reasoning to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** MIT\n- **Model Description:** This is a conversion of the Phi-4-reasoning for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Phi-4-reasoning](https://huggingface.co/microsoft/Phi-4-reasoning) for details.","labels":["default","latest","invisibleLatest"],"name":"Phi-4-reasoning-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-05-11T01:18:23.6244346+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"Phi-4-reasoning-generic-cpu:1","name":"Phi-4-reasoning-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":1,"alphanumericVersion":"1","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-05-11T01:18:23.1946551Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/Phi-4-reasoning/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":10909216931,"vRamFootprintBytes":10909483885}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"1","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"Phi-4-reasoning-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/Phi-4-reasoning-generic-cpu/versions/1","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/Phi-3-mini-128k-instruct-generic-cpu/version/2","kind":"Versioned","annotations":{"invisibleUntil":"
0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"phi-3-mini-128k","author":"Microsoft","directoryPath":"cpu-int4-rtn-block-32-acc-level-4","foundryLocal":"","inputModalities":"text","license":"MIT","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<|system|>\\n{Content}<|end|>\", \"user\": \"<|user|>\\n{Content}<|end|>\", \"assistant\": \"<|assistant|>\\n{Content}<|end|>\", \"prompt\": \"<|user|>\\n{Content}<|end|>\\n<|assistant|>\"}","task":"chat-completion"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"MIT","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"Phi-3-mini-128k-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Phi-3-Mini-128K-Instruct to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** MIT\n- **Model Description:** This is a conversion of the Phi-3-Mini-128K-Instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. 
Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Phi-3-Mini-128K-Instruct](https://huggingface.co/microsoft/Phi-3-Mini-128K-Instruct) for details.","labels":["default","latest","invisibleLatest"],"name":"Phi-3-mini-128k-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-05-12T22:43:58.0656724+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"Phi-3-mini-128k-instruct-generic-cpu:2","name":"Phi-3-mini-128k-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":2,"alphanumericVersion":"2","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-05-12T22:43:57.6918486Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/Phi-3-mini-128k-instruct/versions/13"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":2727304232,"vRamFootprintBytes":2727517409}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"2","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"Phi-3-mini-128k-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/Phi-3-mini-128k-instruct-generic-cpu/versions/2","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/Phi-3-mini-4k-instruct-generic-cpu/version/2","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"phi-3-mini-4k","author":"Microsoft","directoryPath":"cpu-int4-rtn-block-32-acc-level-4","foundryLocal":"","inputModalities":"text","license":"MIT","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<|system|>\\n{Content}<|end|>\", \"user\": \"<|user|>\\n{Content}<|end|>\", \"assistant\": \"<|assistant|>\\n{Content}<|end|>\", \"prompt\": 
\"<|user|>\\n{Content}<|end|>\\n<|assistant|>\"}","task":"chat-completion"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"MIT","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"Phi-3-mini-4k-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Phi-3-Mini-4K-Instruct to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** MIT\n- **Model Description:** This is a conversion of the Phi-3-Mini-4K-Instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Phi-3-Mini-4K-Instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) for details.","labels":["default","latest","invisibleLatest"],"name":"Phi-3-mini-4k-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-05-12T22:59:32.8315613+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"Phi-3-mini-4k-instruct-generic-cpu:2","name":"Phi-3-mini-4k-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":2,"alphanumericVersion":"2","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-05-12T22:59:32.2182163Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/Phi-3-mini-4k-instruct/versions/15"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":2716566814,"vRamFootprintBytes":3237709086}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"2","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"Phi-3-mini-4k-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/Phi-3-mini-4k-instruct-generic-cpu/versions/2","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/mistralai-Mistral-7B-Instruct-v0-2-generic-cpu/version/2","kind":"Versioned","annotati
ons":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"mistral-7b-v0.2","author":"Microsoft","directoryPath":"mistral-7b-instruct-v0.2-cpu-int4-rtn-block-32-acc-level-4","foundryLocal":"","inputModalities":"text","license":"apache-2.0","licenseDescription":"This model is provided under the License Terms available at <https://www.apache.org/licenses/LICENSE-2.0.html>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<s>\", \"user\": \"[INST]\\n{Content}\\n[/INST]\", \"assistant\": \"{Content}</s>\", \"prompt\": \"[INST]\\n{Content}\\n[/INST]\"}","task":"chat-completion"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"apache-2.0","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"mistralai-Mistral-7B-Instruct-v0-2-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Mistral-7B-Instruct-v0.2 to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** apache-2.0\n- **License:** MIT\n- **Model Description:** This is a conversion of the Mistral-7B-Instruct-v0.2 for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. 
Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) for details.\n","labels":["default","latest","invisibleLatest"],"name":"mistralai-Mistral-7B-Instruct-v0-2-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-05-12T23:21:21.9132405+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"mistralai-Mistral-7B-Instruct-v0-2-generic-cpu:2","name":"mistralai-Mistral-7B-Instruct-v0-2-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":2,"alphanumericVersion":"2","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-05-12T23:21:21.2397546Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/mistralai-Mistral-7B-Instruct-v0-2/versions/6"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":4370129223,"vRamFootprintBytes":4491341762}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"2","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"mistralai-Mistral-7B-Instruct-v0-2-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/mistralai-Mistral-7B-Instruct-v0-2-generic-cpu/versions/2","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/deepseek-r1-distill-qwen-14b-generic-cpu/version/3","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"deepseek-r1-14b","author":"Microsoft","directoryPath":"v3","foundryLocal":"","inputModalities":"text","license":"MIT","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"assistant\": \"{Content}\", \"prompt\": \"\\\\u003C\\\\uFF5CUser\\\\uFF5C\\\\u003E{Content}\\\\u003C\\\\uFF5CAssistant\\\\uFF5C\\\\u003E\"}","task":"chat-completion"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"MIT","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"deepseek-r1-distill-qwen-14b-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model 
is an optimized version of DeepSeek-R1-Distill-Qwen-14B to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** MIT\n- **Model Description:** This is a conversion of the DeepSeek-R1-Distill-Qwen-14B for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) for details.\n","labels":["default","latest","invisibleLatest"],"name":"deepseek-r1-distill-qwen-14b-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-05-14T21:37:41.6801451+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"deepseek-r1-distill-qwen-14b-generic-cpu:3","name":"deepseek-r1-distill-qwen-14b-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":3,"alphanumericVersion":"3","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-05-14T21:37:40.8451674Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/deepseek-r1-distill-qwen-14b/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":12358768394,"vRamFootprintBytes":12359149537}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"3","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"deepseek-r1-distill-qwen-14b-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/deepseek-r1-distill-qwen-14b-generic-cpu/versions/3","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/deepseek-r1-distill-qwen-7b-generic-cpu/version/3","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"deepseek-r1-7b","author":"Microsoft","directoryPath":"v3","foundryLocal":"","inputModalities":"text","license":"MIT","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"assistant\": \"{Content}\", \"prompt\": 
\"\\\\u003C\\\\uFF5CUser\\\\uFF5C\\\\u003E{Content}\\\\u003C\\\\uFF5CAssistant\\\\uFF5C\\\\u003E\"}","task":"chat-completion"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"MIT","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"deepseek-r1-distill-qwen-7b-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** MIT\n- **Model Description:** This is a conversion of the DeepSeek-R1-Distill-Qwen-7B for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) for details.\n","labels":["default","latest","invisibleLatest"],"name":"deepseek-r1-distill-qwen-7b-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-05-14T22:13:57.0519582+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"deepseek-r1-distill-qwen-7b-generic-cpu:3","name":"deepseek-r1-distill-qwen-7b-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":3,"alphanumericVersion":"3","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-05-14T22:13:56.2929352Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/deepseek-r1-distill-qwen-7b/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":6904159928,"vRamFootprintBytes":6904383723}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"3","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"deepseek-r1-distill-qwen-7b-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/deepseek-r1-distill-qwen-7b-generic-cpu/versions/3","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a
3/type/models/objectId/qwen2.5-0.5b-instruct-generic-cpu/version/3","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"qwen2.5-0.5b","author":"Microsoft","directoryPath":"v3","foundryLocal":"","inputModalities":"text","license":"apache-2.0","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","parameterSchema":"{\"enabled\": [{\"name\": \"temperature\", \"default\": 0.7}, {\"name\": \"top_p\", \"default\": 0.8}, {\"name\": \"top_k\", \"default\": 40}, {\"name\": \"presence_penalty\", \"default\": 1.1}]}","promptTemplate":"{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": \"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}","supportsToolCalling":"","task":"chat-completion","toolCallEnd":"</tool_call>","toolCallStart":"<tool_call>","toolRegisterEnd":"</tools>","toolRegisterStart":"<tools>"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"apache-2.0","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"qwen2.5-0.5b-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Qwen2.5-0.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** apache-2.0\n- **Model Description:** This is a conversion of the Qwen2.5-0.5B-Instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. 
Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) for details.","labels":["default","latest","invisibleLatest"],"name":"qwen2.5-0.5b-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-06-03T23:59:50.9468509+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"qwen2.5-0.5b-instruct-generic-cpu:3","name":"qwen2.5-0.5b-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":3,"alphanumericVersion":"3","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-06-03T23:59:50.6496731Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/qwen2.5-0.5b-instruct/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":861939957,"vRamFootprintBytes":862107904}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"3","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"qwen2.5-0.5b-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/qwen2.5-0.5b-instruct-generic-cpu/versions/3","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/qwen2.5-1.5b-instruct-generic-cpu/version/3","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"qwen2.5-1.5b","author":"Microsoft","directoryPath":"v3","foundryLocal":"","inputModalities":"text","license":"apache-2.0","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": 
\"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}","supportsToolCalling":"","task":"chat-completion","toolCallEnd":"</tool_call>","toolCallStart":"<tool_call>","toolRegisterEnd":"</tools>","toolRegisterStart":"<tools>"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"apache-2.0","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"qwen2.5-1.5b-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Qwen2.5-1.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** apache-2.0\n- **Model Description:** This is a conversion of the Qwen2.5-1.5B-Instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) for 
details.","labels":["default","latest","invisibleLatest"],"name":"qwen2.5-1.5b-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-06-04T00:21:44.3045772+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"qwen2.5-1.5b-instruct-generic-cpu:3","name":"qwen2.5-1.5b-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":3,"alphanumericVersion":"3","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-06-04T00:21:43.7807596Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/qwen2.5-1.5b-instruct/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":1911260446,"vRamFootprintBytes":1911456583}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"3","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"qwen2.5-1.5b-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/qwen2.5-1.5b-instruct-generic-cpu/versions/3","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/qwen2.5-coder-0.5b-instruct-generic-cpu/version/3","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"qwen2.5-coder-0.5b","author":"Microsoft","directoryPath":"v3","foundryLocal":"","inputModalities":"text","license":"apache-2.0","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","parameterSchema":"{\"enabled\": [{\"name\": \"temperature\", \"default\": 1.0}, {\"name\": \"top_p\", \"default\": 0.9}, {\"name\": \"top_k\", \"default\": 40}, {\"name\": \"presence_penalty\", \"default\": 1.1}]}","promptTemplate":"{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": 
\"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}","supportsToolCalling":"","task":"chat-completion","toolCallEnd":"</tool_call>","toolCallStart":"<tool_call>","toolRegisterEnd":"</tools>","toolRegisterStart":"<tools>"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"apache-2.0","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"qwen2.5-coder-0.5b-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Qwen2.5-Coder-0.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** apache-2.0\n- **Model Description:** This is a conversion of the Qwen2.5-Coder-0.5B-Instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Qwen2.5-Coder-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct) for 
details.","labels":["default","latest","invisibleLatest"],"name":"qwen2.5-coder-0.5b-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-06-04T00:31:40.6393493+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"qwen2.5-coder-0.5b-instruct-generic-cpu:3","name":"qwen2.5-coder-0.5b-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":3,"alphanumericVersion":"3","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-06-04T00:31:40.2585336Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/qwen2.5-coder-0.5b-instruct/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":861939957,"vRamFootprintBytes":1035133255}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"3","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"qwen2.5-coder-0.5b-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/qwen2.5-coder-0.5b-instruct-generic-cpu/versions/3","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/qwen2.5-coder-7b-instruct-generic-cpu/version/3","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"qwen2.5-coder-7b","author":"Microsoft","directoryPath":"v3","foundryLocal":"","inputModalities":"text","license":"apache-2.0","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": \"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}","supportsToolCalling":"","task":"chat-completion","toolCallEnd":"</tool_call>","toolCallStart":"<tool_call>","toolRegisterEnd":"</tools>","toolRegisterStart":"<tools>"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"apache-2.0","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"qwen2.5-coder-7b-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Qwen2.5-Coder-7B-Instruct to enable 
local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** apache-2.0\n- **Model Description:** This is a conversion of the Qwen2.5-Coder-7B-Instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) for details.","labels":["default","latest","invisibleLatest"],"name":"qwen2.5-coder-7b-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-06-04T00:41:58.7940727+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"qwen2.5-coder-7b-instruct-generic-cpu:3","name":"qwen2.5-coder-7b-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":3,"alphanumericVersion":"3","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-06-04T00:41:58.3829876Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/qwen2.5-coder-7b-instruct/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":6614249635,"vRamFootprintBytes":6614587351}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"3","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"qwen2.5-coder-7b-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/qwen2.5-coder-7b-instruct-generic-cpu/versions/3","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/qwen2.5-coder-1.5b-instruct-generic-cpu/version/3","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"qwen2.5-coder-1.5b","author":"Microsoft","directoryPath":"v3","foundryLocal":"","inputModalities":"text","license":"apache-2.0","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": 
\"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}","supportsToolCalling":"","task":"chat-completion","toolCallEnd":"</tool_call>","toolCallStart":"<tool_call>","toolRegisterEnd":"</tools>","toolRegisterStart":"<tools>"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"apache-2.0","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"qwen2.5-coder-1.5b-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Qwen2.5-Coder-1.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** apache-2.0\n- **Model Description:** This is a conversion of the Qwen2.5-Coder-1.5B-Instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Qwen2.5-Coder-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct) for 
details.","labels":["default","latest","invisibleLatest"],"name":"qwen2.5-coder-1.5b-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-06-04T00:52:43.5436122+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"qwen2.5-coder-1.5b-instruct-generic-cpu:3","name":"qwen2.5-coder-1.5b-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":3,"alphanumericVersion":"3","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-06-04T00:52:43.1677432Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/qwen2.5-coder-1.5b-instruct/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":1911260446,"vRamFootprintBytes":1911457904}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"3","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"qwen2.5-coder-1.5b-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/qwen2.5-coder-1.5b-instruct-generic-cpu/versions/3","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/Phi-4-mini-instruct-generic-cpu/version/4","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"phi-4-mini","author":"Microsoft","directoryPath":"v3","foundryLocal":"","inputModalities":"text","license":"MIT","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/microsoft/Phi-4-mini-instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<|system|>{Content}<|end|>\", \"user\": \"<|user|>{Content}<|end|>\", \"assistant\": \"<|assistant|>{Content}<|end|>\", \"prompt\": \"<|user|>{Content}<|end|><|assistant|>\"}","supportsToolCalling":"","task":"chat-completion","toolCallEnd":"<|/tool_call|>","toolCallStart":"<|tool_call|>","toolRegisterEnd":"<|/tool|>","toolRegisterStart":"<|tool|>"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"MIT","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"Phi-4-mini-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Phi-4-mini-instruct to enable local inference on CPUs. 
This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** MIT\n- **Model Description:** This is a conversion of the Phi-4-mini-instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct) for details.","labels":["default","latest","invisibleLatest"],"name":"Phi-4-mini-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-06-04T16:13:47.122296+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"Phi-4-mini-instruct-generic-cpu:4","name":"Phi-4-mini-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":4,"alphanumericVersion":"4","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-06-04T16:13:45.9261209Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/Phi-4-mini-instruct/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":5153960755,"vRamFootprintBytes":5206105439}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"4","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"Phi-4-mini-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/Phi-4-mini-instruct-generic-cpu/versions/4","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/Phi-4-mini-reasoning-generic-cpu/version/2","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"phi-4-mini-reasoning","author":"Microsoft","directoryPath":"v1","foundryLocal":"","inputModalities":"text","license":"MIT","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/microsoft/Phi-4-mini-reasoning/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<|system|>Your name is Phi, an AI math expert developed by Microsoft. 
{Content}<|end|>\", \"user\": \"<|user|>{Content}<|end|>\", \"assistant\": \"<|assistant|>{Content}<|end|>\", \"prompt\": \"<|user|>{Content}<|end|><|assistant|>\"}","supportsToolCalling":"","task":"chat-completion","toolCallEnd":"<|/tool_call|>","toolCallStart":"<|tool_call|>","toolRegisterEnd":"<|/tool|>","toolRegisterStart":"<|tool|>"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"MIT","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"Phi-4-mini-reasoning-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Phi-4-mini-reasoning to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** MIT\n- **Model Description:** This is a conversion of the Phi-4-mini-reasoning for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Phi-4-mini-reasoning](https://huggingface.co/microsoft/Phi-4-mini-reasoning) for 
details.","labels":["default","latest","invisibleLatest"],"name":"Phi-4-mini-reasoning-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-06-04T17:05:14.0779428+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"Phi-4-mini-reasoning-generic-cpu:2","name":"Phi-4-mini-reasoning-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":2,"alphanumericVersion":"2","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-06-04T17:05:13.7904476Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/Phi-4-mini-reasoning/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":4853313044,"vRamFootprintBytes":4905427271}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"2","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"Phi-4-mini-reasoning-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/Phi-4-mini-reasoning-generic-cpu/versions/2","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/qwen2.5-14b-instruct-generic-cpu/version/3","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"qwen2.5-14b","author":"Microsoft","directoryPath":"v3","foundryLocal":"","inputModalities":"text","license":"apache-2.0","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/Qwen/Qwen2.5-14B-Instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": \"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}","supportsToolCalling":"","task":"chat-completion","toolCallEnd":"</tool_call>","toolCallStart":"<tool_call>","toolRegisterEnd":"</tools>","toolRegisterStart":"<tools>"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"apache-2.0","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"qwen2.5-14b-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Qwen2.5-14B-Instruct to enable local inference on CPUs. 
This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** apache-2.0\n- **Model Description:** This is a conversion of the Qwen2.5-14B-Instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) for details.","labels":["default","latest","invisibleLatest"],"name":"qwen2.5-14b-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-06-04T19:51:51.1044735+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"qwen2.5-14b-instruct-generic-cpu:3","name":"qwen2.5-14b-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":3,"alphanumericVersion":"3","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-06-04T19:51:50.1856085Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/qwen2.5-14b-instruct/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":11875584573,"vRamFootprintBytes":11875920035}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"3","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"qwen2.5-14b-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/qwen2.5-14b-instruct-generic-cpu/versions/3","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/qwen2.5-7b-instruct-generic-cpu/version/3","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"qwen2.5-7b","author":"Microsoft","directoryPath":"v3","foundryLocal":"","inputModalities":"text","license":"apache-2.0","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/Qwen/Qwen2.5-7B-Instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": 
\"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}","supportsToolCalling":"","task":"chat-completion","toolCallEnd":"</tool_call>","toolCallStart":"<tool_call>","toolRegisterEnd":"</tools>","toolRegisterStart":"<tools>"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"apache-2.0","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"qwen2.5-7b-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Qwen2.5-7B-Instruct to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** apache-2.0\n- **Model Description:** This is a conversion of the Qwen2.5-7B-Instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) for 
details.","labels":["default","latest","invisibleLatest"],"name":"qwen2.5-7b-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-06-04T20:57:31.7803387+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"qwen2.5-7b-instruct-generic-cpu:3","name":"qwen2.5-7b-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":3,"alphanumericVersion":"3","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-06-04T20:57:31.1457087Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/qwen2.5-7b-instruct/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":6614249635,"vRamFootprintBytes":6614446407}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"3","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"qwen2.5-7b-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/qwen2.5-7b-instruct-generic-cpu/versions/3","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/qwen2.5-coder-14b-instruct-generic-cpu/version/3","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"qwen2.5-coder-14b","author":"Microsoft","directoryPath":"v3","foundryLocal":"","inputModalities":"text","license":"apache-2.0","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": \"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}","supportsToolCalling":"","task":"chat-completion","toolCallEnd":"</tool_call>","toolCallStart":"<tool_call>","toolRegisterEnd":"</tools>","toolRegisterStart":"<tools>"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"apache-2.0","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"qwen2.5-coder-14b-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Qwen2.5-Coder-14B-Instruct to enable local inference on CPUs. 
This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** apache-2.0\n- **Model Description:** This is a conversion of the Qwen2.5-Coder-14B-Instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Qwen2.5-Coder-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct) for details.","labels":["default","latest","invisibleLatest"],"name":"qwen2.5-coder-14b-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-06-04T21:00:59.5956532+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"qwen2.5-coder-14b-instruct-generic-cpu:3","name":"qwen2.5-coder-14b-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":3,"alphanumericVersion":"3","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-06-04T21:00:58.9017408Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/qwen2.5-coder-14b-instruct/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":11875584573,"vRamFootprintBytes":11875922365}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"3","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"qwen2.5-coder-14b-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/qwen2.5-coder-14b-instruct-generic-cpu/versions/3","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/openai-whisper-tiny-generic-cpu/version/1","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":false,"tags":{"alias":"whisper-tiny","author":"Microsoft","directoryPath":"openai-whisper-tiny-generic-cpu","foundryLocal":"test","inputModalities":"text","license":"apache-2.0","licenseDescription":"This model is provided under the License Terms available at https://www.apache.org/licenses/LICENSE-2.0.html.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"prompt\": \"<|startoftranscript|> <|en|> <|transcribe|> <|notimestamps|>\"}","task":"automatic speech 
recognition"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"apache-2.0","inferenceTasks":["automatic speech recognition"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"openai-whisper-tiny-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"Whisper is an OpenAI pre-trained speech recognition model with potential applications for ASR solutions for developers. However, due to weak supervision and large-scale noisy data, it should be used with caution in high-risk domains. The model has been trained on 680k hours of audio data representing 98 different languages, leading to improved robustness and accuracy compared to existing ASR systems. However, there are disparities in performance across languages and the model is prone to generating repetitive texts, which may increase in low-resource languages. There are dual-use concerns and real economic implications with such performance disparities, and the model may also have the capacity to recognize specific individuals. The affordable cost of automatic transcription and translation of large volumes of audio communication is a potential benefit, but the cost of transcription may limit the expansion of surveillance projects.\n\nThe tiny model is the smallest variant in the Whisper family, offering faster inference times with reduced accuracy compared to larger models, making it suitable for resource-constrained environments and real-time applications where speed is prioritized over precision.\n\n> The above summary was generated using ChatGPT. Review the <a href=\"https://huggingface.co/openai/whisper-tiny\" target=\"_blank\">original model card</a> to understand the data used to train the model, evaluation metrics, license, intended uses, limitations and bias before using the model.\n\nThis model is an optimized version of OpenAI-whisper-tiny to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** OpenAI\n- **Model type:** apache-2.0\n- **License:** Apache license 2.0\n- **Model Description:** This is a conversion of the OpenAI-whisper-tiny for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. 
Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [OpenAI-whisper-tiny](https://huggingface.co/openai/whisper-tiny) for details.\n","labels":["default","latest","invisibleLatest"],"name":"openai-whisper-tiny-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-09-10T20:06:28.28651+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"openai-whisper-tiny-generic-cpu:1","name":"openai-whisper-tiny-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":1,"alphanumericVersion":"1","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-09-11T18:52:39.7230022Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azure-openai/models/whisper/versions/001"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":193167360,"vRamFootprintBytes":193392097}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"1","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"openai-whisper-tiny-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/openai-whisper-tiny-generic-cpu/versions/1","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null}],"nextSkip":null,"continuationToken":null,"entityContainerIdsToEntityContainerMetadata":{"azureml":{"resourceId":"azureml","subscriptionId":"6c6683e9-e5fe-4038-8519-ce6ebec2ba15","resourceGroup":"registry-builtin-prod-eastus-01","resourceName":"azureml","entityContainerType":"Registry","regions":[{"regionName":"eastus","isPrimaryRegion":true},{"regionName":"australiaeast","isPrimaryRegion":false},{"regionName":"australiasoutheast","isPrimaryRegion":false},{"regionName":"brazilsouth","isPrimaryRegion":false},{"regionName":"canadacentral","isPrimaryRegion":false},{"regionName":"canadaeast","isPrimaryRegion":false},{"regionName":"centralindia","isPrimaryRegion":false},{"regionName":"centralus","isPrimaryRegion":false},{"regionName":"eastasia","isPrimaryRegion":false},{"regionName":"eastus2","isPrimaryRegion":false},{"regionName":"francecentral","isPrimaryRegion":false},{"regionName":"germanywestcentral","isPrimaryRegion":false},{"regionName":"japaneast","isPrimaryRegion":false},{"regionName":"japanwest","isPrimaryRegion":false},{"regionName":"jioindiawest","isPrimaryRegion":false},{"regionName":"koreacentral","isPrimaryRegion":false},{"regionName":"northcentralus","isPrimaryRegion":false},{"regionName":"northeurope","isPrimaryRegion":false},{"regionName":"norwayeast","isPrimaryRegion":false},{"regionName":"southafricanorth","isPrimaryRegion":false},{"regionName":"southcentralus","isPrimaryRegion":false},{"regionName":"southeastasia","isPrimaryRegion":false},{"regionName":"swedencentral","isPrimaryRegion":false},{"regionName":"switzerlandnorth","isPrimaryRegion":false},{"regionName":"uaenorth","isPrimaryRegion":false},{"regionName":"uksouth","isPrimaryRegion":false},{
"regionName":"ukwest","isPrimaryRegion":false},{"regionName":"westcentralus","isPrimaryRegion":false},{"regionName":"westeurope","isPrimaryRegion":false},{"regionName":"westus","isPrimaryRegion":false},{"regionName":"westus2","isPrimaryRegion":false},{"regionName":"westus3","isPrimaryRegion":false},{"regionName":"qatarcentral","isPrimaryRegion":false},{"regionName":"polandcentral","isPrimaryRegion":false},{"regionName":"southindia","isPrimaryRegion":false},{"regionName":"switzerlandwest","isPrimaryRegion":false},{"regionName":"italynorth","isPrimaryRegion":false},{"regionName":"spaincentral","isPrimaryRegion":false},{"regionName":"israelcentral","isPrimaryRegion":false},{"regionName":"taiwannorth","isPrimaryRegion":false},{"regionName":"centraluseuap","isPrimaryRegion":false},{"regionName":"eastus2euap","isPrimaryRegion":false}],"tenantId":"33e01921-4d64-4f8c-a055-5bdaffd5e33d","immutableResourceId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","isPublicResource":true,"isTradeRestrictedResource":false}},"resourcesNotQueriedReasons":{},"numberOfEntityContainersNotQueried":null,"fanoutData":null,"regionalFanoutState":null,"shardErrors":null,"canSupportSkip":false,"facets":null},"regionalErrors":{},"resourceSkipReasons":{},"shardErrors":{},"numberOfResourcesNotIncludedInSearch":0}
This is also resolved on Windows on Intel.
Also, the Phi-4 models respond well on Intel devices, so this points to a driver issue on ARM devices, as per the old issue [#136](https://github.com/microsoft/Foundry-Local/issues/136).
Woah! What did you do to get all the GPU and NPU models for Intel? My output is below, and all I see are GPU/CPU models. I'm on:
C:\Users\gptestuser>foundry --version
0.7.120+3b92ed4014
CPU Info:
Name: Intel(R) Core(TM) Ultra 7 268V | Cores: 8 | Threads: 8
NPU/Compute Accelerator Info:
Name: Intel(R) AI Boost | Description: Intel(R) AI Boost
GPU Info:
Name: Intel(R) Arc(TM) 140V GPU (16GB) | VRAM: None MB | VideoProcessor: Intel(R) Arc(TM) 140V GPU (16GB) Family | DriverVersion: 32.0.101.8132
Name: Microsoft Remote Display Adapter | VRAM: None MB | VideoProcessor: None | DriverVersion: 10.0.26100.6725
System RAM:
Total RAM: 31.48 GB
@filipw Checking in to see whether you can list your NPU models with the latest release.