
NPU models no longer listed in 0.7.117

Open filipw opened this issue 3 months ago • 22 comments

I just upgraded to v0.7.117 on a Surface Pro 11.

All the NPU models have disappeared; foundry model list shows only CPU models:

PS C:\Users\filip> foundry model list
Alias                          Device     Task               File Size    License      Model ID
-----------------------------------------------------------------------------------------------
phi-4                          CPU        chat-completion    10.16 GB     MIT          Phi-4-generic-cpu:1
----------------------------------------------------------------------------------------------------------
phi-3.5-mini                   CPU        chat-completion    2.53 GB      MIT          Phi-3.5-mini-instruct-generic-cpu:1
--------------------------------------------------------------------------------------------------------------------------
phi-3-mini-128k                CPU        chat-completion    2.54 GB      MIT          Phi-3-mini-128k-instruct-generic-cpu:2
-----------------------------------------------------------------------------------------------------------------------------
phi-3-mini-4k                  CPU        chat-completion    2.53 GB      MIT          Phi-3-mini-4k-instruct-generic-cpu:2
---------------------------------------------------------------------------------------------------------------------------
mistral-7b-v0.2                CPU        chat-completion    4.07 GB      apache-2.0   mistralai-Mistral-7B-Instruct-v0-2-generic-cpu:2
---------------------------------------------------------------------------------------------------------------------------------------
deepseek-r1-14b                CPU        chat-completion    11.51 GB     MIT          deepseek-r1-distill-qwen-14b-generic-cpu:3
---------------------------------------------------------------------------------------------------------------------------------
deepseek-r1-7b                 CPU        chat-completion    6.43 GB      MIT          deepseek-r1-distill-qwen-7b-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
qwen2.5-0.5b                   CPU        chat-completion    0.80 GB      apache-2.0   qwen2.5-0.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------
qwen2.5-1.5b                   CPU        chat-completion    1.78 GB      apache-2.0   qwen2.5-1.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-0.5b             CPU        chat-completion    0.80 GB      apache-2.0   qwen2.5-coder-0.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-7b               CPU        chat-completion    6.16 GB      apache-2.0   qwen2.5-coder-7b-instruct-generic-cpu:3
------------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-1.5b             CPU        chat-completion    1.78 GB      apache-2.0   qwen2.5-coder-1.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
phi-4-mini                     CPU        chat-completion    4.80 GB      MIT          Phi-4-mini-instruct-generic-cpu:4
------------------------------------------------------------------------------------------------------------------------
phi-4-mini-reasoning           CPU        chat-completion    4.52 GB      MIT          Phi-4-mini-reasoning-generic-cpu:2
-------------------------------------------------------------------------------------------------------------------------
qwen2.5-14b                    CPU        chat-completion    11.06 GB     apache-2.0   qwen2.5-14b-instruct-generic-cpu:3
-------------------------------------------------------------------------------------------------------------------------
qwen2.5-7b                     CPU        chat-completion    6.16 GB      apache-2.0   qwen2.5-7b-instruct-generic-cpu:3
------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-14b              CPU        chat-completion    11.06 GB     apache-2.0   qwen2.5-coder-14b-instruct-generic-cpu:3

I am using Qualcomm® Hexagon™ NPU Driver version 1.0.0.11 (Sept 4th release).

At initial startup after installation, the QNN execution provider appears to have been downloaded successfully:

🟢 Service is Started on http://localhost:5272/, PID 13080!
[                                    ]   0.00 % [Time remaining: about 3m20s]    Downloading QNNExecutionProvider       [                                    ]
... lots of progress....
QNNExecutionProvider       [####################################] 100.00 % [Time remaining: about 0s]       Downloading complete!

System Info

Processor: Snapdragon(R) X 12-core X1E80100 @ 3.40 GHz, 3417 Mhz, 12 Core(s), 12 Logical Processor(s)
OS Name: Microsoft Windows 11 Home
Version: 10.0.27950 Build 27950
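
(Aside: a hedged way to capture the same details from PowerShell for a report like this. Get-ComputerInfo is a standard cmdlet, but the exact property names are an assumption and can vary by Windows build.)

# Hedged sketch: gather OS and processor details for a bug report.
# Verify property names on your build with: Get-ComputerInfo | Get-Member
Get-ComputerInfo | Select-Object OsName, OsVersion, OsBuildNumber, CsProcessors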

filipw avatar Sep 25 '25 04:09 filipw

@filipw Thank you for reporting this. Can you please run the following command in a PowerShell window:

Get-AppxPackage -AllUsers "*.EP.*"  | Select-Object -ExpandProperty PackageFullName
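
A broader variant (a hedged aside; the wildcard filters are assumptions rather than an official diagnostic) shows whether any Foundry or EP packages are registered at all:

# Hedged sketch: list any Foundry Local or execution-provider packages.
Get-AppxPackage -AllUsers |
    Where-Object { $_.Name -like "*Foundry*" -or $_.Name -like "*.EP.*" } |
    Select-Object Name, Version, PackageFullName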

natke avatar Sep 25 '25 06:09 natke

It returns nothing:

PS C:\Users\filip> Get-AppxPackage -AllUsers "*.EP.*"  | Select-Object -ExpandProperty PackageFullName
PS C:\Users\filip>

Foundry itself is there:

PS C:\Users\filip> Get-AppxPackage -AllUsers "*Foundry*" | Select-Object -ExpandProperty PackageFullName
Microsoft.FoundryLocal_0.7.117.26375_arm64__8wekyb3d8bbwe
PS C:\Users\filip>

I assume this implies the execution provider did not install successfully? Can I force-install it?
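
(For reference: if the EP package file could be obtained, a manual install would look roughly like this. A hedged sketch only; Add-AppxPackage is standard PowerShell, but the package path below is purely hypothetical, not a real download location.)

# Hedged sketch: manually register an execution-provider package,
# assuming you have its .msix file. The path is hypothetical.
Add-AppxPackage -Path "C:\Temp\QnnExecutionProvider.msix"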

filipw avatar Sep 25 '25 07:09 filipw

Same issue on Intel and Qualcomm NPU devices.

Devices running Windows 25H2:
  • Intel NPU device: Surface Laptop 6
  • ARM device: Lenovo 14S (Snapdragon X Elite)

Foundry logs clearly show that only CPU or GPU models are being downloaded; see below.

Arm Device

2025-09-25 08:54:30.435 +08:00 [INF] Starting Foundry Local CLI with 'model list'
2025-09-25 08:54:31.486 +08:00 [INF] Command:ServiceAutoRegister Status:Success Direct:False Time:3ms
2025-09-25 08:54:31.487 +08:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:AutoRegisterCertifiedEps Status:Success Direct:True Time:0ms
2025-09-25 08:54:32.543 +08:00 [INF] Model Phi-4-reasoning-generic-cpu:1 does not have a valid prompt template.
2025-09-25 08:54:32.544 +08:00 [INF] Total models fetched across all pages: 18
2025-09-25 08:54:32.545 +08:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:ModelList Status:Success Direct:True Time:2069ms
2025-09-25 08:54:32.547 +08:00 [INF] Command:ModelList Status:Success Direct:True Time:2105ms
2025-09-25 08:54:32.547 +08:00 [INF] Stream disconnected
2025-09-25 08:54:46.858 +08:00 [INF] Starting Foundry Local CLI with 'cache ls'
2025-09-25 08:54:46.867 +08:00 [INF] Command:ServiceStart Status:Skipped Direct:False Time:2ms
2025-09-25 08:54:46.892 +08:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:ListDownloadedModels Status:Success Direct:True Time:2ms
2025-09-25 08:54:47.803 +08:00 [INF] Model Phi-4-reasoning-generic-cpu:1 does not have a valid prompt template.
2025-09-25 08:54:47.804 +08:00 [INF] Total models fetched across all pages: 18
2025-09-25 08:54:47.804 +08:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:ModelList Status:Success Direct:True Time:907ms
2025-09-25 08:54:47.814 +08:00 [INF] Command:CacheList Status:Success Direct:True Time:949ms
2025-09-25 08:54:47.815 +08:00 [INF] Stream disconnected
2025-09-25 09:02:05.584 +08:00 [INF] Starting Foundry Local CLI with 'service restart'
2025-09-25 09:02:05.633 +08:00 [INF] Stopped Inference.Service.Agent PID 7020
2025-09-25 09:02:08.158 +08:00 [INF] Starting service <C:\Program Files\WindowsApps\Microsoft.FoundryLocal_0.7.117.26375_arm64__8wekyb3d8bbwe\Inference.Service.Agent.exe --urls="http://127.0.0.1:0/" --OpenAIServiceSettings:ModelDirPath="C:\Users*****" --JsonRpcServer:Run=true --JsonRpcServer:PipeName="inference_agent" --Logging:LogLevel:Default="Information">
2025-09-25 09:02:08.382 +08:00 [INF] Service is started on http://127.0.0.1:53323/, PID 21528!
2025-09-25 09:02:08.382 +08:00 [INF] Command:ServiceRestart Status:Success Direct:True Time:2756ms
2025-09-25 09:02:08.383 +08:00 [INF] Stream disconnected
2025-09-25 09:02:17.984 +08:00 [INF] Starting Foundry Local CLI with 'model list'
2025-09-25 09:02:18.037 +08:00 [INF] Loaded cached model info for 18 models. SavedAt:9/25/2025 8:32:55 AM
2025-09-25 09:02:19.048 +08:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:AutoRegisterCertifiedEps Status:Success Direct:True Time:0ms
2025-09-25 09:03:17.226 +08:00 [INF] Registering provider 1/1: QNNExecutionProvider
2025-09-25 09:03:17.243 +08:00 [INF] Successfully autoregistered QNNExecutionProvider
2025-09-25 09:03:17.243 +08:00 [INF] Finished attempt to autoregister certified EPs at 9/25/2025 9:03:17 AM; finished in 00:01:08.8798872
2025-09-25 09:03:17.253 +08:00 [INF] Successfully downloaded and registered the following EPs: QNNExecutionProvider.
Valid EPs: CPUExecutionProvider, QNNExecutionProvider

2025-09-25 09:03:17.253 +08:00 [INF] Command:ServiceAutoRegister Status:Success Direct:False Time:58210ms
2025-09-25 09:03:17.263 +08:00 [INF] Valid devices: CPU
2025-09-25 09:03:17.263 +08:00 [INF] Valid EPs: CPUExecutionProvider, QNNExecutionProvider
2025-09-25 09:03:19.232 +08:00 [INF] Model Phi-4-reasoning-generic-cpu:1 does not have a valid prompt template.
2025-09-25 09:03:19.234 +08:00 [INF] Total models fetched across all pages: 18
2025-09-25 09:03:19.234 +08:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:ModelList Status:Success Direct:True Time:1978ms
2025-09-25 09:03:19.239 +08:00 [INF] Command:ModelList Status:Success Direct:True Time:61249ms
2025-09-25 09:03:19.240 +08:00 [INF] Stream disconnected

Intel Device

2025-09-25 00:37:57.380 +01:00 [INF] Starting Foundry Local CLI with '--help'
2025-09-25 00:38:04.183 +01:00 [INF] Starting Foundry Local CLI with 'model list'
2025-09-25 00:38:04.444 +01:00 [INF] Timeout connecting to service
System.TimeoutException: The operation has timed out.
   at System.IO.Pipes.NamedPipeClientStream.ConnectInternal(Int32, CancellationToken, Int32) + 0xd5
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread, ExecutionContext, ContextCallback, Object) + 0x3d
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread, ExecutionContext, ContextCallback, Object) + 0x70
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task&, Thread) + 0x6a
--- End of stack trace from previous location ---
   at Microsoft.Neutron.Rpc.Client.RpcSessionExtensions.<CreatePipeRpcSessionAsync>d__2`1.MoveNext() + 0x6f
--- End of stack trace from previous location ---
   at Microsoft.AI.Foundry.Local.Common.ServiceManagement.<ConnectClientAsync>d__8.MoveNext() + 0x6f
--- End of stack trace from previous location ---
   at Microsoft.AI.Foundry.Local.Common.ServiceManagement.<CheckIsRunning>d__9.MoveNext() + 0x5f
2025-09-25 00:38:04.445 +01:00 [INF] Starting service <C:\Program Files\WindowsApps\Microsoft.FoundryLocal_0.7.117.26375_x64__8wekyb3d8bbwe\Inference.Service.Agent.exe --urls="http://127.0.0.1:0/" --OpenAIServiceSettings:ModelDirPath="C:\Users***" --JsonRpcServer:Run=true --JsonRpcServer:PipeName="inference_agent" --Logging:LogLevel:Default="Information">
2025-09-25 00:38:05.035 +01:00 [INF] Service endpoints are not yet bound, waiting to retry...
2025-09-25 00:38:05.039 +01:00 [INF] Now listening on: http://127.0.0.1:51354
2025-09-25 00:38:05.040 +01:00 [INF] Application started. Press Ctrl+C to shut down.
2025-09-25 00:38:05.040 +01:00 [INF] Hosting environment: Production
2025-09-25 00:38:05.040 +01:00 [INF] Content root path: C:\Program Files\WindowsApps\Microsoft.FoundryLocal_0.7.117.26375_x64__8wekyb3d8bbwe
2025-09-25 00:38:05.069 +01:00 [INF] Downloading provider 1/1: OpenVINOExecutionProvider
2025-09-25 00:38:05.546 +01:00 [INF] Service is started on http://127.0.0.1:51354/, PID 28088!
2025-09-25 00:38:06.585 +01:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:AutoRegisterCertifiedEps Status:Success Direct:True Time:0ms
2025-09-25 00:38:19.380 +01:00 [INF] Registering provider 1/1: OpenVINOExecutionProvider
2025-09-25 00:38:19.405 +01:00 [INF] Successfully autoregistered OpenVINOExecutionProvider
2025-09-25 00:38:19.406 +01:00 [INF] Finished attempt to autoregister certified EPs at 25/09/2025 00:38:19; finished in 00:00:14.3948860
2025-09-25 00:38:19.432 +01:00 [INF] Successfully downloaded and registered the following EPs: OpenVINOExecutionProvider.
Valid EPs: CPUExecutionProvider, OpenVINOExecutionProvider, WebGpuExecutionProvider

2025-09-25 00:38:19.432 +01:00 [INF] Command:ServiceAutoRegister Status:Success Direct:False Time:12852ms
2025-09-25 00:38:19.455 +01:00 [INF] Valid devices: CPU, GPU
2025-09-25 00:38:19.455 +01:00 [INF] Valid EPs: CPUExecutionProvider, OpenVINOExecutionProvider, WebGpuExecutionProvider
2025-09-25 00:38:20.304 +01:00 [INF] Model Phi-4-reasoning-generic-cpu:1 does not have a valid prompt template.
2025-09-25 00:38:20.305 +01:00 [INF] Model Phi-4-reasoning-generic-gpu:1 does not have a valid prompt template.
2025-09-25 00:38:20.305 +01:00 [INF] Total models fetched across all pages: 34
2025-09-25 00:38:20.305 +01:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:ModelList Status:Success Direct:True Time:871ms
2025-09-25 00:38:20.312 +01:00 [INF] Command:ModelList Status:Success Direct:True Time:16123ms
2025-09-25 00:38:20.313 +01:00 [INF] Stream disconnected

leestott avatar Sep 25 '25 09:09 leestott

Downgrading to https://github.com/microsoft/Foundry-Local/releases/tag/v0.6.87 confirms that NPU models are available and can be used successfully.
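
To stay on the working version until a fix ships, winget can pin the package (a hedged aside; winget pin is a general winget feature, not something specific to Foundry Local):

# Prevent 'winget upgrade --all' from moving Foundry Local off 0.6.87.
winget pin add --id Microsoft.FoundryLocal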

leestott avatar Sep 25 '25 10:09 leestott

@filipw Which Windows Insider build (Dev/Canary) is that device on?

jaholme avatar Sep 25 '25 18:09 jaholme

@leestott The Windows 25H2 issue is fixed. Could you please try version 0.7.117 again to see if the EP downloads successfully? Thank you.

timenick avatar Sep 26 '25 00:09 timenick

I am on Canary, so it updates every few days; I'm currently on the latest build, 27954.1 from September 25th.

filipw avatar Sep 26 '25 08:09 filipw

Yes, this is now working on 25H2.

Confirmed NPU models are now visible.

Image

However, we now have a QNN driver issue with the Phi-4 models; see #262.

leestott avatar Sep 26 '25 10:09 leestott

This is also resolved on Intel Windows devices.

Image

The Phi-4 models also respond well on Intel devices, which points to a driver issue on ARM devices, as per old issue #136.

leestott avatar Sep 26 '25 10:09 leestott

@filipw Are you able to list NPU models on your Qualcomm device now?

natke avatar Sep 26 '25 14:09 natke

This is what I get right now (no models returned, using 0.7.117.26375):

PS C:\Users\filip> foundry model list
Exception: No models were returned from the Azure Foundry catalog.

OK, what if I downgrade?

PS C:\Users\filip> winget uninstall Microsoft.FoundryLocal
Found Foundry Local Model Server [Microsoft.FoundryLocal]
Starting package uninstall...
  ██████████████████████████████  100%
Successfully uninstalled
PS C:\Users\filip> winget install --id=Microsoft.FoundryLocal -v "0.6.87.59034" -e
Found Foundry Local [Microsoft.FoundryLocal] Version 0.6.87.59034
This application is licensed to you by its owner.
Microsoft is not responsible for, nor does it grant any licenses to, third-party packages.
This package requires the following dependencies:
  - Packages
      Microsoft.VCLibs.Desktop.14 [>= 14.0.33728.0]
Successfully verified installer hash
Starting package install...
  ██████████████████████████████  100%
Successfully installed

On 0.6.87.59034 it works:

PS C:\Users\filip> foundry model list
🟢 Service is Started on http://127.0.0.1:55651/, PID 21152!
Alias                          Device     Task               File Size    License      Model ID
-----------------------------------------------------------------------------------------------
phi-4                          CPU        chat-completion    10.16 GB     MIT          Phi-4-generic-cpu
--------------------------------------------------------------------------------------------------------
phi-3.5-mini                   CPU        chat-completion    2.53 GB      MIT          Phi-3.5-mini-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------
deepseek-r1-14b                NPU        chat-completion    7.12 GB      MIT          deepseek-r1-distill-qwen-14b-qnn-npu
---------------------------------------------------------------------------------------------------------------------------
deepseek-r1-7b                 NPU        chat-completion    3.71 GB      MIT          deepseek-r1-distill-qwen-7b-qnn-npu
--------------------------------------------------------------------------------------------------------------------------
phi-3-mini-128k                CPU        chat-completion    2.54 GB      MIT          Phi-3-mini-128k-instruct-generic-cpu
---------------------------------------------------------------------------------------------------------------------------
phi-3-mini-4k                  CPU        chat-completion    2.53 GB      MIT          Phi-3-mini-4k-instruct-generic-cpu
-------------------------------------------------------------------------------------------------------------------------
mistral-7b-v0.2                CPU        chat-completion    4.07 GB      apache-2.0   mistralai-Mistral-7B-Instruct-v0-2-generic-cpu
-------------------------------------------------------------------------------------------------------------------------------------
phi-4-mini-reasoning           NPU        chat-completion    2.78 GB      MIT          Phi-4-mini-reasoning-qnn-npu
                               CPU        chat-completion    4.52 GB      MIT          Phi-4-mini-reasoning-generic-cpu
-----------------------------------------------------------------------------------------------------------------------
qwen2.5-0.5b                   CPU        chat-completion    0.80 GB      apache-2.0   qwen2.5-0.5b-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------
qwen2.5-1.5b                   CPU        chat-completion    1.78 GB      apache-2.0   qwen2.5-1.5b-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-0.5b             CPU        chat-completion    0.80 GB      apache-2.0   qwen2.5-coder-0.5b-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-7b               CPU        chat-completion    6.16 GB      apache-2.0   qwen2.5-coder-7b-instruct-generic-cpu
----------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-1.5b             CPU        chat-completion    1.78 GB      apache-2.0   qwen2.5-coder-1.5b-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------------
qwen2.5-14b                    CPU        chat-completion    11.06 GB     apache-2.0   qwen2.5-14b-instruct-generic-cpu
-----------------------------------------------------------------------------------------------------------------------
qwen2.5-7b                     CPU        chat-completion    6.16 GB      apache-2.0   qwen2.5-7b-instruct-generic-cpu
----------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-14b              CPU        chat-completion    11.06 GB     apache-2.0   qwen2.5-coder-14b-instruct-generic-cpu

OK, so let's upgrade to 0.7.117.26375 again:

PS C:\Users\filip> winget install Microsoft.FoundryLocal
Found an existing package already installed. Trying to upgrade the installed package...
Found Foundry Local [Microsoft.FoundryLocal] Version 0.7.117.26375
This application is licensed to you by its owner.
Microsoft is not responsible for, nor does it grant any licenses to, third-party packages.
This package requires the following dependencies:
  - Packages
      Microsoft.VCLibs.Desktop.14 [>= 14.0.33728.0]
Successfully verified installer hash
Starting package install...
  ████████████████████████████▌   95%
Successfully installed. Restart the application to complete the upgrade.

Now model listing seems to work, but it's a bit suspicious: the list is identical to the one from before, and yesterday, after installing 0.7.117.26375, I got a message that the EP was being downloaded.

PS C:\Users\filip> foundry model list
Alias                          Device     Task               File Size    License      Model ID
-----------------------------------------------------------------------------------------------
phi-4                          CPU        chat-completion    10.16 GB     MIT          Phi-4-generic-cpu
--------------------------------------------------------------------------------------------------------
phi-3.5-mini                   CPU        chat-completion    2.53 GB      MIT          Phi-3.5-mini-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------
deepseek-r1-14b                NPU        chat-completion    7.12 GB      MIT          deepseek-r1-distill-qwen-14b-qnn-npu
---------------------------------------------------------------------------------------------------------------------------
deepseek-r1-7b                 NPU        chat-completion    3.71 GB      MIT          deepseek-r1-distill-qwen-7b-qnn-npu
--------------------------------------------------------------------------------------------------------------------------
phi-3-mini-128k                CPU        chat-completion    2.54 GB      MIT          Phi-3-mini-128k-instruct-generic-cpu
---------------------------------------------------------------------------------------------------------------------------
phi-3-mini-4k                  CPU        chat-completion    2.53 GB      MIT          Phi-3-mini-4k-instruct-generic-cpu
-------------------------------------------------------------------------------------------------------------------------
mistral-7b-v0.2                CPU        chat-completion    4.07 GB      apache-2.0   mistralai-Mistral-7B-Instruct-v0-2-generic-cpu
-------------------------------------------------------------------------------------------------------------------------------------
phi-4-mini-reasoning           NPU        chat-completion    2.78 GB      MIT          Phi-4-mini-reasoning-qnn-npu
                               CPU        chat-completion    4.52 GB      MIT          Phi-4-mini-reasoning-generic-cpu
-----------------------------------------------------------------------------------------------------------------------
qwen2.5-0.5b                   CPU        chat-completion    0.80 GB      apache-2.0   qwen2.5-0.5b-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------
qwen2.5-1.5b                   CPU        chat-completion    1.78 GB      apache-2.0   qwen2.5-1.5b-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-0.5b             CPU        chat-completion    0.80 GB      apache-2.0   qwen2.5-coder-0.5b-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-7b               CPU        chat-completion    6.16 GB      apache-2.0   qwen2.5-coder-7b-instruct-generic-cpu
----------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-1.5b             CPU        chat-completion    1.78 GB      apache-2.0   qwen2.5-coder-1.5b-instruct-generic-cpu
------------------------------------------------------------------------------------------------------------------------------
qwen2.5-14b                    CPU        chat-completion    11.06 GB     apache-2.0   qwen2.5-14b-instruct-generic-cpu
-----------------------------------------------------------------------------------------------------------------------
qwen2.5-7b                     CPU        chat-completion    6.16 GB      apache-2.0   qwen2.5-7b-instruct-generic-cpu
----------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-14b              CPU        chat-completion    11.06 GB     apache-2.0   qwen2.5-coder-14b-instruct-generic-cpu
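
One way to rule out a stale cached list (a hedged aside; foundry service restart appears in the ARM logs earlier in this thread) is to bounce the service and list again:

foundry service restart
foundry model list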

Let's try to run a model; unfortunately, it fails:

PS C:\Users\filip> foundry model run deepseek-r1-7b
Downloading deepseek-r1-distill-qwen-7b-qnn-npu...
[####################################] 100.00 % [Time remaining: about 0s]        22.8 MB/s
🕛 Loading model... [17:24:43 ERR] Failed loading model:deepseek-r1-distill-qwen-7b-qnn-npu
Exception: Failed: Loading model deepseek-r1-distill-qwen-7b-qnn-npu from http://127.0.0.1:55651/openai/load/deepseek-r1-distill-qwen-7b-qnn-npu?ttl=600
Internal Server Error
Failed loading model deepseek-r1-distill-qwen-7b-qnn-npu
Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_5289039109093371165_8_0'
PS C:\Users\filip>

filipw avatar Sep 26 '25 15:09 filipw

This is in the logs:

2025-09-26 17:29:57.773 +02:00 [INF] Starting Foundry Local CLI with 'model run deepseek-r1-7b'
2025-09-26 17:29:57.783 +02:00 [INF] Command:ServiceStart Status:Skipped Direct:False  Time:2ms
2025-09-26 17:29:58.748 +02:00 [INF] Model Phi-4-reasoning-generic-cpu does not have a valid prompt template.
2025-09-26 17:29:58.750 +02:00 [INF] Model deepseek-r1-distill-qwen-14b-generic-cpu is not supported on Arm64 currently.
2025-09-26 17:29:58.750 +02:00 [INF] Model deepseek-r1-distill-qwen-7b-generic-cpu is not supported on Arm64 currently.
2025-09-26 17:29:58.751 +02:00 [INF] Model Phi-4-mini-instruct-generic-cpu is not supported on Arm64 currently.
2025-09-26 17:29:58.751 +02:00 [INF] Total models fetched across all pages: 17
2025-09-26 17:29:58.781 +02:00 [INF] Command:ModelDownload Status:Skipped Direct:False  Time:997ms
2025-09-26 17:29:58.782 +02:00 [INF] Command:ServiceList Status:Success Direct:False  Time:0ms
2025-09-26 17:29:58.782 +02:00 [INF] Loading model: http://127.0.0.1:55651/openai/load/deepseek-r1-distill-qwen-7b-qnn-npu?ttl=600
2025-09-26 17:29:58.783 +02:00 [INF] Loading model:deepseek-r1-distill-qwen-7b-qnn-npu
2025-09-26 17:31:29.265 +02:00 [ERR] Failed loading model:deepseek-r1-distill-qwen-7b-qnn-npu
2025-09-26 17:31:29.265 +02:00 [INF] Command:ModelLoad Status:Failure Direct:False  Time:90483ms
2025-09-26 17:31:29.265 +02:00 [INF] Command:ModelRun Status:Failure Direct:True  Time:91484ms
2025-09-26 17:31:29.266 +02:00 [INF] Stream disconnected
2025-09-26 17:31:29.273 +02:00 [INF] LogException
Microsoft.AI.Foundry.Local.Common.FLException: Failed: Loading model deepseek-r1-distill-qwen-7b-qnn-npu from http://127.0.0.1:55651/openai/load/deepseek-r1-distill-qwen-7b-qnn-npu?ttl=600
 ---> System.Net.Http.HttpRequestException: Internal Server Error
Failed loading model deepseek-r1-distill-qwen-7b-qnn-npu
Could not find an implementation for EPContext(1) node with name 'QNNExecutionProvider_QNN_part0_5289039109093371165_8_0'
   at Microsoft.AI.Foundry.Local.Common.Utils.EnsureSuccessStatusCode(HttpResponseMessage, String, Func`2) + 0xe4
   --- End of inner exception stack trace ---
   at Microsoft.AI.Foundry.Local.Common.Utils.EnsureSuccessStatusCode(HttpResponseMessage, String, Func`2) + 0x140
   at Microsoft.AI.Foundry.Local.Common.ModelManagement.<LoadModelAsync>d__10.MoveNext() + 0x8b4
--- End of stack trace from previous location ---
   at Microsoft.AI.Foundry.Local.Commands.ModelRunCommand.<<Create>b__1_0>d.MoveNext() + 0x1000
--- End of stack trace from previous location ---
   at Microsoft.AI.Foundry.Local.Common.CommandActionFactory.<>c__DisplayClass0_0`1.<<Create>b__0>d.MoveNext() + 0x238
--- End of stack trace from previous location ---
   at System.CommandLine.NamingConventionBinder.CommandHandler.<GetExitCodeAsync>d__66.MoveNext() + 0x5c
--- End of stack trace from previous location ---
   at System.CommandLine.NamingConventionBinder.ModelBindingCommandHandler.<InvokeAsync>d__11.MoveNext() + 0x6c
--- End of stack trace from previous location ---
   at System.CommandLine.Invocation.InvocationPipeline.<InvokeAsync>d__0.MoveNext() + 0x1f4
--- End of stack trace from previous location ---
   at Microsoft.AI.Foundry.Local.Program.<Main>d__1.MoveNext() + 0x52c

filipw avatar Sep 26 '25 15:09 filipw

@filipw

Seems to work for me.

Processor: Snapdragon(R) X Elite - X1E78100 - Qualcomm(R) Oryon(TM) CPU, 3417 Mhz, 12 Core(s), 12 Logical Processor(s)
Installed Physical Memory (RAM): 32.0 GB

Loading personal and system profiles took 4233ms.
> foundry --version
0.7.117+67073234e7
> foundry model run deepseek-r1-7b
Model deepseek-r1-distill-qwen-7b-qnn-npu:1 was found in the local cache.
🕘 Loading model...
🟢 Model deepseek-r1-distill-qwen-7b-qnn-npu:1 loaded successfully

Interactive Chat. Enter /? or /help for help. Press Ctrl+C to cancel generation. Type /exit to leave the chat.

Interactive mode, please enter your prompt

what is the capital of spain
🧠 Thinking...
🤖 Okay, so I need to figure out the capital of Spain. Hmm, I remember Spain is a country in Europe, right? I think it's in the Iberian Peninsula. I've heard of Madrid before, but I'm not 100% sure if that's the capital or just a big city there. Maybe it's Madrid? I think Barcelona is also pretty famous, but I don't think that's the capital. Let me try to recall any other capitals I know. For example, France's capital is Paris, Italy's is Rome, Germany's is Berlin. So Spain must have a capital too. Since I'm not sure about Madrid, maybe I should think about other Spanish cities. Oh, I think there's a city called Madrid that's a major city and a center of government. Yeah, that makes sense because I've heard it mentioned a lot, especially in news or travel. So I'm going to go with Madrid as the capital of Spain.

The capital of Spain is Madrid.

leestott avatar Sep 26 '25 16:09 leestott

@filipw, can you please try the following steps:

Get-AppxPackage -AllUsers "*.EP.*"  | Select-Object -ExpandProperty PackageFullName

If this shows an EP package is installed, remove it.

Now run this command (this cleans up the metadata associated with the downloaded package):

remove-appxpackage uup://product/Windows.Workload.ExecutionProvider.QNN.arm64

Run

foundry model list

again, to see the NPU models.

Run one of the models to make sure it is working.
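
Put together, the sequence might look like this (a hedged consolidation of the steps above; run from an elevated PowerShell, and note that piping into Remove-AppxPackage is an assumption rather than a documented procedure):

# 1. Remove any installed EP package, if the query above returned one.
Get-AppxPackage -AllUsers "*.EP.*" | Remove-AppxPackage -AllUsers
# 2. Clean up the metadata for the QNN execution provider.
remove-appxpackage uup://product/Windows.Workload.ExecutionProvider.QNN.arm64
# 3. Verify that NPU models appear, then exercise one.
foundry model list
foundry model run deepseek-r1-7b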

natke avatar Sep 27 '25 16:09 natke

Thanks.

Running remove-appxpackage uup://product/Windows.Workload.ExecutionProvider.QNN.arm64 indeed appears to force a redownload of the NPU execution provider, but this does not seem to help; the NPU models are still not there:

PS C:\Users\filip> foundry model list
🟢 Service is Started on http://127.0.0.1:60599/, PID 29020!
[####################################] 100.00 % [Time remaining: about 0s]       Downloading complete!

Alias                          Device     Task               File Size    License      Model ID
-----------------------------------------------------------------------------------------------
phi-4                          CPU        chat-completion    10.16 GB     MIT          Phi-4-generic-cpu:1
----------------------------------------------------------------------------------------------------------
phi-3.5-mini                   CPU        chat-completion    2.53 GB      MIT          Phi-3.5-mini-instruct-generic-cpu:1
--------------------------------------------------------------------------------------------------------------------------
phi-3-mini-128k                CPU        chat-completion    2.54 GB      MIT          Phi-3-mini-128k-instruct-generic-cpu:2
-----------------------------------------------------------------------------------------------------------------------------
phi-3-mini-4k                  CPU        chat-completion    2.53 GB      MIT          Phi-3-mini-4k-instruct-generic-cpu:2
---------------------------------------------------------------------------------------------------------------------------
mistral-7b-v0.2                CPU        chat-completion    4.07 GB      apache-2.0   mistralai-Mistral-7B-Instruct-v0-2-generic-cpu:2
---------------------------------------------------------------------------------------------------------------------------------------
deepseek-r1-14b                CPU        chat-completion    11.51 GB     MIT          deepseek-r1-distill-qwen-14b-generic-cpu:3
---------------------------------------------------------------------------------------------------------------------------------
deepseek-r1-7b                 CPU        chat-completion    6.43 GB      MIT          deepseek-r1-distill-qwen-7b-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
qwen2.5-0.5b                   CPU        chat-completion    0.80 GB      apache-2.0   qwen2.5-0.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------
qwen2.5-1.5b                   CPU        chat-completion    1.78 GB      apache-2.0   qwen2.5-1.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-0.5b             CPU        chat-completion    0.80 GB      apache-2.0   qwen2.5-coder-0.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-7b               CPU        chat-completion    6.16 GB      apache-2.0   qwen2.5-coder-7b-instruct-generic-cpu:3
------------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-1.5b             CPU        chat-completion    1.78 GB      apache-2.0   qwen2.5-coder-1.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
phi-4-mini                     CPU        chat-completion    4.80 GB      MIT          Phi-4-mini-instruct-generic-cpu:4
------------------------------------------------------------------------------------------------------------------------
phi-4-mini-reasoning           CPU        chat-completion    4.52 GB      MIT          Phi-4-mini-reasoning-generic-cpu:2
-------------------------------------------------------------------------------------------------------------------------
qwen2.5-14b                    CPU        chat-completion    11.06 GB     apache-2.0   qwen2.5-14b-instruct-generic-cpu:3
-------------------------------------------------------------------------------------------------------------------------
qwen2.5-7b                     CPU        chat-completion    6.16 GB      apache-2.0   qwen2.5-7b-instruct-generic-cpu:3
------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-14b              CPU        chat-completion    11.06 GB     apache-2.0   qwen2.5-coder-14b-instruct-generic-cpu:3

And Get-AppxPackage -AllUsers "*.EP.*" | Select-Object -ExpandProperty PackageFullName still returns an empty list:

PS C:\Users\filip> Get-AppxPackage -AllUsers "*.EP.*"  | Select-Object -ExpandProperty PackageFullName
PS C:\Users\filip>

These are the logs:

2025-09-28 09:32:53.901 +02:00 [INF] Starting Foundry Local CLI with 'model list'
2025-09-28 09:32:54.172 +02:00 [INF] Timeout connecting to service
System.TimeoutException: The operation has timed out.
   at System.IO.Pipes.NamedPipeClientStream.ConnectInternal(Int32, CancellationToken, Int32) + 0xf8
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread, ExecutionContext, ContextCallback, Object) + 0x44
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread, ExecutionContext, ContextCallback, Object) + 0x78
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task&, Thread) + 0x6c
--- End of stack trace from previous location ---
   at Microsoft.Neutron.Rpc.Client.RpcSessionExtensions.<CreatePipeRpcSessionAsync>d__2`1.MoveNext() + 0x74
--- End of stack trace from previous location ---
   at Microsoft.AI.Foundry.Local.Common.ServiceManagement.<ConnectClientAsync>d__8.MoveNext() + 0x7c
--- End of stack trace from previous location ---
   at Microsoft.AI.Foundry.Local.Common.ServiceManagement.<CheckIsRunning>d__9.MoveNext() + 0x64
2025-09-28 09:32:54.172 +02:00 [INF] Starting service <C:\Program Files\WindowsApps\Microsoft.FoundryLocal_0.7.117.26375_arm64__8wekyb3d8bbwe\Inference.Service.Agent.exe --urls="http://127.0.0.1:0/" --OpenAIServiceSettings:ModelDirPath="C:\Users\filip\.foundry\cache\models" --JsonRpcServer:Run=true --JsonRpcServer:PipeName="inference_agent" --Logging:LogLevel:Default="Information">
2025-09-28 09:32:54.948 +02:00 [INF] Now listening on: http://127.0.0.1:60599
2025-09-28 09:32:54.948 +02:00 [INF] Application started. Press Ctrl+C to shut down.
2025-09-28 09:32:54.948 +02:00 [INF] Hosting environment: Production
2025-09-28 09:32:54.949 +02:00 [INF] Content root path: C:\Program Files\WindowsApps\Microsoft.FoundryLocal_0.7.117.26375_arm64__8wekyb3d8bbwe\
2025-09-28 09:32:54.949 +02:00 [INF] Service is started on http://127.0.0.1:60599/, PID 29020!
2025-09-28 09:32:54.955 +02:00 [INF] Downloading provider 1/1: QNNExecutionProvider
2025-09-28 09:32:54.983 +02:00 [INF] Loaded cached model info for 17 models. SavedAt:26.09.2025 17:19:25
2025-09-28 09:32:55.991 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:AutoRegisterCertifiedEps Status:Success Direct:True  Time:0ms
2025-09-28 09:34:10.871 +02:00 [INF] Registering provider 1/1: QNNExecutionProvider
2025-09-28 09:34:10.889 +02:00 [INF] Successfully autoregistered QNNExecutionProvider
2025-09-28 09:34:10.890 +02:00 [INF] Finished attempt to autoregister certified EPs at 28.09.2025 09:34:10; finished in 00:01:15.9651778
2025-09-28 09:34:10.892 +02:00 [INF] Successfully downloaded and registered the following EPs: QNNExecutionProvider.
Valid EPs: CPUExecutionProvider, QNNExecutionProvider

2025-09-28 09:34:10.892 +02:00 [INF] Command:ServiceAutoRegister Status:Success Direct:False  Time:74914ms
2025-09-28 09:34:10.897 +02:00 [INF] Valid devices: CPU
2025-09-28 09:34:10.901 +02:00 [INF] Valid EPs: CPUExecutionProvider, QNNExecutionProvider
2025-09-28 09:34:11.801 +02:00 [INF] Model Phi-4-reasoning-generic-cpu:1 does not have a valid prompt template.
2025-09-28 09:34:11.801 +02:00 [INF] Total models fetched across all pages: 18
2025-09-28 09:34:11.802 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:ModelList Status:Success Direct:True  Time:909ms
2025-09-28 09:34:11.809 +02:00 [INF] Command:ModelList Status:Success Direct:True  Time:77902ms
2025-09-28 09:34:11.809 +02:00 [INF] Stream disconnected
2025-09-28 09:35:23.172 +02:00 [INF] Starting Foundry Local CLI with 'model list'
2025-09-28 09:35:23.735 +02:00 [INF] Command:ModelList Status:Success Direct:True  Time:555ms

I tried under both a user and an admin PowerShell, and there is no difference.

filipw avatar Sep 28 '25 07:09 filipw

Thank you for trying this. Can you try one more thing, which is to:

  • stop the foundry local service: foundry service stop
  • uninstall Foundry Local via Add or Remove Programs
  • repeat the above steps for removing the QNN EP and meta data
  • install Foundry Local again
  • try foundry model list again

natke avatar Sep 28 '25 16:09 natke

Thanks, I went through those exact steps, and the result is the same. It appears to download the QNN EP but then does not discover any NPU models.

PS C:\Users\filip> winget install Microsoft.FoundryLocal
Found Foundry Local [Microsoft.FoundryLocal] Version 0.7.117.26375
This application is licensed to you by its owner.
Microsoft is not responsible for, nor does it grant any licenses to, third-party packages.
This package requires the following dependencies:
  - Packages
      Microsoft.VCLibs.Desktop.14 [>= 14.0.33728.0]
Successfully verified installer hash
Starting package install...
  ██████████████████████████████  100%
Successfully installed
PS C:\Users\filip> foundry model list
🟢 Service is Started on http://127.0.0.1:54430/, PID 22064!
[####################################] 100.00 % [Time remaining: about 0s]       Downloading complete!

Alias                          Device     Task               File Size    License      Model ID
-----------------------------------------------------------------------------------------------
phi-4                          CPU        chat-completion    10.16 GB     MIT          Phi-4-generic-cpu:1
----------------------------------------------------------------------------------------------------------
phi-3.5-mini                   CPU        chat-completion    2.53 GB      MIT          Phi-3.5-mini-instruct-generic-cpu:1
--------------------------------------------------------------------------------------------------------------------------
phi-3-mini-128k                CPU        chat-completion    2.54 GB      MIT          Phi-3-mini-128k-instruct-generic-cpu:2
-----------------------------------------------------------------------------------------------------------------------------
phi-3-mini-4k                  CPU        chat-completion    2.53 GB      MIT          Phi-3-mini-4k-instruct-generic-cpu:2
---------------------------------------------------------------------------------------------------------------------------
mistral-7b-v0.2                CPU        chat-completion    4.07 GB      apache-2.0   mistralai-Mistral-7B-Instruct-v0-2-generic-cpu:2
---------------------------------------------------------------------------------------------------------------------------------------
deepseek-r1-14b                CPU        chat-completion    11.51 GB     MIT          deepseek-r1-distill-qwen-14b-generic-cpu:3
---------------------------------------------------------------------------------------------------------------------------------
deepseek-r1-7b                 CPU        chat-completion    6.43 GB      MIT          deepseek-r1-distill-qwen-7b-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
qwen2.5-0.5b                   CPU        chat-completion    0.80 GB      apache-2.0   qwen2.5-0.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------
qwen2.5-1.5b                   CPU        chat-completion    1.78 GB      apache-2.0   qwen2.5-1.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-0.5b             CPU        chat-completion    0.80 GB      apache-2.0   qwen2.5-coder-0.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-7b               CPU        chat-completion    6.16 GB      apache-2.0   qwen2.5-coder-7b-instruct-generic-cpu:3
------------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-1.5b             CPU        chat-completion    1.78 GB      apache-2.0   qwen2.5-coder-1.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
phi-4-mini                     CPU        chat-completion    4.80 GB      MIT          Phi-4-mini-instruct-generic-cpu:4
------------------------------------------------------------------------------------------------------------------------
phi-4-mini-reasoning           CPU        chat-completion    4.52 GB      MIT          Phi-4-mini-reasoning-generic-cpu:2
-------------------------------------------------------------------------------------------------------------------------
qwen2.5-14b                    CPU        chat-completion    11.06 GB     apache-2.0   qwen2.5-14b-instruct-generic-cpu:3
-------------------------------------------------------------------------------------------------------------------------
qwen2.5-7b                     CPU        chat-completion    6.16 GB      apache-2.0   qwen2.5-7b-instruct-generic-cpu:3
------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-14b              CPU        chat-completion    11.06 GB     apache-2.0   qwen2.5-coder-14b-instruct-generic-cpu:3
PS C:\Users\filip>

Logs:

2025-09-29 17:05:28.111 +02:00 [INF] Starting Foundry Local CLI with 'model list'
2025-09-29 17:05:28.371 +02:00 [INF] Timeout connecting to service
System.TimeoutException: The operation has timed out.
   at System.IO.Pipes.NamedPipeClientStream.ConnectInternal(Int32, CancellationToken, Int32) + 0xf8
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread, ExecutionContext, ContextCallback, Object) + 0x44
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread, ExecutionContext, ContextCallback, Object) + 0x78
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task&, Thread) + 0x6c
--- End of stack trace from previous location ---
   at Microsoft.Neutron.Rpc.Client.RpcSessionExtensions.<CreatePipeRpcSessionAsync>d__2`1.MoveNext() + 0x74
--- End of stack trace from previous location ---
   at Microsoft.AI.Foundry.Local.Common.ServiceManagement.<ConnectClientAsync>d__8.MoveNext() + 0x7c
--- End of stack trace from previous location ---
   at Microsoft.AI.Foundry.Local.Common.ServiceManagement.<CheckIsRunning>d__9.MoveNext() + 0x64
2025-09-29 17:05:28.372 +02:00 [INF] Starting service <C:\Program Files\WindowsApps\Microsoft.FoundryLocal_0.7.117.26375_arm64__8wekyb3d8bbwe\Inference.Service.Agent.exe --urls="http://127.0.0.1:0/" --OpenAIServiceSettings:ModelDirPath="C:\Users\filip\.foundry\cache\models" --JsonRpcServer:Run=true --JsonRpcServer:PipeName="inference_agent" --Logging:LogLevel:Default="Information">
2025-09-29 17:05:28.706 +02:00 [INF] Service is started on http://127.0.0.1:54430/, PID 22064!
2025-09-29 17:05:28.745 +02:00 [INF] Loaded cached model info for 18 models. SavedAt:28.09.2025 09:34:11
2025-09-29 17:05:29.776 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:AutoRegisterCertifiedEps Status:Success Direct:True  Time:23ms
2025-09-29 17:06:14.725 +02:00 [INF] Registering provider 1/1: QNNExecutionProvider
2025-09-29 17:06:14.740 +02:00 [INF] Successfully autoregistered QNNExecutionProvider
2025-09-29 17:06:14.741 +02:00 [INF] Finished attempt to autoregister certified EPs at 29.09.2025 17:06:14; finished in 00:00:46.0722644
2025-09-29 17:06:14.757 +02:00 [INF] Successfully downloaded and registered the following EPs: QNNExecutionProvider.
Valid EPs: CPUExecutionProvider, QNNExecutionProvider

2025-09-29 17:06:14.760 +02:00 [INF] Command:ServiceAutoRegister Status:Success Direct:False  Time:45032ms
2025-09-29 17:06:14.766 +02:00 [INF] Valid devices: CPU
2025-09-29 17:06:14.767 +02:00 [INF] Valid EPs: CPUExecutionProvider, QNNExecutionProvider
2025-09-29 17:06:15.937 +02:00 [INF] Model Phi-4-reasoning-generic-cpu:1 does not have a valid prompt template.
2025-09-29 17:06:15.942 +02:00 [INF] Total models fetched across all pages: 18
2025-09-29 17:06:15.944 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:ModelList Status:Success Direct:True  Time:1178ms
2025-09-29 17:06:15.951 +02:00 [INF] Command:ModelList Status:Success Direct:True  Time:47835ms
2025-09-29 17:06:15.954 +02:00 [INF] Stream disconnected
2025-09-29 17:06:31.227 +02:00 [INF] Starting Foundry Local CLI with 'model list'
2025-09-29 17:06:31.928 +02:00 [INF] Model Phi-4-reasoning-generic-cpu:1 does not have a valid prompt template.
2025-09-29 17:06:31.929 +02:00 [INF] Total models fetched across all pages: 18
2025-09-29 17:06:31.929 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:ModelList Status:Success Direct:True  Time:663ms
2025-09-29 17:06:31.935 +02:00 [INF] Command:ModelList Status:Success Direct:True  Time:702ms
2025-09-29 17:06:31.937 +02:00 [INF] Stream disconnected
2025-09-29 17:06:32.969 +02:00 [INF] Starting Foundry Local CLI with 'model list'
2025-09-29 17:06:33.373 +02:00 [INF] Model Phi-4-reasoning-generic-cpu:1 does not have a valid prompt template.
2025-09-29 17:06:33.374 +02:00 [INF] Total models fetched across all pages: 18
2025-09-29 17:06:33.374 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:ModelList Status:Success Direct:True  Time:365ms
2025-09-29 17:06:33.381 +02:00 [INF] Command:ModelList Status:Success Direct:True  Time:406ms
2025-09-29 17:06:33.385 +02:00 [INF] Stream disconnected
2025-09-29 17:06:34.383 +02:00 [INF] Starting Foundry Local CLI with 'model list'
2025-09-29 17:06:34.785 +02:00 [INF] Model Phi-4-reasoning-generic-cpu:1 does not have a valid prompt template.
2025-09-29 17:06:34.786 +02:00 [INF] Total models fetched across all pages: 18
2025-09-29 17:06:34.786 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:ModelList Status:Success Direct:True  Time:364ms
2025-09-29 17:06:34.790 +02:00 [INF] Command:ModelList Status:Success Direct:True  Time:401ms
2025-09-29 17:06:34.791 +02:00 [INF] Stream disconnected
2025-09-29 17:06:37.338 +02:00 [INF] Starting Foundry Local CLI with 'model list'
2025-09-29 17:06:38.402 +02:00 [INF] Command:ServiceAutoRegister Status:Success Direct:False  Time:8ms
2025-09-29 17:06:38.404 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:AutoRegisterCertifiedEps Status:Success Direct:True  Time:0ms
2025-09-29 17:06:39.210 +02:00 [INF] Model Phi-4-reasoning-generic-cpu:1 does not have a valid prompt template.
2025-09-29 17:06:39.211 +02:00 [INF] Total models fetched across all pages: 18
2025-09-29 17:06:39.211 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.117+67073234e7 Command:ModelList Status:Success Direct:True  Time:1829ms
2025-09-29 17:06:39.218 +02:00 [INF] Command:ModelList Status:Success Direct:True  Time:1873ms
2025-09-29 17:06:39.219 +02:00 [INF] Stream disconnected

Get-AppxPackage -AllUsers "*.EP.*" | Select-Object -ExpandProperty PackageFullName still finds nothing.
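
One more thing worth checking (hedged; -PackageTypeFilter is a standard Get-AppxPackage parameter, but whether the EP registers as a non-Main package is an assumption):

# The default query returns only Main packages; include Framework,
# Optional, etc., in case the EP registers as one of those.
Get-AppxPackage -AllUsers -PackageTypeFilter All "*.EP.*" |
    Select-Object Name, PackageFullName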

filipw avatar Sep 29 '25 15:09 filipw

Thank you for trying this! We have been working on improving EP and device discovery and should have a patch out today.

natke avatar Sep 29 '25 17:09 natke

Hi, I tried 0.7.120 and it still fails with NPU models, though it's a little more verbose now, saying that the QNN EP cannot be registered:

PS C:\Users\filip> foundry --version
0.7.120+3b92ed4014
PS C:\Users\filip> foundry model list
🟢 Service is Started on http://127.0.0.1:54613/, PID 23884!
🕐 Downloading complete!...
Failed to download or register the following EPs: QNNExecutionProvider. Will try installing again later.
Valid EPs: CPUExecutionProvider
Alias                          Device     Task               File Size    License      Model ID
-----------------------------------------------------------------------------------------------
phi-4                          CPU        chat-completion    10.16 GB     MIT          Phi-4-generic-cpu:1
----------------------------------------------------------------------------------------------------------
phi-3.5-mini                   CPU        chat-completion    2.53 GB      MIT          Phi-3.5-mini-instruct-generic-cpu:1
--------------------------------------------------------------------------------------------------------------------------
phi-3-mini-128k                CPU        chat-completion    2.54 GB      MIT          Phi-3-mini-128k-instruct-generic-cpu:2
-----------------------------------------------------------------------------------------------------------------------------
phi-3-mini-4k                  CPU        chat-completion    2.53 GB      MIT          Phi-3-mini-4k-instruct-generic-cpu:2
---------------------------------------------------------------------------------------------------------------------------
mistral-7b-v0.2                CPU        chat-completion    4.07 GB      apache-2.0   mistralai-Mistral-7B-Instruct-v0-2-generic-cpu:2
---------------------------------------------------------------------------------------------------------------------------------------
deepseek-r1-14b                CPU        chat-completion    11.51 GB     MIT          deepseek-r1-distill-qwen-14b-generic-cpu:3
---------------------------------------------------------------------------------------------------------------------------------
deepseek-r1-7b                 CPU        chat-completion    6.43 GB      MIT          deepseek-r1-distill-qwen-7b-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
qwen2.5-0.5b                   CPU        chat-completion    0.80 GB      apache-2.0   qwen2.5-0.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------
qwen2.5-1.5b                   CPU        chat-completion    1.78 GB      apache-2.0   qwen2.5-1.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-0.5b             CPU        chat-completion    0.80 GB      apache-2.0   qwen2.5-coder-0.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-7b               CPU        chat-completion    6.16 GB      apache-2.0   qwen2.5-coder-7b-instruct-generic-cpu:3
------------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-1.5b             CPU        chat-completion    1.78 GB      apache-2.0   qwen2.5-coder-1.5b-instruct-generic-cpu:3
--------------------------------------------------------------------------------------------------------------------------------
phi-4-mini                     CPU        chat-completion    4.80 GB      MIT          Phi-4-mini-instruct-generic-cpu:4
------------------------------------------------------------------------------------------------------------------------
phi-4-mini-reasoning           CPU        chat-completion    4.52 GB      MIT          Phi-4-mini-reasoning-generic-cpu:2
-------------------------------------------------------------------------------------------------------------------------
qwen2.5-14b                    CPU        chat-completion    11.06 GB     apache-2.0   qwen2.5-14b-instruct-generic-cpu:3
-------------------------------------------------------------------------------------------------------------------------
qwen2.5-7b                     CPU        chat-completion    6.16 GB      apache-2.0   qwen2.5-7b-instruct-generic-cpu:3
------------------------------------------------------------------------------------------------------------------------
qwen2.5-coder-14b              CPU        chat-completion    11.06 GB     apache-2.0   qwen2.5-coder-14b-instruct-generic-cpu:3
PS C:\Users\filip>

The Foundry Local service logs:

2025-10-06 07:29:00.960 +02:00 [INF] Now listening on: http://127.0.0.1:54613
2025-10-06 07:29:00.961 +02:00 [INF] Application started. Press Ctrl+C to shut down.
2025-10-06 07:29:00.961 +02:00 [INF] Hosting environment: Production
2025-10-06 07:29:00.963 +02:00 [INF] Content root path: C:\Program Files\WindowsApps\Microsoft.FoundryLocal_0.7.120.15250_arm64__8wekyb3d8bbwe\
2025-10-06 07:29:00.964 +02:00 [INF] Provider 1/1: QNNExecutionProvider is in NotPresent state.Downloading and ensuring provider is ready.
2025-10-06 07:29:00.964 +02:00 [INF] Found service endpoints: http://127.0.0.1:54613
2025-10-06 07:29:00.964 +02:00 [INF] Service is started on http://127.0.0.1:54613/, PID 23884!
2025-10-06 07:29:00.964 +02:00 [INF] Command:ModelInit Status:Success Direct:True  Time:1255ms
2025-10-06 07:29:00.965 +02:00 [INF] Command:ServiceStart Status:Failure Direct:False  Time:1256ms
2025-10-06 07:29:00.965 +02:00 [INF] Checking EP autoregistration status...
2025-10-06 07:29:01.031 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.120+3b92ed4014 Command:ServiceStatus Status:Success Direct:True Time:2ms
2025-10-06 07:29:01.033 +02:00 [INF] Processing EP autoregistration status...
2025-10-06 07:29:01.033 +02:00 [INF] Command:ServiceStatus Status:Success Direct:False  Time:68ms
2025-10-06 07:29:01.040 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.120+3b92ed4014 Command:AutoRegisterCertifiedEps Status:Success Direct:True Time:1ms
2025-10-06 07:29:01.040 +02:00 [INF] Reporting progress of running EP download
2025-10-06 07:29:33.305 +02:00 [INF] Provider QNNExecutionProvider download and ensuring ready attempt: Failure
2025-10-06 07:29:33.315 +02:00 [INF] Download attempt for QNNExecutionProvider unsuccessful. Skipping all EP downloads. 
2025-10-06 07:29:33.316 +02:00 [INF] Finished attempt to autoregister certified EPs; finished in 32443ms
2025-10-06 07:29:33.321 +02:00 [INF] Failed to download or register the following EPs: QNNExecutionProvider. Will try installing again later.
Valid EPs: CPUExecutionProvider
2025-10-06 07:29:33.371 +02:00 [INF] Command:ServiceAutoRegister Status:Failure Direct:False  Time:32337ms
2025-10-06 07:29:33.412 +02:00 [INF] Loaded cached model info for 18 models. SavedAt:06.10.2025 07:26:26
2025-10-06 07:29:33.419 +02:00 [INF] Creating new task to ensure and autoregister certified execution providers
2025-10-06 07:29:33.421 +02:00 [INF] Created task to ensure and autoregister certified execution providers
2025-10-06 07:29:33.421 +02:00 [INF] Attempt 2: Autoregistration of certified execution providers in progress.
2025-10-06 07:29:33.421 +02:00 [INF] Started autoregistering certified EPs
2025-10-06 07:29:33.427 +02:00 [INF] Valid devices: CPU
2025-10-06 07:29:33.432 +02:00 [INF] Valid EPs: CPUExecutionProvider, QNNExecutionProvider
2025-10-06 07:29:33.442 +02:00 [INF] Provider 1/1: QNNExecutionProvider is in NotPresent state.Downloading and ensuring provider is ready.
2025-10-06 07:29:34.432 +02:00 [INF] Model Phi-4-reasoning-generic-cpu:1 does not have a valid prompt template.
2025-10-06 07:29:34.432 +02:00 [INF] Total models fetched across all pages: 18
2025-10-06 07:29:34.433 +02:00 [INF] UserAgent:foundry-local-CLI/0.7.120+3b92ed4014 Command:ModelList Status:Success Direct:True Time:1015ms
2025-10-06 07:29:34.438 +02:00 [INF] Command:ModelInit Status:Success Direct:True  Time:1255ms
2025-10-06 07:29:34.440 +02:00 [INF] Command:ModelList Status:Success Direct:True  Time:34736ms
2025-10-06 07:29:34.442 +02:00 [INF] Stream disconnected

filipw · Oct 06 '25 05:10

I traced the requests with Fiddler, and this is what I see. Note that the catalog query's device filter asks for CPU only (matching the "Valid devices: CPU" line in the logs above), even though QNNExecutionProvider is still included in the executionProvider filter:

Request:

POST https://ai.azure.com/api/eastus/ux/v1.0/entities/crossRegion HTTP/1.1
Host: ai.azure.com
User-Agent: AzureAiStudio
traceparent: 00-ad4eb3d0ebf3896227d0a0c178755093-3eea31b2997522ca-00
Content-Type: application/json; charset=utf-8
Content-Length: 1173

{
  "resourceIds": [
    {
      "resourceId": "azureml",
      "entityContainerType": "Registry"
    }
  ],
  "indexEntitiesRequest": {
    "filters": [
      {
        "field": "type",
        "operator": "eq",
        "values": [
          "models"
        ]
      },
      {
        "field": "kind",
        "operator": "eq",
        "values": [
          "Versioned"
        ]
      },
      {
        "field": "labels",
        "operator": "eq",
        "values": [
          "latest"
        ]
      },
      {
        "field": "annotations/tags/foundryLocal",
        "operator": "eq",
        "values": [
          "",
          "test"
        ]
      },
      {
        "field": "properties/variantInfo/variantMetadata/device",
        "operator": "eq",
        "values": [
          "CPU"
        ]
      },
      {
        "field": "properties/variantInfo/variantMetadata/executionProvider",
        "operator": "eq",
        "values": [
          "CPUExecutionProvider",
          "QNNExecutionProvider"
        ]
      }
    ],
    "pageSize": 50,
    "skip": null,
    "continuationToken": null
  }
}

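To make this easier to reproduce without Fiddler, here is a minimal Python replay of the captured request (my own sketch using the requests package, not Foundry Local's code). It re-sends the same catalog query and prints the device/executionProvider of every variant that comes back; whether the catalog tags NPU variants with an "NPU" device value is my assumption, but it is easy to test by adding it to the device filter:

# Minimal replay sketch (mine, not Foundry Local's code) of the catalog
# query captured above. Requires the third-party `requests` package.
import requests

URL = "https://ai.azure.com/api/eastus/ux/v1.0/entities/crossRegion"

body = {
    "resourceIds": [{"resourceId": "azureml", "entityContainerType": "Registry"}],
    "indexEntitiesRequest": {
        "filters": [
            {"field": "type", "operator": "eq", "values": ["models"]},
            {"field": "kind", "operator": "eq", "values": ["Versioned"]},
            {"field": "labels", "operator": "eq", "values": ["latest"]},
            {"field": "annotations/tags/foundryLocal", "operator": "eq", "values": ["", "test"]},
            # The suspicious filter: only CPU is requested, mirroring the
            # "Valid devices: CPU" log line, so NPU variants can never match.
            # (That the catalog would tag them "NPU" is my assumption; try
            # adding it here to test.)
            {"field": "properties/variantInfo/variantMetadata/device",
             "operator": "eq", "values": ["CPU"]},
            {"field": "properties/variantInfo/variantMetadata/executionProvider",
             "operator": "eq",
             "values": ["CPUExecutionProvider", "QNNExecutionProvider"]},
        ],
        "pageSize": 50,
        "skip": None,
        "continuationToken": None,
    },
}

resp = requests.post(URL, json=body, headers={"User-Agent": "AzureAiStudio"}, timeout=60)
resp.raise_for_status()

# Walk the returned entities and show which device/EP each variant targets.
for entity in resp.json()["indexEntitiesResponse"]["value"]:
    tags = entity["annotations"]["tags"]
    meta = entity["properties"]["variantInfo"]["variantMetadata"]
    print(f"{tags.get('alias', entity['annotations']['name']):25} "
          f"device={meta['device']:5} ep={meta['executionProvider']}")

As expected, every entry this prints is a CPU variant, consistent with the captured response below.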
Response:

HTTP/1.1 200 OK
Date: Tue, 07 Oct 2025 15:18:24 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 84801
Connection: keep-alive
Vary: Accept-Encoding
Request-Context: appId=cid-v1:2d2e8e63-272e-4b3c-8598-4ee570a0e70d
x-ms-response-type: standard
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
X-Content-Type-Options: nosniff
azureml-served-by-cluster: vienna-eastus-02
x-request-time: 0.117
x-azure-ref: 20251007T151824Z-164558f69d6cmmnwhC1ZRHr0ps00000004r00000000099ks
X-Cache: CONFIG_NOCACHE
Accept-Ranges: bytes

{"indexEntitiesResponse":{"totalCount":null,"value":[{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/Phi-4-generic-cpu/version/1","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"phi-4","author":"Microsoft","directoryPath":"cpu-int4-rtn-block-32-acc-level-4","foundryLocal":"","inputModalities":"text","license":"MIT","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/microsoft/phi-4/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<|system|>\\n{Content}<|im_end|>\", \"user\": \"<|user|>\\n{Content}<|im_end|>\", \"assistant\": \"<|assistant|>\\n{Content}<|im_end|>\", \"prompt\": \"<|user|>\\n{Content}<|im_end|>\\n<|assistant|>\"}","task":"chat-completion"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"MIT","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"Phi-4-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Phi-4 to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** MIT\n- **Model Description:** This is a conversion of the Phi-4 for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. 
Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Phi-4](https://huggingface.co/microsoft/Phi-4) for details.","labels":["default","latest","invisibleLatest"],"name":"Phi-4-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-05-07T17:43:28.8799+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"Phi-4-generic-cpu:1","name":"Phi-4-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":1,"alphanumericVersion":"1","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-05-07T17:43:28.4189955Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/Phi-4/versions/7"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":10909216931,"vRamFootprintBytes":10909502914}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":20,"type":"models","version":"1","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"Phi-4-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/Phi-4-generic-cpu/versions/1","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/Phi-3.5-mini-instruct-generic-cpu/version/1","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"phi-3.5-mini","author":"Microsoft","foundryLocal":"","inputModalities":"text","license":"MIT","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/microsoft/Phi-3.5-mini-instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","task":"chat-completion"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"MIT","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"Phi-3.5-mini-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Phi-3.5-mini-instruct to enable local inference on CPUs. 
This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** MIT\n- **Model Description:** This is a conversion of the Phi-3.5-mini-instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) for details.","labels":["default","latest","invisibleLatest"],"name":"Phi-3.5-mini-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-05-07T19:23:10.1295265+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"Phi-3.5-mini-instruct-generic-cpu:1","name":"Phi-3.5-mini-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":1,"alphanumericVersion":"1","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-05-07T19:23:09.7140407Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/Phi-3.5-mini-instruct/versions/6"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":2716566814,"vRamFootprintBytes":2768702013}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"1","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"Phi-3.5-mini-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/Phi-3.5-mini-instruct-generic-cpu/versions/1","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/Phi-4-reasoning-generic-cpu/version/1","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"phi-4-reasoning","author":"Microsoft","directoryPath":"v1","foundryLocal":"","inputModalities":"text","license":"MIT","licenseDescription":"This model is provided under the License Terms available at 
<https://huggingface.co/microsoft/phi-4/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","task":"chat-completion"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"MIT","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"Phi-4-reasoning-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Phi-4-reasoning to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** MIT\n- **Model Description:** This is a conversion of the Phi-4-reasoning for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Phi-4-reasoning](https://huggingface.co/microsoft/Phi-4-reasoning) for details.","labels":["default","latest","invisibleLatest"],"name":"Phi-4-reasoning-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-05-11T01:18:23.6244346+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"Phi-4-reasoning-generic-cpu:1","name":"Phi-4-reasoning-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":1,"alphanumericVersion":"1","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-05-11T01:18:23.1946551Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/Phi-4-reasoning/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":10909216931,"vRamFootprintBytes":10909483885}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"1","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"Phi-4-reasoning-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/Phi-4-reasoning-generic-cpu/versions/1","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/Phi-3-mini-128k-instruct-generic-cpu/version/2","kind":"Versioned","annotations":{"invisibleUntil":"
0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"phi-3-mini-128k","author":"Microsoft","directoryPath":"cpu-int4-rtn-block-32-acc-level-4","foundryLocal":"","inputModalities":"text","license":"MIT","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<|system|>\\n{Content}<|end|>\", \"user\": \"<|user|>\\n{Content}<|end|>\", \"assistant\": \"<|assistant|>\\n{Content}<|end|>\", \"prompt\": \"<|user|>\\n{Content}<|end|>\\n<|assistant|>\"}","task":"chat-completion"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"MIT","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"Phi-3-mini-128k-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Phi-3-Mini-128K-Instruct to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** MIT\n- **Model Description:** This is a conversion of the Phi-3-Mini-128K-Instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. 
Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Phi-3-Mini-128K-Instruct](https://huggingface.co/microsoft/Phi-3-Mini-128K-Instruct) for details.","labels":["default","latest","invisibleLatest"],"name":"Phi-3-mini-128k-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-05-12T22:43:58.0656724+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"Phi-3-mini-128k-instruct-generic-cpu:2","name":"Phi-3-mini-128k-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":2,"alphanumericVersion":"2","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-05-12T22:43:57.6918486Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/Phi-3-mini-128k-instruct/versions/13"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":2727304232,"vRamFootprintBytes":2727517409}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"2","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"Phi-3-mini-128k-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/Phi-3-mini-128k-instruct-generic-cpu/versions/2","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/Phi-3-mini-4k-instruct-generic-cpu/version/2","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"phi-3-mini-4k","author":"Microsoft","directoryPath":"cpu-int4-rtn-block-32-acc-level-4","foundryLocal":"","inputModalities":"text","license":"MIT","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<|system|>\\n{Content}<|end|>\", \"user\": \"<|user|>\\n{Content}<|end|>\", \"assistant\": \"<|assistant|>\\n{Content}<|end|>\", \"prompt\": 
\"<|user|>\\n{Content}<|end|>\\n<|assistant|>\"}","task":"chat-completion"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"MIT","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"Phi-3-mini-4k-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Phi-3-Mini-4K-Instruct to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** MIT\n- **Model Description:** This is a conversion of the Phi-3-Mini-4K-Instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Phi-3-Mini-4K-Instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) for details.","labels":["default","latest","invisibleLatest"],"name":"Phi-3-mini-4k-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-05-12T22:59:32.8315613+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"Phi-3-mini-4k-instruct-generic-cpu:2","name":"Phi-3-mini-4k-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":2,"alphanumericVersion":"2","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-05-12T22:59:32.2182163Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/Phi-3-mini-4k-instruct/versions/15"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":2716566814,"vRamFootprintBytes":3237709086}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"2","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"Phi-3-mini-4k-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/Phi-3-mini-4k-instruct-generic-cpu/versions/2","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/mistralai-Mistral-7B-Instruct-v0-2-generic-cpu/version/2","kind":"Versioned","annotati
ons":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"mistral-7b-v0.2","author":"Microsoft","directoryPath":"mistral-7b-instruct-v0.2-cpu-int4-rtn-block-32-acc-level-4","foundryLocal":"","inputModalities":"text","license":"apache-2.0","licenseDescription":"This model is provided under the License Terms available at <https://www.apache.org/licenses/LICENSE-2.0.html>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<s>\", \"user\": \"[INST]\\n{Content}\\n[/INST]\", \"assistant\": \"{Content}</s>\", \"prompt\": \"[INST]\\n{Content}\\n[/INST]\"}","task":"chat-completion"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"apache-2.0","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"mistralai-Mistral-7B-Instruct-v0-2-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Mistral-7B-Instruct-v0.2 to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** apache-2.0\n- **License:** MIT\n- **Model Description:** This is a conversion of the Mistral-7B-Instruct-v0.2 for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. 
Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) for details.\n","labels":["default","latest","invisibleLatest"],"name":"mistralai-Mistral-7B-Instruct-v0-2-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-05-12T23:21:21.9132405+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"mistralai-Mistral-7B-Instruct-v0-2-generic-cpu:2","name":"mistralai-Mistral-7B-Instruct-v0-2-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":2,"alphanumericVersion":"2","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-05-12T23:21:21.2397546Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/mistralai-Mistral-7B-Instruct-v0-2/versions/6"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":4370129223,"vRamFootprintBytes":4491341762}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"2","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"mistralai-Mistral-7B-Instruct-v0-2-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/mistralai-Mistral-7B-Instruct-v0-2-generic-cpu/versions/2","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/deepseek-r1-distill-qwen-14b-generic-cpu/version/3","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"deepseek-r1-14b","author":"Microsoft","directoryPath":"v3","foundryLocal":"","inputModalities":"text","license":"MIT","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"assistant\": \"{Content}\", \"prompt\": \"\\\\u003C\\\\uFF5CUser\\\\uFF5C\\\\u003E{Content}\\\\u003C\\\\uFF5CAssistant\\\\uFF5C\\\\u003E\"}","task":"chat-completion"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"MIT","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"deepseek-r1-distill-qwen-14b-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model 
is an optimized version of DeepSeek-R1-Distill-Qwen-14B to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** MIT\n- **Model Description:** This is a conversion of the DeepSeek-R1-Distill-Qwen-14B for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) for details.\n","labels":["default","latest","invisibleLatest"],"name":"deepseek-r1-distill-qwen-14b-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-05-14T21:37:41.6801451+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"deepseek-r1-distill-qwen-14b-generic-cpu:3","name":"deepseek-r1-distill-qwen-14b-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":3,"alphanumericVersion":"3","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-05-14T21:37:40.8451674Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/deepseek-r1-distill-qwen-14b/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":12358768394,"vRamFootprintBytes":12359149537}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"3","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"deepseek-r1-distill-qwen-14b-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/deepseek-r1-distill-qwen-14b-generic-cpu/versions/3","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/deepseek-r1-distill-qwen-7b-generic-cpu/version/3","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"deepseek-r1-7b","author":"Microsoft","directoryPath":"v3","foundryLocal":"","inputModalities":"text","license":"MIT","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"assistant\": \"{Content}\", \"prompt\": 
\"\\\\u003C\\\\uFF5CUser\\\\uFF5C\\\\u003E{Content}\\\\u003C\\\\uFF5CAssistant\\\\uFF5C\\\\u003E\"}","task":"chat-completion"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"MIT","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"deepseek-r1-distill-qwen-7b-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** MIT\n- **Model Description:** This is a conversion of the DeepSeek-R1-Distill-Qwen-7B for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) for details.\n","labels":["default","latest","invisibleLatest"],"name":"deepseek-r1-distill-qwen-7b-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-05-14T22:13:57.0519582+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"deepseek-r1-distill-qwen-7b-generic-cpu:3","name":"deepseek-r1-distill-qwen-7b-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":3,"alphanumericVersion":"3","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-05-14T22:13:56.2929352Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/deepseek-r1-distill-qwen-7b/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":6904159928,"vRamFootprintBytes":6904383723}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"3","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"deepseek-r1-distill-qwen-7b-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/deepseek-r1-distill-qwen-7b-generic-cpu/versions/3","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a
3/type/models/objectId/qwen2.5-0.5b-instruct-generic-cpu/version/3","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"qwen2.5-0.5b","author":"Microsoft","directoryPath":"v3","foundryLocal":"","inputModalities":"text","license":"apache-2.0","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","parameterSchema":"{\"enabled\": [{\"name\": \"temperature\", \"default\": 0.7}, {\"name\": \"top_p\", \"default\": 0.8}, {\"name\": \"top_k\", \"default\": 40}, {\"name\": \"presence_penalty\", \"default\": 1.1}]}","promptTemplate":"{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": \"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}","supportsToolCalling":"","task":"chat-completion","toolCallEnd":"</tool_call>","toolCallStart":"<tool_call>","toolRegisterEnd":"</tools>","toolRegisterStart":"<tools>"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"apache-2.0","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"qwen2.5-0.5b-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Qwen2.5-0.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** apache-2.0\n- **Model Description:** This is a conversion of the Qwen2.5-0.5B-Instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. 
Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) for details.","labels":["default","latest","invisibleLatest"],"name":"qwen2.5-0.5b-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-06-03T23:59:50.9468509+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"qwen2.5-0.5b-instruct-generic-cpu:3","name":"qwen2.5-0.5b-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":3,"alphanumericVersion":"3","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-06-03T23:59:50.6496731Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/qwen2.5-0.5b-instruct/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":861939957,"vRamFootprintBytes":862107904}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"3","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"qwen2.5-0.5b-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/qwen2.5-0.5b-instruct-generic-cpu/versions/3","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/qwen2.5-1.5b-instruct-generic-cpu/version/3","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"qwen2.5-1.5b","author":"Microsoft","directoryPath":"v3","foundryLocal":"","inputModalities":"text","license":"apache-2.0","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": 
\"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}","supportsToolCalling":"","task":"chat-completion","toolCallEnd":"</tool_call>","toolCallStart":"<tool_call>","toolRegisterEnd":"</tools>","toolRegisterStart":"<tools>"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"apache-2.0","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"qwen2.5-1.5b-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Qwen2.5-1.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** apache-2.0\n- **Model Description:** This is a conversion of the Qwen2.5-1.5B-Instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) for 
details.","labels":["default","latest","invisibleLatest"],"name":"qwen2.5-1.5b-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-06-04T00:21:44.3045772+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"qwen2.5-1.5b-instruct-generic-cpu:3","name":"qwen2.5-1.5b-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":3,"alphanumericVersion":"3","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-06-04T00:21:43.7807596Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/qwen2.5-1.5b-instruct/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":1911260446,"vRamFootprintBytes":1911456583}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"3","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"qwen2.5-1.5b-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/qwen2.5-1.5b-instruct-generic-cpu/versions/3","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/qwen2.5-coder-0.5b-instruct-generic-cpu/version/3","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"qwen2.5-coder-0.5b","author":"Microsoft","directoryPath":"v3","foundryLocal":"","inputModalities":"text","license":"apache-2.0","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","parameterSchema":"{\"enabled\": [{\"name\": \"temperature\", \"default\": 1.0}, {\"name\": \"top_p\", \"default\": 0.9}, {\"name\": \"top_k\", \"default\": 40}, {\"name\": \"presence_penalty\", \"default\": 1.1}]}","promptTemplate":"{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": 
\"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}","supportsToolCalling":"","task":"chat-completion","toolCallEnd":"</tool_call>","toolCallStart":"<tool_call>","toolRegisterEnd":"</tools>","toolRegisterStart":"<tools>"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"apache-2.0","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"qwen2.5-coder-0.5b-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Qwen2.5-Coder-0.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** apache-2.0\n- **Model Description:** This is a conversion of the Qwen2.5-Coder-0.5B-Instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Qwen2.5-Coder-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct) for 
details.","labels":["default","latest","invisibleLatest"],"name":"qwen2.5-coder-0.5b-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-06-04T00:31:40.6393493+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"qwen2.5-coder-0.5b-instruct-generic-cpu:3","name":"qwen2.5-coder-0.5b-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":3,"alphanumericVersion":"3","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-06-04T00:31:40.2585336Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/qwen2.5-coder-0.5b-instruct/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":861939957,"vRamFootprintBytes":1035133255}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"3","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"qwen2.5-coder-0.5b-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/qwen2.5-coder-0.5b-instruct-generic-cpu/versions/3","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/qwen2.5-coder-7b-instruct-generic-cpu/version/3","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"qwen2.5-coder-7b","author":"Microsoft","directoryPath":"v3","foundryLocal":"","inputModalities":"text","license":"apache-2.0","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": \"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}","supportsToolCalling":"","task":"chat-completion","toolCallEnd":"</tool_call>","toolCallStart":"<tool_call>","toolRegisterEnd":"</tools>","toolRegisterStart":"<tools>"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"apache-2.0","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"qwen2.5-coder-7b-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Qwen2.5-Coder-7B-Instruct to enable 
local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** apache-2.0\n- **Model Description:** This is a conversion of the Qwen2.5-Coder-7B-Instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) for details.","labels":["default","latest","invisibleLatest"],"name":"qwen2.5-coder-7b-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-06-04T00:41:58.7940727+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"qwen2.5-coder-7b-instruct-generic-cpu:3","name":"qwen2.5-coder-7b-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":3,"alphanumericVersion":"3","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-06-04T00:41:58.3829876Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/qwen2.5-coder-7b-instruct/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":6614249635,"vRamFootprintBytes":6614587351}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"3","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"qwen2.5-coder-7b-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/qwen2.5-coder-7b-instruct-generic-cpu/versions/3","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/qwen2.5-coder-1.5b-instruct-generic-cpu/version/3","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"qwen2.5-coder-1.5b","author":"Microsoft","directoryPath":"v3","foundryLocal":"","inputModalities":"text","license":"apache-2.0","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": 
\"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}","supportsToolCalling":"","task":"chat-completion","toolCallEnd":"</tool_call>","toolCallStart":"<tool_call>","toolRegisterEnd":"</tools>","toolRegisterStart":"<tools>"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"apache-2.0","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"qwen2.5-coder-1.5b-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Qwen2.5-Coder-1.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** apache-2.0\n- **Model Description:** This is a conversion of the Qwen2.5-Coder-1.5B-Instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Qwen2.5-Coder-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct) for 
details.","labels":["default","latest","invisibleLatest"],"name":"qwen2.5-coder-1.5b-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-06-04T00:52:43.5436122+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"qwen2.5-coder-1.5b-instruct-generic-cpu:3","name":"qwen2.5-coder-1.5b-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":3,"alphanumericVersion":"3","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-06-04T00:52:43.1677432Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/qwen2.5-coder-1.5b-instruct/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":1911260446,"vRamFootprintBytes":1911457904}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"3","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"qwen2.5-coder-1.5b-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/qwen2.5-coder-1.5b-instruct-generic-cpu/versions/3","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/Phi-4-mini-instruct-generic-cpu/version/4","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"phi-4-mini","author":"Microsoft","directoryPath":"v3","foundryLocal":"","inputModalities":"text","license":"MIT","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/microsoft/Phi-4-mini-instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<|system|>{Content}<|end|>\", \"user\": \"<|user|>{Content}<|end|>\", \"assistant\": \"<|assistant|>{Content}<|end|>\", \"prompt\": \"<|user|>{Content}<|end|><|assistant|>\"}","supportsToolCalling":"","task":"chat-completion","toolCallEnd":"<|/tool_call|>","toolCallStart":"<|tool_call|>","toolRegisterEnd":"<|/tool|>","toolRegisterStart":"<|tool|>"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"MIT","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"Phi-4-mini-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Phi-4-mini-instruct to enable local inference on CPUs. 
This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** MIT\n- **Model Description:** This is a conversion of the Phi-4-mini-instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct) for details.","labels":["default","latest","invisibleLatest"],"name":"Phi-4-mini-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-06-04T16:13:47.122296+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"Phi-4-mini-instruct-generic-cpu:4","name":"Phi-4-mini-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":4,"alphanumericVersion":"4","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-06-04T16:13:45.9261209Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/Phi-4-mini-instruct/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":5153960755,"vRamFootprintBytes":5206105439}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"4","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"Phi-4-mini-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/Phi-4-mini-instruct-generic-cpu/versions/4","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/Phi-4-mini-reasoning-generic-cpu/version/2","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"phi-4-mini-reasoning","author":"Microsoft","directoryPath":"v1","foundryLocal":"","inputModalities":"text","license":"MIT","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/microsoft/Phi-4-mini-reasoning/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<|system|>Your name is Phi, an AI math expert developed by Microsoft. 
{Content}<|end|>\", \"user\": \"<|user|>{Content}<|end|>\", \"assistant\": \"<|assistant|>{Content}<|end|>\", \"prompt\": \"<|user|>{Content}<|end|><|assistant|>\"}","supportsToolCalling":"","task":"chat-completion","toolCallEnd":"<|/tool_call|>","toolCallStart":"<|tool_call|>","toolRegisterEnd":"<|/tool|>","toolRegisterStart":"<|tool|>"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"MIT","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"Phi-4-mini-reasoning-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Phi-4-mini-reasoning to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** MIT\n- **Model Description:** This is a conversion of the Phi-4-mini-reasoning for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Phi-4-mini-reasoning](https://huggingface.co/microsoft/Phi-4-mini-reasoning) for 
details.","labels":["default","latest","invisibleLatest"],"name":"Phi-4-mini-reasoning-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-06-04T17:05:14.0779428+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"Phi-4-mini-reasoning-generic-cpu:2","name":"Phi-4-mini-reasoning-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":2,"alphanumericVersion":"2","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-06-04T17:05:13.7904476Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/Phi-4-mini-reasoning/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":4853313044,"vRamFootprintBytes":4905427271}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"2","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"Phi-4-mini-reasoning-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/Phi-4-mini-reasoning-generic-cpu/versions/2","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/qwen2.5-14b-instruct-generic-cpu/version/3","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"qwen2.5-14b","author":"Microsoft","directoryPath":"v3","foundryLocal":"","inputModalities":"text","license":"apache-2.0","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/Qwen/Qwen2.5-14B-Instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": \"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}","supportsToolCalling":"","task":"chat-completion","toolCallEnd":"</tool_call>","toolCallStart":"<tool_call>","toolRegisterEnd":"</tools>","toolRegisterStart":"<tools>"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"apache-2.0","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"qwen2.5-14b-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Qwen2.5-14B-Instruct to enable local inference on CPUs. 
This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** apache-2.0\n- **Model Description:** This is a conversion of the Qwen2.5-14B-Instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) for details.","labels":["default","latest","invisibleLatest"],"name":"qwen2.5-14b-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-06-04T19:51:51.1044735+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"qwen2.5-14b-instruct-generic-cpu:3","name":"qwen2.5-14b-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":3,"alphanumericVersion":"3","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-06-04T19:51:50.1856085Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/qwen2.5-14b-instruct/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":11875584573,"vRamFootprintBytes":11875920035}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"3","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"qwen2.5-14b-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/qwen2.5-14b-instruct-generic-cpu/versions/3","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/qwen2.5-7b-instruct-generic-cpu/version/3","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"qwen2.5-7b","author":"Microsoft","directoryPath":"v3","foundryLocal":"","inputModalities":"text","license":"apache-2.0","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/Qwen/Qwen2.5-7B-Instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": 
\"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}","supportsToolCalling":"","task":"chat-completion","toolCallEnd":"</tool_call>","toolCallStart":"<tool_call>","toolRegisterEnd":"</tools>","toolRegisterStart":"<tools>"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"apache-2.0","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"qwen2.5-7b-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Qwen2.5-7B-Instruct to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** apache-2.0\n- **Model Description:** This is a conversion of the Qwen2.5-7B-Instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) for 
details.","labels":["default","latest","invisibleLatest"],"name":"qwen2.5-7b-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-06-04T20:57:31.7803387+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"qwen2.5-7b-instruct-generic-cpu:3","name":"qwen2.5-7b-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":3,"alphanumericVersion":"3","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-06-04T20:57:31.1457087Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/qwen2.5-7b-instruct/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":6614249635,"vRamFootprintBytes":6614446407}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"3","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"qwen2.5-7b-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/qwen2.5-7b-instruct-generic-cpu/versions/3","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/qwen2.5-coder-14b-instruct-generic-cpu/version/3","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":true,"tags":{"alias":"qwen2.5-coder-14b","author":"Microsoft","directoryPath":"v3","foundryLocal":"","inputModalities":"text","license":"apache-2.0","licenseDescription":"This model is provided under the License Terms available at <https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct/blob/main/LICENSE>.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": \"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}","supportsToolCalling":"","task":"chat-completion","toolCallEnd":"</tool_call>","toolCallStart":"<tool_call>","toolRegisterEnd":"</tools>","toolRegisterStart":"<tools>"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"apache-2.0","inferenceTasks":["chat-completion"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"qwen2.5-coder-14b-instruct-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"This model is an optimized version of Qwen2.5-Coder-14B-Instruct to enable local inference on CPUs. 
This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** Microsoft\n- **Model type:** ONNX\n- **License:** apache-2.0\n- **Model Description:** This is a conversion of the Qwen2.5-Coder-14B-Instruct for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [Qwen2.5-Coder-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct) for details.","labels":["default","latest","invisibleLatest"],"name":"qwen2.5-coder-14b-instruct-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-06-04T21:00:59.5956532+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"qwen2.5-coder-14b-instruct-generic-cpu:3","name":"qwen2.5-coder-14b-instruct-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":3,"alphanumericVersion":"3","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-06-04T21:00:58.9017408Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azureml/models/qwen2.5-coder-14b-instruct/versions/1"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":11875584573,"vRamFootprintBytes":11875922365}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"3","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"qwen2.5-coder-14b-instruct-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/qwen2.5-coder-14b-instruct-generic-cpu/versions/3","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null},{"relevancyScore":1.0,"entityResourceName":"azureml","highlights":{},"schemaId":"43f072fa-9b7f-571f-bdf3-160514d1ff70","entityId":"azureml://registries/b98c40bc-36d6-4175-b131-bad4502fe1a3/type/models/objectId/openai-whisper-tiny-generic-cpu/version/1","kind":"Versioned","annotations":{"invisibleUntil":"0001-01-01T00:00:00+00:00","archived":false,"tags":{"alias":"whisper-tiny","author":"Microsoft","directoryPath":"openai-whisper-tiny-generic-cpu","foundryLocal":"test","inputModalities":"text","license":"apache-2.0","licenseDescription":"This model is provided under the License Terms available at https://www.apache.org/licenses/LICENSE-2.0.html.","maxOutputTokens":"2048","outputModalities":"text","promptTemplate":"{\"prompt\": \"<|startoftranscript|> <|en|> <|transcribe|> <|notimestamps|>\"}","task":"automatic speech 
recognition"},"datasets":[],"sampleInputData":null,"sampleOutputData":null,"resourceRequirements":null,"stage":"Development","systemCatalogData":{"publisher":"Microsoft","modelCapabilities":null,"deploymentTypes":["maap-inference","batch-enabled"],"license":"apache-2.0","inferenceTasks":["automatic speech recognition"],"fineTuningTasks":null,"inferenceComputeAllowed":null,"fineTuneComputeAllowed":null,"evaluationComputeAllowed":null,"languages":[],"summary":null,"displayName":"openai-whisper-tiny-generic-cpu","textContextWindow":null,"maxOutputTokens":2048,"inputModalities":["text"],"outputModalities":["text"],"playgroundRateLimitTier":null,"azureOffers":["VM"]},"description":"Whisper is an OpenAI pre-trained speech recognition model with potential applications for ASR solutions for developers. However, due to weak supervision and large-scale noisy data, it should be used with caution in high-risk domains. The model has been trained on 680k hours of audio data representing 98 different languages, leading to improved robustness and accuracy compared to existing ASR systems. However, there are disparities in performance across languages and the model is prone to generating repetitive texts, which may increase in low-resource languages. There are dual-use concerns and real economic implications with such performance disparities, and the model may also have the capacity to recognize specific individuals. The affordable cost of automatic transcription and translation of large volumes of audio communication is a potential benefit, but the cost of transcription may limit the expansion of surveillance projects.\n\nThe tiny model is the smallest variant in the Whisper family, offering faster inference times with reduced accuracy compared to larger models, making it suitable for resource-constrained environments and real-time applications where speed is prioritized over precision.\n\n> The above summary was generated using ChatGPT. Review the <a href=\"https://huggingface.co/openai/whisper-tiny\" target=\"_blank\">original model card</a> to understand the data used to train the model, evaluation metrics, license, intended uses, limitations and bias before using the model.\n\nThis model is an optimized version of OpenAI-whisper-tiny to enable local inference on CPUs. This model uses RTN quantization.\n\n# Model Description\n- **Developed by:** OpenAI\n- **Model type:** apache-2.0\n- **License:** Apache license 2.0\n- **Model Description:** This is a conversion of the OpenAI-whisper-tiny for local inference on CPUs.\n- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. 
Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.\n\n# Base Model Information\nSee Hugging Face model [OpenAI-whisper-tiny](https://huggingface.co/openai/whisper-tiny) for details.\n","labels":["default","latest","invisibleLatest"],"name":"openai-whisper-tiny-generic-cpu"},"properties":{"updatedTime":"0001-01-01T00:00:00+00:00","creationContext":{"createdTime":"2025-09-10T20:06:28.28651+00:00","createdBy":{"userObjectId":"00000000-0000-0000-0000-000000000000","userTenantId":"00000000-0000-0000-0000-000000000000","userName":"azureml","userPrincipalName":null},"creationSource":null},"id":"openai-whisper-tiny-generic-cpu:1","name":"openai-whisper-tiny-generic-cpu","modelFramework":"Custom","modelFrameworkVersion":null,"modelFormat":"CUSTOM","version":1,"alphanumericVersion":"1","url":null,"mimeType":"application/octet-stream","modifiedTime":"2025-09-11T18:52:39.7230022Z","unpack":false,"parentModelId":null,"runId":null,"experimentName":null,"derivedModelIds":null,"userProperties":{},"isAnonymous":false,"orginAssetId":null,"intellectualProperty":{"publisher":null},"variantInfo":{"parents":[{"assetId":"azureml://registries/azure-openai/models/whisper/versions/001"}],"variantMetadata":{"modelType":"ONNX","quantization":["RTN"],"device":"cpu","executionProvider":"CPUExecutionProvider","fileSizeBytes":193167360,"vRamFootprintBytes":193392097}},"provisioningState":"Succeeded"},"internal":{},"updateSequence":8,"type":"models","version":"1","entityContainerId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","entityObjectId":"openai-whisper-tiny-generic-cpu","resourceType":"Registry","relationships":[],"assetId":"azureml://registries/azureml/models/openai-whisper-tiny-generic-cpu/versions/1","usage":{"totalCount":0,"popularity":1.0},"isAFragment":false,"fragmentId":null}],"nextSkip":null,"continuationToken":null,"entityContainerIdsToEntityContainerMetadata":{"azureml":{"resourceId":"azureml","subscriptionId":"6c6683e9-e5fe-4038-8519-ce6ebec2ba15","resourceGroup":"registry-builtin-prod-eastus-01","resourceName":"azureml","entityContainerType":"Registry","regions":[{"regionName":"eastus","isPrimaryRegion":true},{"regionName":"australiaeast","isPrimaryRegion":false},{"regionName":"australiasoutheast","isPrimaryRegion":false},{"regionName":"brazilsouth","isPrimaryRegion":false},{"regionName":"canadacentral","isPrimaryRegion":false},{"regionName":"canadaeast","isPrimaryRegion":false},{"regionName":"centralindia","isPrimaryRegion":false},{"regionName":"centralus","isPrimaryRegion":false},{"regionName":"eastasia","isPrimaryRegion":false},{"regionName":"eastus2","isPrimaryRegion":false},{"regionName":"francecentral","isPrimaryRegion":false},{"regionName":"germanywestcentral","isPrimaryRegion":false},{"regionName":"japaneast","isPrimaryRegion":false},{"regionName":"japanwest","isPrimaryRegion":false},{"regionName":"jioindiawest","isPrimaryRegion":false},{"regionName":"koreacentral","isPrimaryRegion":false},{"regionName":"northcentralus","isPrimaryRegion":false},{"regionName":"northeurope","isPrimaryRegion":false},{"regionName":"norwayeast","isPrimaryRegion":false},{"regionName":"southafricanorth","isPrimaryRegion":false},{"regionName":"southcentralus","isPrimaryRegion":false},{"regionName":"southeastasia","isPrimaryRegion":false},{"regionName":"swedencentral","isPrimaryRegion":false},{"regionName":"switzerlandnorth","isPrimaryRegion":false},{"regionName":"uaenorth","isPrimaryRegion":false},{"regionName":"uksouth","isPrimaryRegion":false},{
"regionName":"ukwest","isPrimaryRegion":false},{"regionName":"westcentralus","isPrimaryRegion":false},{"regionName":"westeurope","isPrimaryRegion":false},{"regionName":"westus","isPrimaryRegion":false},{"regionName":"westus2","isPrimaryRegion":false},{"regionName":"westus3","isPrimaryRegion":false},{"regionName":"qatarcentral","isPrimaryRegion":false},{"regionName":"polandcentral","isPrimaryRegion":false},{"regionName":"southindia","isPrimaryRegion":false},{"regionName":"switzerlandwest","isPrimaryRegion":false},{"regionName":"italynorth","isPrimaryRegion":false},{"regionName":"spaincentral","isPrimaryRegion":false},{"regionName":"israelcentral","isPrimaryRegion":false},{"regionName":"taiwannorth","isPrimaryRegion":false},{"regionName":"centraluseuap","isPrimaryRegion":false},{"regionName":"eastus2euap","isPrimaryRegion":false}],"tenantId":"33e01921-4d64-4f8c-a055-5bdaffd5e33d","immutableResourceId":"b98c40bc-36d6-4175-b131-bad4502fe1a3","isPublicResource":true,"isTradeRestrictedResource":false}},"resourcesNotQueriedReasons":{},"numberOfEntityContainersNotQueried":null,"fanoutData":null,"regionalFanoutState":null,"shardErrors":null,"canSupportSkip":false,"facets":null},"regionalErrors":{},"resourceSkipReasons":{},"shardErrors":{},"numberOfResourcesNotIncludedInSearch":0}

filipw commented on Oct 07 '25 at 15:10

This is also resolved on Windows on Intel hardware.

Also, the Phi-4 models respond well on Intel devices, so this indicates a driver issue on ARM devices, as per the old issue [#136](https://github.com/microsoft/Foundry-Local/issues/136).

Woah! What did you do to get all the GPU and NPU models for Intel? I have the output below, and all I see are GPU/CPU models. I'm on:

C:\Users\gptestuser>foundry --version
0.7.120+3b92ed4014
CPU Info:
Name: Intel(R) Core(TM) Ultra 7 268V | Cores: 8 | Threads: 8
NPU/Compute Accelerator Info:
Name: Intel(R) AI Boost | Description: Intel(R) AI Boost
GPU Info:
Name: Intel(R) Arc(TM) 140V GPU (16GB) | VRAM: None MB | VideoProcessor: Intel(R) Arc(TM) 140V GPU (16GB) Family | DriverVersion: 32.0.101.8132
Name: Microsoft Remote Display Adapter | VRAM: None MB | VideoProcessor: None | DriverVersion: 10.0.26100.6725
System RAM:
Total RAM: 31.48 GB
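
One thing worth checking on a machine like this is which execution providers the installed ONNX Runtime build actually exposes, since NPU model listings depend on the matching provider being available. Here is a minimal sketch using the public `onnxruntime` Python API; note this assumes a standalone `onnxruntime` Python package is installed, and Foundry Local ships its own runtime, so this only approximates what the service itself sees:

```python
import onnxruntime as ort

# Prints the execution providers compiled into this onnxruntime build.
# On Intel this might include OpenVINOExecutionProvider; on Qualcomm ARM,
# QNNExecutionProvider. CPUExecutionProvider is always present.
print(ort.get_available_providers())
```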

DiegoPICT commented on Oct 09 '25 at 13:10

@filipw Checking in to see whether you can list your NPU models with the latest release.

natke commented on Nov 17 '25 at 20:11