Registering ONNX models that were downloaded previously
I recently installed the AI Toolkit extension and downloaded an ONNX model, which it stored under the %USERPROFILE%\.aitk\models folder. To save disk space on my C: drive, I moved the models folder to a different drive and updated the Playground Agent Model Storage (`windowsaistudio.playgroundAgentModelStorage`) setting to point to the new location. This seems to work.
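For reference, this is roughly what the change looks like in my user settings.json (the drive and path below are just examples from my machine, not anything the extension prescribes):

```json
{
  // Hypothetical path; point this at wherever you moved the models folder.
  "windowsaistudio.playgroundAgentModelStorage": "D:\\aitk\\models"
}
```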
I would now like to use other ONNX models that I previously downloaded from Hugging Face, so I moved those files to the same location used by the AI Toolkit, maintaining a similar folder structure (i.e. $publisherName\$modelName\$runtime\$displayName). However, even after restarting VSCode, the models do not appear under My Models\ONNX, nor are they shown as Added in the Model Catalog. Thinking it might force a model to be recognized, I clicked Add for one of the models I wanted to include, but this simply triggered a download.
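To illustrate, the layout I replicated looks roughly like this; the specific folder names are just a guess at how the $publisherName\$modelName\$runtime\$displayName pattern maps to one of the Phi 3 variants discussed below, not something I have confirmed against the extension's actual naming:

```text
D:\aitk\models
└── Microsoft
    └── Phi-3-mini-128k-instruct-onnx
        └── cpu
            ├── cpu-int4-rtn-block-32-onnx
            └── cpu-int4-rtn-block-32-acc-level-4-onnx
```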
How can I register these ONNX models so that I don't have to download everything again? I see a my-models.yml file but it does not specify any ONNX models, only the provider. Where is the list of models that appear under My Models stored?
I was mistaken: clicking Add for an ONNX model copied to the models folder does include it in My Models without downloading it again. It turns out I had clicked Phi 3 Mini 128K (CPU - Small, Fast), while the model I already had was Phi 3 Mini 128K (CPU - Small, Fast, Accurate).
I would still like to know where this information is stored and if there's a way to rename a model's display name once it has been registered. For example, I included both `llama3.2:latest` and `llama3.2:1b` in My Models and they are both shown as `llama3.2`.
Also, are these models labeled correctly?
For example, Phi 3 Mini 128K (CPU - Small, Fast) loads `cpu-int4-rtn-block-32-onnx`, whereas Phi 3 Mini 128K (CPU - Small, Fast, Accurate) loads `cpu-int4-rtn-block-32-acc-level-4-onnx`. Shouldn't it be the other way round, or am I misinterpreting the meaning of Accurate in the model's description?
From the Hugging Face model card:
> ONNX model for int4 CPU and Mobile: ONNX model for your CPU and Mobile, using int4 quantization via RTN. There are two versions uploaded to balance latency vs. accuracy. Acc=1 is targeted at improved accuracy, while Acc=4 is for improved perf. For mobile devices, we recommend using the model with acc-level-4.
Also, there's only one variant for Phi 4 Mini (CPU - Small, Fast, Accurate), but it's labeled Accurate and loads `cpu-int4-rtn-block-32-acc-level-4`.
Currently, the custom model loading function does not support the user-assigned folder setting. We are aware of this issue; please expect this function to be supported in April's release of AI Toolkit.
The model naming issue will also be fixed in April's release.
> The model naming issue will also be fixed in April's release.
I had two separate naming problems:

- In My Models, different variants of the same model are displayed with identical text (e.g. Ollama's `llama3.2:latest` and `llama3.2:1b` both appear as `llama3.2`). Here, I would add that in addition to fixing the default naming, it would be nice to allow users to change it.
- In the Model Catalog, ONNX models described as (Small, Fast, Accurate) are less accurate than the corresponding model labeled as (Small, Fast). I would also add here that it seems strange to describe both models as Fast when one is faster than the other, or to label them as Small when they are approximately the same size and there are no larger variants.
Just to make sure, are you referring to both naming problems?
We are aware of the Ollama issue and will fix it.
Ollama display name now contains quantization
Hi @a1exwang.
Is this already published? I upgraded to the latest pre-release version (0.11.2025040806) but I still have the same problem. Just to be sure, I restarted VSCode, then deleted and re-added the models, but there was no change.
On the other hand, when I right-click each model in My Models and select Copy Model Name, the copied name does distinguish between them.
@thatChang, could you please provide an update for this?
Custom model loading has been supported since April's release. Naming issues of ONNX models are also covered in April's release. Could you provide more information about the Ollama models? @a1exwang
@timenick Is this fix included in the published version? I checked again and still see the same problem using v0.16.0.
To be sure, I deleted all the Ollama models from My Models and added them again and, for example, `llama3.2:1b` and `llama3.2:latest` are both displayed as `llama3.2`.
@a1exwang Could you please check the Ollama display name issue? Thanks
Again, not really fixed for me, but I give up. Evidently you are looking at a different problem.
(Version 0.19.2025081105)