
Hugging Face ONNX Models

anktsrkr opened this issue 7 months ago • 3 comments

Is it possible to run ONNX models hosted on HF? I tried placing a downloaded model in the model folder, but it does not show up when I run foundry cache list.

anktsrkr avatar May 21 '25 14:05 anktsrkr

@anktsrkr thanks for raising the issue! Yes - it is possible to download models from Hugging Face (HF) and consume them in Foundry Local. The only caveat is that the model folder on HF must contain a genai_config.json file.

It is likely that when you downloaded from HF, symbolic links were created. To download into your cache directory without symbolic links:

huggingface-cli download REPO --local-dir ~/.foundry/cache/models

Where REPO is the name of the HF repository. If the repository has many directories, you can download a specific directory using the --include option, as in the sketch below.
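
For example, a minimal sketch assuming the ONNX files live in a subfolder named onnx/ (the folder name here is illustrative; check the repository's file layout for the real path):

# Download only the contents of the onnx/ subfolder into the Foundry Local cache
huggingface-cli download REPO --include "onnx/*" --local-dir ~/.foundry/cache/models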

[!NOTE] We're working on having native HF support (pulling ONNX models) in Foundry Local in an upcoming update.

samuel100 avatar May 21 '25 16:05 samuel100

Hey @samuel100, I tried the approach you described to run Qwen3-0.6B-ONNX. I renamed generation_config.json to genai_config.json and was able to discover the model:

Image

However, when I try to run the model I get:

Image

When I compare with the phi4 model I downloaded, I can see that its genai_config.json has more properties.

Also, looking at genai_config.json, I imagine renaming is not the correct step.

Not sure what the next step is.

Btw! Thanks for the heads up about native HF support!

anktsrkr avatar May 21 '25 18:05 anktsrkr

So, genai_config.json is different from generation_config.json. The genai_config.json file is used by ONNX Runtime, and the repo you are using does not have one available.
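
For reference, a genai_config.json produced by the ONNX Runtime model builder looks roughly like the abbreviated sketch below. The values are placeholders and the field list is not exhaustive, so treat it as illustrative only:

{
  "model": {
    "type": "<architecture, e.g. qwen3>",
    "context_length": "<context window>",
    "vocab_size": "<vocabulary size>",
    "eos_token_id": "<eos token id>",
    "decoder": {
      "filename": "model.onnx",
      "inputs": "<named ONNX graph inputs>",
      "outputs": "<named ONNX graph outputs>"
    }
  },
  "search": {
    "max_length": "<default generation length>",
    "temperature": "<default sampling temperature>"
  }
}

None of that graph-level information exists in generation_config.json, which is why renaming the file is not enough.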

Probably the easiest way to create an ONNX build of Qwen3-0.6B for Foundry Local is to use Olive. Detailed documentation on how to do this can be found in Compile Hugging Face models to run on Foundry Local.

Step 1 of the documentation shows how to compile the meta-llama/Llama-3.2-1B-Instruct model. You'd need to update the command to:

olive auto-opt \
    --model_name_or_path Qwen/Qwen3-0.6B \
    --trust_remote_code \
    --output_path models/Qwen3-0.6B \
    --device cpu \
    --provider CPUExecutionProvider \
    --use_model_builder \
    --use_ort_genai \
    --precision int4 \
    --log_level 1
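
After the command finishes, here is a rough sketch of getting the result into Foundry Local, assuming Olive writes the optimized model under models/Qwen3-0.6B/model (the linked documentation covers the exact output layout and any extra metadata the CLI expects):

# Copy the compiled model into the Foundry Local cache directory
cp -r models/Qwen3-0.6B/model ~/.foundry/cache/models/Qwen3-0.6B

# Confirm it is picked up, then run it
foundry cache list
foundry model run Qwen3-0.6B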

I appreciate that having to run this yourself is a pain. Let me follow up with our team on how we can make this easier going forward.

samuel100 avatar May 26 '25 10:05 samuel100