
Hugging Face ONNX Models

anktsrkr opened this issue 7 months ago • 3 comments

Is it possible to run ONNX models hosted on HF? I tried placing a downloaded model in the model folder, but it does not show up when I run foundry cache list.

anktsrkr avatar May 21 '25 14:05 anktsrkr

@anktsrkr thanks for raising the issue! Yes - it is possible to download models from Hugging Face (HF) and consume them in Foundry Local. The only caveat is that the model folder on HF must contain a genai_config.json file.

It is likely that when you downloaded from HF, symbolic links were created. To download into your cache directory without symbolic links:

huggingface-cli download REPO --local-dir ~/.foundry/cache/models

Where REPO is the name of the HF repository. If the repository has many directories, you can download a specific directory using the --include option, as in the sketch below.
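
For example, a minimal sketch assuming the ONNX files live in a subfolder named onnx/ (the folder name here is illustrative; check the repository's file layout for the real path):

# Download only the contents of the onnx/ subfolder into the Foundry Local cache
huggingface-cli download REPO --include "onnx/*" --local-dir ~/.foundry/cache/models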

[!NOTE] We're working on having native HF support (pulling ONNX models) in Foundry Local in an upcoming update.

samuel100 avatar May 21 '25 16:05 samuel100

Hey @samuel100, I tried the approach you described to run Qwen3-0.6B-ONNX. I renamed generation_config.json to genai_config.json and was able to discover the model:

Image

However, when I try to run the model I get:

Image

When I compare with the phi4 model I downloaded, I can see that its genai_config.json has more properties.

Also, looking at genai_config.json, I imagine renaming is not the correct step.

Not sure what the next step is.

Btw! Thanks for the heads up about native HF support!

anktsrkr avatar May 21 '25 18:05 anktsrkr

So, genai_config.json is different from generation_config.json. The genai_config.json file is used by ONNX Runtime, and the repo you are using does not have one available.
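
For reference, a genai_config.json produced by the ONNX Runtime model builder looks roughly like the abbreviated sketch below. The values are placeholders and the field list is not exhaustive, so treat it as illustrative only:

{
  "model": {
    "type": "<architecture, e.g. qwen3>",
    "context_length": "<context window>",
    "vocab_size": "<vocabulary size>",
    "eos_token_id": "<eos token id>",
    "decoder": {
      "filename": "model.onnx",
      "inputs": "<named ONNX graph inputs>",
      "outputs": "<named ONNX graph outputs>"
    }
  },
  "search": {
    "max_length": "<default generation length>",
    "temperature": "<default sampling temperature>"
  }
}

None of that graph-level information exists in generation_config.json, which is why renaming the file is not enough.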

Probably the easiest way to create an ONNX build of Qwen3-0.6B for Foundry Local is to use Olive. Detailed documentation on how to do this can be found in Compile Hugging Face models to run on Foundry Local.

Step 1 of the documentation shows how to compile the meta-llama/Llama-3.2-1B-Instruct model. You'd need to update the command to:

olive auto-opt \
    --model_name_or_path Qwen/Qwen3-0.6B \
    --trust_remote_code \
    --output_path models/Qwen3-0.6B \
    --device cpu \
    --provider CPUExecutionProvider \
    --use_model_builder \
    --use_ort_genai \
    --precision int4 \
    --log_level 1
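
After the command finishes, here is a rough sketch of getting the result into Foundry Local, assuming Olive writes the optimized model under models/Qwen3-0.6B/model (the linked documentation covers the exact output layout and any extra metadata the CLI expects):

# Copy the compiled model into the Foundry Local cache directory
cp -r models/Qwen3-0.6B/model ~/.foundry/cache/models/Qwen3-0.6B

# Confirm it is picked up, then run it
foundry cache list
foundry model run Qwen3-0.6B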

I appreciate that having to run this yourself is a pain. Let me follow up with our team on how we can make this easier going forward.

samuel100 avatar May 26 '25 10:05 samuel100