
Olive Model Conversion

Open Justinius opened this issue 7 months ago • 6 comments

I converted the Llama-3.2 example and it worked - although there should be a way to pull the model into a pre-existing cache instead of having to change the cache directory:

https://github.com/microsoft/Foundry-Local/blob/main/docs/how-to/compile-models-for-foundry-local.md#bash-5

However, if I change the model to something else from Hugging Face, "unsloth/Qwen3-30B-A3B-bnb-4bit" (and go through the hoops of installing bitsandbytes and so on), I get an error:

ValueError: Unable to get dummy inputs for the model. Please provide io_config or install an optimum version that supports the model for export.

I got a similar error when trying to convert nomic-embed-text.

So I don't know if I'm doing something wrong - am I missing a command line argument that is needed in some circumstances but not for Llama 3.2?

Justinius · May 19 '25 20:05

@Justinius There are a couple of options we are working on to pull in models without changing the cache location:

  1. Run a model outside of the cache e.g. foundry model run --path /model/output
  2. Add a custom model into the model catalog.

For embedding models, I'll post an example in here shortly.

samuel100 · May 19 '25 21:05

So the other issue will be where I leave my questions about the cache.

For this one, some help with embedding and other generation models would be a big help. I just pulled another model from Hugging Face using the example command line and got the same error: "Please provide io_config or install an optimum version that supports the model for export". I couldn't find anything about this on the Olive pages, so any help would be appreciated.

Justinius · May 19 '25 22:05

Looking more deeply into Olive, I found this:

https://microsoft.github.io/Olive/how-to/configure-workflows/how-to-configure-model.html

> If optimum is installed, Olive will use it to automatically obtain the model’s input/output config and dummy inputs for conversion to ONNX. Else, the model’s io_config must be provided. Refer to [options](https://microsoft.github.io/Olive/reference/options.html#input-model-information) for more details.

It seems like optimum was pulled in either by the Olive library or by Hugging Face itself, and that created a requirement for the model's input/output config - which it can't find. I don't know how it found or created that config for the Llama-3.2 conversion, so I still have a way to go to unravel this one.
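For reference, supplying the io_config by hand in an Olive workflow config looks roughly like the sketch below (based on the options page linked above; the type name, input names, shapes, and dynamic axes are illustrative and will vary by Olive version and by model):

    {
      "input_model": {
        "type": "HfModel",
        "model_path": "unsloth/Qwen3-30B-A3B-bnb-4bit",
        "io_config": {
          "input_names": ["input_ids", "attention_mask"],
          "input_shapes": [[1, 128], [1, 128]],
          "input_types": ["int64", "int64"],
          "output_names": ["logits"],
          "dynamic_axes": {
            "input_ids": {"0": "batch_size", "1": "sequence_length"},
            "attention_mask": {"0": "batch_size", "1": "sequence_length"},
            "logits": {"0": "batch_size", "1": "sequence_length"}
          }
        }
      }
    }

This mirrors what optimum would otherwise fill in automatically: the tensor names, example shapes, and dynamic axes the exporter needs in order to build dummy inputs.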

Justinius · May 19 '25 22:05

Found this from 2023 : https://opensource.microsoft.com/blog/2023/10/04/accelerating-over-130000-hugging-face-models-with-onnx-runtime/

This implies that not all Hugging Face models are supported - there may be a lag between new models and export support.

To that end, Microsoft and/or Hugging Face need to make it easier to understand which models can leverage Olive/optimum for conversion.

Justinius · May 21 '25 04:05

ONNX stores the graph representation of the model in an intermediate representation (IR). To create the IR, Olive needs to pass some dummy data through the model at the graph capture (a.k.a. export) stage - this enables graph capture to find all the different operations that happen in the neural network. To make this easier, Olive tries to use optimum, as it has built-in dummy data for a wide variety of architectures:

https://huggingface.co/docs/optimum/exporters/onnx/overview

If you don't use optimum, or you have an architecture that optimum doesn't recognize, then you'll need to pass in the dummy-input information yourself. We should definitely make that clearer in the docs. Adding the necessary tags.
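One quick way to check whether optimum recognizes an architecture is to try its exporter directly (this follows the optimum docs linked above; the model id and output directory below are just placeholders):

# exports the model to ONNX using optimum's built-in dummy inputs
optimum-cli export onnx --model meta-llama/Llama-3.2-1B-Instruct llama32_onnx/

If that export fails for a given model, Olive can't lean on optimum for that architecture either, and you would need to supply the io_config yourself.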

If you pip install Olive with the auto-opt extra, it will include all the libraries (including optimum) needed to run the auto-optimizer:

pip install olive-ai[auto-opt]

Run an optimization:

olive auto-opt \
    --model_name_or_path meta-llama/Llama-3.2-1B-Instruct \
    --trust_remote_code \
    --output_path models/llama \
    --device cpu \
    --provider CPUExecutionProvider \
    --use_ort_genai \
    --precision int4 \
    --log_level 1
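Once the --path option mentioned earlier is available, the idea is that the compiled output above could be loaded directly rather than copied into the cache (a sketch only - that option was still in progress at the time of writing):

# hypothetical until the --path feature discussed above ships
foundry model run --path models/llama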

samuel100 · May 26 '25 10:05

> @Justinius There are a couple of options we are working on to pull in models without changing the cache location:
>
>   1. Run a model outside of the cache e.g. foundry model run --path /model/output
>   2. Add a custom model into the model catalog.
>
> For embedding models, I'll post an example in here shortly.

It would be very nice to have the --path option. Currently, it is very challenging to get Olive-compiled models running.

jcastillopino · Jun 11 '25 19:06