Exported ONNX model files are much larger than expected, compared to the ones created by an older version

whitphx opened this issue 11 months ago • 3 comments

System Info

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.5 LTS
Release:        22.04
Codename:       jammy
$ python -V
Python 3.12.6
$ uv pip freeze
accelerate==1.6.0
certifi==2025.1.31
charset-normalizer==3.4.1
coloredlogs==15.0.1
filelock==3.18.0
flatbuffers==25.2.10
fsspec==2025.3.2
huggingface-hub==0.30.2
humanfriendly==10.0
idna==3.10
jinja2==3.1.6
markupsafe==3.0.2
mpmath==1.3.0
networkx==3.4.2
numpy==2.2.5
nvidia-cublas-cu12==12.6.4.1
nvidia-cuda-cupti-cu12==12.6.80
nvidia-cuda-nvrtc-cu12==12.6.77
nvidia-cuda-runtime-cu12==12.6.77
nvidia-cudnn-cu12==9.5.1.17
nvidia-cufft-cu12==11.3.0.4
nvidia-cufile-cu12==1.11.1.6
nvidia-curand-cu12==10.3.7.77
nvidia-cusolver-cu12==11.7.1.2
nvidia-cusparse-cu12==12.5.4.2
nvidia-cusparselt-cu12==0.6.3
nvidia-nccl-cu12==2.26.2
nvidia-nvjitlink-cu12==12.6.85
nvidia-nvtx-cu12==12.6.77
onnx==1.17.0
onnxruntime==1.20.1
onnxslim==0.1.48
optimum @ git+https://github.com/huggingface/optimum.git@b04feaea78cda58d79b8da67dca3fd0c4ab33435
packaging==25.0
protobuf==6.30.2
psutil==7.0.0
pyyaml==6.0.2
regex==2024.11.6
requests==2.32.3
safetensors==0.5.3
setuptools==79.0.1
sympy==1.13.3
tokenizers==0.21.1
torch==2.7.0
tqdm==4.67.1
transformers==4.49.0
triton==3.3.0
typing-extensions==4.13.2
urllib3==2.4.0

Who can help?

@michaelbenayoun

Information

  • [ ] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

ONNX model files exported with optimum.exporters.onnx.main_export are much larger than the ones created with an older version. @xenova suggested this seems to be an issue around the weight deduplication step. Reference: https://huggingface.co/Xenova/nllb-200-distilled-600M/discussions/3

As an example, I converted https://huggingface.co/facebook/nllb-200-distilled-600M.

  • onnx/decoder_model.onnx converted with the older version was 1,860,454,885 bytes (~1.86 GB).
  • The new export produces onnx/decoder_model.onnx_data at 2,909,290,496 bytes (~2.91 GB), plus onnx/decoder_model.onnx at 430,168 bytes (~430 kB).
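Note that newer exports split the weights into an external `.onnx_data` file, so a fair comparison has to sum both files. A minimal sketch of such a size check (the `onnx_export_size` helper and the directory names are hypothetical, not part of optimum):

```python
from pathlib import Path


def onnx_export_size(export_dir: str) -> int:
    """Sum the sizes of the ONNX graph file(s) and any external-data
    files, since large exports store weights in a separate *.onnx_data."""
    total = 0
    for p in Path(export_dir).iterdir():
        if p.suffix == ".onnx" or p.name.endswith(".onnx_data"):
            total += p.stat().st_size
    return total


# Example usage: compare old vs. new export directories (paths hypothetical)
# old_bytes = onnx_export_size("onnx_old")
# new_bytes = onnx_export_size("onnx_new")
# print(f"old: {old_bytes / 1e9:.2f} GB, new: {new_bytes / 1e9:.2f} GB")
```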

Expected behavior

The size of the newly converted model should be similar to that of the older export.

whitphx avatar Apr 25 '25 09:04 whitphx

@xenova any idea where this is coming from?

IlyasMoutawwakil avatar May 07 '25 09:05 IlyasMoutawwakil

I'm not quite sure 👀 possibly a torch/onnx bug, but I haven't done testing to confirm yet.

xenova avatar May 07 '25 10:05 xenova

Hi! Can you try main with the --slim option?

IlyasMoutawwakil avatar May 17 '25 07:05 IlyasMoutawwakil