Exported ONNX model files are much larger than expected, compared to the ones created by an older version

whitphx opened this issue 11 months ago • 3 comments

System Info

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.5 LTS
Release:        22.04
Codename:       jammy
$ python -V
Python 3.12.6
$ uv pip freeze
accelerate==1.6.0
certifi==2025.1.31
charset-normalizer==3.4.1
coloredlogs==15.0.1
filelock==3.18.0
flatbuffers==25.2.10
fsspec==2025.3.2
huggingface-hub==0.30.2
humanfriendly==10.0
idna==3.10
jinja2==3.1.6
markupsafe==3.0.2
mpmath==1.3.0
networkx==3.4.2
numpy==2.2.5
nvidia-cublas-cu12==12.6.4.1
nvidia-cuda-cupti-cu12==12.6.80
nvidia-cuda-nvrtc-cu12==12.6.77
nvidia-cuda-runtime-cu12==12.6.77
nvidia-cudnn-cu12==9.5.1.17
nvidia-cufft-cu12==11.3.0.4
nvidia-cufile-cu12==1.11.1.6
nvidia-curand-cu12==10.3.7.77
nvidia-cusolver-cu12==11.7.1.2
nvidia-cusparse-cu12==12.5.4.2
nvidia-cusparselt-cu12==0.6.3
nvidia-nccl-cu12==2.26.2
nvidia-nvjitlink-cu12==12.6.85
nvidia-nvtx-cu12==12.6.77
onnx==1.17.0
onnxruntime==1.20.1
onnxslim==0.1.48
optimum @ git+https://github.com/huggingface/optimum.git@b04feaea78cda58d79b8da67dca3fd0c4ab33435
packaging==25.0
protobuf==6.30.2
psutil==7.0.0
pyyaml==6.0.2
regex==2024.11.6
requests==2.32.3
safetensors==0.5.3
setuptools==79.0.1
sympy==1.13.3
tokenizers==0.21.1
torch==2.7.0
tqdm==4.67.1
transformers==4.49.0
triton==3.3.0
typing-extensions==4.13.2
urllib3==2.4.0

Who can help?

@michaelbenayoun

Information

  • [ ] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

ONNX model files exported with optimum.exporters.onnx.main_export are much larger than the ones created with an older version. @xenova suggested this seems to be an issue around the weight deduplication step. Reference: https://huggingface.co/Xenova/nllb-200-distilled-600M/discussions/3

As an example, I converted https://huggingface.co/facebook/nllb-200-distilled-600M.

  • onnx/decoder_model.onnx converted with the older version was 1,860,454,885 bytes (~1.86 GB).
  • The new export produces onnx/decoder_model.onnx_data at 2,909,290,496 bytes (~2.91 GB), plus onnx/decoder_model.onnx at 430,168 bytes (~430 kB).
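Note that newer exports split the weights into an external `.onnx_data` file, so a fair comparison has to sum both files. A minimal sketch of such a size check (the `onnx_export_size` helper and the directory names are hypothetical, not part of optimum):

```python
from pathlib import Path


def onnx_export_size(export_dir: str) -> int:
    """Sum the sizes of the ONNX graph file(s) and any external-data
    files, since large exports store weights in a separate *.onnx_data."""
    total = 0
    for p in Path(export_dir).iterdir():
        if p.suffix == ".onnx" or p.name.endswith(".onnx_data"):
            total += p.stat().st_size
    return total


# Example usage: compare old vs. new export directories (paths hypothetical)
# old_bytes = onnx_export_size("onnx_old")
# new_bytes = onnx_export_size("onnx_new")
# print(f"old: {old_bytes / 1e9:.2f} GB, new: {new_bytes / 1e9:.2f} GB")
```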

Expected behavior

The size of the newly converted model should be similar to that of the older export.

whitphx avatar Apr 25 '25 09:04 whitphx

@xenova any idea where this is coming from?

IlyasMoutawwakil avatar May 07 '25 09:05 IlyasMoutawwakil

I'm not quite sure 👀 possibly a torch/onnx bug, but I haven't done testing to confirm yet.

xenova avatar May 07 '25 10:05 xenova

Hi! Can you try main with the --slim option?

IlyasMoutawwakil avatar May 17 '25 07:05 IlyasMoutawwakil