
[BUG] ONNX optimization fails when optimizing AlbertXXL despite the weights being under 2GB

Open michaelroyzen opened this issue 2 years ago • 7 comments

System Info

Optimum 1.2.3[onnxruntime-gpu], PyTorch 1.12.0a0+bd13bc6, CUDA 11.6, Ubuntu 18.04, Transformers 4.19.0, Onnxruntime nightly build (ort-nightly-gpu 1.12.0.dev20220616003) because otherwise there's an error:

Traceback (most recent call last):
  File "run_qa.py", line 524, in <module>
    main()
  File "run_qa.py", line 311, in main
    optimizer.export(
  File "/opt/conda/lib/python3.8/site-packages/optimum/onnxruntime/optimization.py", line 131, in export
    optimizer = optimize_model(
  File "/opt/conda/lib/python3.8/site-packages/onnxruntime/transformers/optimizer.py", line 215, in optimize_model
    temp_model_path = optimize_by_onnxruntime(input,
  File "/opt/conda/lib/python3.8/site-packages/onnxruntime/transformers/optimizer.py", line 96, in optimize_by_onnxruntime
    session = onnxruntime.InferenceSession(onnx_model_path, sess_options, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 335, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/opt/conda/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 363, in _create_inference_session
    raise ValueError("This ORT build has {} enabled. ".format(available_providers) +
ValueError: This ORT build has ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'], ...)

ONNX Runtime needs to be newer than the most recently released tag, because even the newest release (1.11.1) doesn't yet specify an explicit execution provider when it creates the session, hence the nightly build is used instead.
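For context, this is the pattern ORT 1.9+ expects when a session is created; a minimal sketch, with "model.onnx" as a placeholder path (not from the issue):

import onnxruntime

# Since ORT 1.9, the providers argument must be passed explicitly
# when instantiating an InferenceSession.
session = onnxruntime.InferenceSession(
    "model.onnx",  # placeholder path for illustration
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)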

Who can help?

@JingyaHuang @lewtun @mich

Information

  • [x] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

The provided run_qa.py script in examples/onnxruntime/optimization/question-answering doesn't work as expected with ahotrod/albert_xxlargev1_squad2_512.

To reproduce:

python run_qa.py \
  --model_name_or_path ahotrod/albert_xxlargev1_squad2_512 \
  --dataset_name squad_v2 \
  --optimization_level 99 \
  --do_eval \
  --output_dir /home/ubuntu/albert_xxlargev1_squad2_512_onnx_optimized \
  --execution_provider CUDAExecutionProvider \
  --optimize_for_gpu

The resulting error:

Traceback (most recent call last):
  File "run_qa.py", line 524, in <module>
    main()
  File "run_qa.py", line 311, in main
    optimizer.export(
  File "/opt/conda/lib/python3.8/site-packages/optimum/onnxruntime/optimization.py", line 142, in export
    optimizer.save_model_to_file(onnx_optimized_model_output_path, use_external_data_format)
  File "/opt/conda/lib/python3.8/site-packages/onnxruntime/transformers/models/gpt2/../../onnx_model.py", line 934, in save_model_to_file
    save_model(self.model, output_path)
  File "/opt/conda/lib/python3.8/site-packages/onnx/__init__.py", line 202, in save_model
    s = _serialize(proto)
  File "/opt/conda/lib/python3.8/site-packages/onnx/__init__.py", line 71, in _serialize
    result = proto.SerializeToString()
ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 3038449703

This triggers the ModelProto > 2GB error despite the model weights being well under 2GB (~870MB).

Expected behavior

The optimized ONNX model is successfully saved.

michaelroyzen avatar Jun 18 '22 00:06 michaelroyzen

Hi @michaelroyzen,

Although the size of the PyTorch model is smaller than 2GB, it is still possible that the exported ONNX model exceeds 2GB (link to a related issue).

Ideally, setting the argument use_external_data_format=True in the optimizer should allow you to export an ONNX model larger than 2GB, with the tensors stored in external files. So for your case, insert use_external_data_format=True into the run_qa example.

However, unfortunately, write_external_data_tensors in the onnx library doesn't seem to work well for the moment (link to a related issue), and maybe we can move the discussion to the ONNX Runtime side.

JingyaHuang avatar Jun 28 '22 15:06 JingyaHuang

Yes, adding use_external_data_format=True to run_qa didn't work unfortunately @JingyaHuang -- it seems the issue is with ONNX. This is what I tried in run_qa.py:

optimizer.export(
    onnx_model_path=model_path,
    onnx_optimized_model_output_path=optimized_model_path,
    optimization_config=optimization_config,
    use_external_data_format=True,
)

michaelroyzen avatar Jun 28 '22 17:06 michaelroyzen

@JingyaHuang @michaelroyzen I think this is what we need. https://github.com/huggingface/optimum/pull/302

sam-h-bean avatar Jul 17 '22 18:07 sam-h-bean

@JingyaHuang it seems to me the issue is caused by an incorrect import: the file

/opt/conda/lib/python3.8/site-packages/onnxruntime/transformers/models/gpt2/../../onnx_model.py

is used when

from onnxruntime.transformers.optimizer import optimize_model

is called, when in fact the file should be this.

michaelroyzen avatar Jul 28 '22 06:07 michaelroyzen

Hi @michaelroyzen, have you solved the issue of saving the onnx model?

JingyaHuang avatar Aug 02 '22 10:08 JingyaHuang

BTW, if you are not using a vision model, setting optimization_level=2 is generally good enough.

JingyaHuang avatar Aug 02 '22 17:08 JingyaHuang

To follow up on the issue, here is the thread in onnx where we continued the discussion.

It turns out that, under the hard constraint of the protobuf size limit (2GB), ONNX offers some options to export large tensors to external files. Users can tune the parameters to find the best fit.
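For reference, here is a minimal sketch of those options using the onnx Python API; the file names and the size threshold are illustrative placeholders, not values from this issue:

import onnx

# Load the exported model and re-save it with large tensors moved to an
# external data file instead of being embedded in the protobuf.
model = onnx.load("model.onnx")
onnx.save_model(
    model,
    "model_external_data.onnx",
    save_as_external_data=True,      # move tensors out of the protobuf
    all_tensors_to_one_file=True,    # store them in a single side file
    location="model_weights.data",   # name of that side file
    size_threshold=1024,             # only externalize tensors above 1 KB
)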

However, for several extremely large models (as in the case of AlbertXXL), the structural proto could still exceed 2GB even after exporting all tensors to external files. In this case, a workaround would be either to load the fp16 model weights (if the model was also trained with mixed precision)

>>> import torch
>>> from transformers import AutoModel
>>> model = AutoModel.from_pretrained('albert-xxlarge-v1', torch_dtype=torch.float16)

or use ORTQuantizer to proceed with the quantization.
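As a rough illustration of the quantization route, here is a hedged sketch based on the ORTQuantizer API as it looked around Optimum 1.2/1.3; the output paths and the quantization config are placeholders, and exact signatures may differ in newer Optimum releases:

from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Dynamic quantization config; the target ISA here is an illustrative choice.
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)

# Build a quantizer for the QA checkpoint discussed in this issue.
quantizer = ORTQuantizer.from_pretrained(
    "ahotrod/albert_xxlargev1_squad2_512", feature="question-answering"
)

# Export a quantized ONNX model; both paths are placeholders.
quantizer.export(
    onnx_model_path="model.onnx",
    onnx_quantized_model_output_path="model-quantized.onnx",
    quantization_config=qconfig,
)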

A PR is in progress to improve the compatibility of ORTOptimizer and ORTQuantizer in cases of large ONNX proto.

JingyaHuang avatar Aug 11 '22 08:08 JingyaHuang

Hi, did you resolve the bug? I still can't manage to convert an Albert XXL model.

shaked571 avatar May 22 '23 18:05 shaked571

@shaked571 Could you share the command/script you used to export the model please?

regisss avatar May 22 '23 20:05 regisss