TF to ONNX export fails with large models
System Info
- `transformers` version: 4.21.1
- Platform: Linux-4.15.0-187-generic-x86_64-with-debian-buster-sid
- Python version: 3.7.5
- Huggingface_hub version: 0.8.1
- PyTorch version (GPU?): not installed (NA)
- Tensorflow version (GPU?): 2.7.0 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help?
No response
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Run `python -m transformers.onnx --model=gpt2-large --framework=tf onnx/` and you will see an error like the one below:
Traceback (most recent call last):
File "/home/craig/.pyenv/versions/tf-hf-test/lib/python3.7/site-packages/tf2onnx/tf_loader.py", line 221, in from_trackable
frozen_graph = from_function(concrete_func, inputs, outputs, large_model)
File "/home/craig/.pyenv/versions/tf-hf-test/lib/python3.7/site-packages/tf2onnx/tf_loader.py", line 280, in from_function
raise e
File "/home/craig/.pyenv/versions/tf-hf-test/lib/python3.7/site-packages/tf2onnx/tf_loader.py", line 273, in from_function
frozen_func = convert_variables_to_constants_v2(func, lower_control_flow=False, aggressive_inlining=True)
File "/home/craig/.pyenv/versions/tf-hf-test/lib/python3.7/site-packages/tensorflow/python/framework/convert_to_constants.py", line 1156, in convert_variables_to_constants_v2
converted_input_indices)
File "/home/craig/.pyenv/versions/tf-hf-test/lib/python3.7/site-packages/tensorflow/python/framework/convert_to_constants.py", line 1082, in _construct_concrete_function
new_output_names)
File "/home/craig/.pyenv/versions/tf-hf-test/lib/python3.7/site-packages/tensorflow/python/eager/wrap_function.py", line 660, in function_from_graph_def
wrapped_import = wrap_function(_imports_graph_def, [])
File "/home/craig/.pyenv/versions/tf-hf-test/lib/python3.7/site-packages/tensorflow/python/eager/wrap_function.py", line 631, in wrap_function
collections={}),
File "/home/craig/.pyenv/versions/tf-hf-test/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 1143, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/home/craig/.pyenv/versions/tf-hf-test/lib/python3.7/site-packages/tensorflow/python/eager/wrap_function.py", line 87, in __call__
return self.call_with_variable_creator_scope(self._fn)(*args, **kwargs)
File "/home/craig/.pyenv/versions/tf-hf-test/lib/python3.7/site-packages/tensorflow/python/eager/wrap_function.py", line 93, in wrapped
return fn(*args, **kwargs)
File "/home/craig/.pyenv/versions/tf-hf-test/lib/python3.7/site-packages/tensorflow/python/eager/wrap_function.py", line 654, in _imports_graph_def
importer.import_graph_def(graph_def, name="")
File "/home/craig/.pyenv/versions/tf-hf-test/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 552, in new_func
return func(*args, **kwargs)
File "/home/craig/.pyenv/versions/tf-hf-test/lib/python3.7/site-packages/tensorflow/python/framework/importer.py", line 412, in import_graph_def
producer_op_list=producer_op_list)
File "/home/craig/.pyenv/versions/tf-hf-test/lib/python3.7/site-packages/tensorflow/python/framework/importer.py", line 501, in _import_graph_def_internal
with c_api_util.tf_buffer(graph_def.SerializeToString()) as serialized:
ValueError: Message tensorflow.GraphDef exceeds maximum protobuf size of 2GB: 3096993336
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/craig/.pyenv/versions/3.7.5/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/craig/.pyenv/versions/3.7.5/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/craig/.pyenv/versions/tf-hf-test/lib/python3.7/site-packages/transformers/onnx/__main__.py", line 107, in <module>
main()
File "/home/craig/.pyenv/versions/tf-hf-test/lib/python3.7/site-packages/transformers/onnx/__main__.py", line 94, in main
args.output,
File "/home/craig/.pyenv/versions/tf-hf-test/lib/python3.7/site-packages/transformers/onnx/convert.py", line 338, in export
return export_tensorflow(preprocessor, model, config, opset, output, tokenizer=tokenizer)
File "/home/craig/.pyenv/versions/tf-hf-test/lib/python3.7/site-packages/transformers/onnx/convert.py", line 265, in export_tensorflow
onnx_model, _ = tf2onnx.convert.from_keras(model, input_signature, opset=opset)
File "/home/craig/.pyenv/versions/tf-hf-test/lib/python3.7/site-packages/tf2onnx/convert.py", line 493, in from_keras
tf_loader.from_trackable(model, concrete_func, input_names, output_names, large_model)
File "/home/craig/.pyenv/versions/tf-hf-test/lib/python3.7/site-packages/tf2onnx/tf_loader.py", line 224, in from_trackable
raise ValueError(err_large_model)
ValueError: model exceeds maximum protobuf size of 2GB. Try setting large_model.
Expected behavior
Export should still succeed for large TF models. tf2onnx expects `large_model=True` to be passed when the serialized protobuf exceeds 2 GB. I am not sure whether tf2onnx's behavior will change, but maybe transformers can account for this before calling tf2onnx?
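One possible shape for that check, as a rough sketch (the `export_with_size_check` helper and the 4-bytes-per-float32-parameter heuristic are mine, not existing transformers code):

```python
# Rough sketch: estimate the frozen-graph size up front and forward tf2onnx's
# `large_model` flag when the GraphDef would exceed the protobuf cap.
import tf2onnx

PROTOBUF_LIMIT = 2 * 1024**3  # hard 2 GB cap on a serialized GraphDef

def export_with_size_check(model, input_signature, opset):
    # Freezing inlines every weight into the GraphDef as a constant, so
    # ~4 bytes per float32 parameter is a reasonable lower bound on its size.
    approx_bytes = model.count_params() * 4
    needs_external = approx_bytes >= PROTOBUF_LIMIT

    # With large_model=True, tf2onnx returns the big tensors separately in
    # `external_storage` instead of embedding them in the ONNX proto.
    onnx_model, external_storage = tf2onnx.convert.from_keras(
        model, input_signature, opset=opset, large_model=needs_external
    )
    return onnx_model, external_storage
```

As a sanity check on the heuristic: the 3096993336 bytes in the traceback is exactly 4 × 774248334, i.e. roughly 4 bytes for each of gpt2-large's ~774M parameters.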
cc @JingyaHuang @michaelbenayoun
If there are no ONNX-level solutions, it may be due to TF1 code (the embeddings) in our models -- see https://github.com/tensorflow/tensorflow/issues/45041
Rewriting the embeddings as TF2 code is on our to-do list, which may fix this issue.
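For illustration, here is a loose sketch of the two patterns (simplified; the real layer in transformers is `TFSharedEmbeddings`, which also handles the tied LM head, so take the class names below as hypothetical):

```python
# Hedged illustration only -- not the actual transformers implementation.
import tensorflow as tf

class TF1StyleEmbedding(tf.keras.layers.Layer):
    """TF1-era pattern: a hand-rolled weight plus tf.gather in call()."""

    def __init__(self, vocab_size, hidden_size, **kwargs):
        super().__init__(**kwargs)
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size

    def build(self, input_shape):
        self.weight = self.add_weight(
            "weight", shape=(self.vocab_size, self.hidden_size)
        )
        super().build(input_shape)

    def call(self, input_ids):
        return tf.gather(self.weight, input_ids)

class TF2StyleEmbedding(tf.keras.layers.Layer):
    """Idiomatic TF2 replacement: a stock Keras embedding layer."""

    def __init__(self, vocab_size, hidden_size, **kwargs):
        super().__init__(**kwargs)
        self.embedding = tf.keras.layers.Embedding(vocab_size, hidden_size)

    def call(self, input_ids):
        return self.embedding(input_ids)
```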
tf2onnx supports exporting large ONNX models by storing tensors in external files. However, adding the `large_model=True` flag to the transformers ONNX exporter does not work correctly at the moment:
File "/home/ubuntu/anaconda3/envs/venv_onnx_large/lib/python3.9/site-packages/transformers/onnx/convert.py", line 338, in export
return export_tensorflow(preprocessor, model, config, opset, output, tokenizer=tokenizer)
File "/home/ubuntu/anaconda3/envs/venv_onnx_large/lib/python3.9/site-packages/transformers/onnx/convert.py", line 265, in export_tensorflow
onnx_model, _ = tf2onnx.convert.from_keras(model, input_signature, opset=opset, large_model=True)
File "/home/ubuntu/anaconda3/envs/venv_onnx_large/lib/python3.9/site-packages/tf2onnx/convert.py", line 495, in from_keras
model_proto, external_tensor_storage = _convert_common(
File "/home/ubuntu/anaconda3/envs/venv_onnx_large/lib/python3.9/site-packages/tf2onnx/convert.py", line 165, in _convert_common
g = process_tf_graph(tf_graph, const_node_values=const_node_values,
File "/home/ubuntu/anaconda3/envs/venv_onnx_large/lib/python3.9/site-packages/tf2onnx/tfonnx.py", line 459, in process_tf_graph
main_g, subgraphs = graphs_from_tf(tf_graph, input_names, output_names, shape_override, const_node_values,
File "/home/ubuntu/anaconda3/envs/venv_onnx_large/lib/python3.9/site-packages/tf2onnx/tfonnx.py", line 499, in graphs_from_tf
utils.check_io(input_names, output_names, output_shapes.keys())
File "/home/ubuntu/anaconda3/envs/venv_onnx_large/lib/python3.9/site-packages/tf2onnx/utils.py", line 316, in check_io
raise ValueError("Inputs/Outputs Not Found")
ValueError: Inputs/Outputs Not Found
Further investigation needs to be done on the TensorFlow side. I will be happy to help with a PR enabling this in transformers' ONNX TF exporter once we are sure the large-proto export feature works correctly.
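For anyone who wants to reproduce this outside the transformers CLI, a minimal sketch (the single `input_ids` signature is a simplification; the real exporter derives the signature, including `past_key_values`, from the model's `OnnxConfig`):

```python
# Minimal repro sketch for the "Inputs/Outputs Not Found" failure above.
import tensorflow as tf
import tf2onnx
from transformers import TFGPT2LMHeadModel

model = TFGPT2LMHeadModel.from_pretrained("gpt2-large")
input_signature = (tf.TensorSpec((None, None), tf.int32, name="input_ids"),)

# With large_model=True, tf2onnx should return the big tensors in
# `external_storage` rather than embedding them in the proto, but the call
# currently raises ValueError during graph extraction, as shown above.
onnx_model, external_storage = tf2onnx.convert.from_keras(
    model, input_signature, opset=13, large_model=True
)
```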
> If there are no ONNX-level solutions, it may be due to TF1 code (the embeddings) in our models -- see tensorflow/tensorflow#45041
> Rewriting the embeddings as TF2 code is on our to-do list, which may fix this issue.
I didn't know that. OK, it seems it is not just a problem with the protobuf size limit, then.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.