Community contribution - `optimum.exporters.onnx` support for new models!
Following what was done by @ChainYo in Transformers, in the ONNXConfig: Add a configuration for all available models issue, the idea is to add support for exporting new models in `optimum.exporters.onnx`.
This issue is about the working group specially created for this task. If you are interested in helping out, reply here, take a look at this organization, or add ChainYo#3610 on Discord.
We want to contribute to Hugging Face's ONNX export implementation for all available models on Hugging Face Hub. There are already a lot of architectures implemented for converting PyTorch models to ONNX, but we need more! We need them all!
Feel free to join us in this adventure! Join the org by clicking here
Here is a non-exhaustive list of the models available:
- [x] Albert
- [x] BART
- [x] BeiT
- [x] BERT
- [ ] BigBird (Critical issue: https://github.com/huggingface/optimum/issues/754#issuecomment-1429830467)
- [ ] BigBirdPegasus (Critical issue: https://github.com/huggingface/optimum/issues/754#issuecomment-1429830467)
- [x] Blenderbot
- [x] BlenderbotSmall
- [ ] BLIP-2
- [x] BLOOM
- [x] CamemBERT
- [ ] CANINE
- [x] CLIP
- [x] CodeGen
- [x] ConvNext
- [x] ConvBert
- [ ] CTRL
- [x] CvT
- [x] Data2VecText
- [x] Data2VecVision
- [x] Deberta
- [x] DebertaV2
- [x] DeiT
- [ ] DecisionTransformer
- [x] DETR
- [x] Distilbert
- [ ] DPR
- [x] DPT
- [x] ELECTRA
- [ ] FNet
- [ ] FSMT
- [x] Flaubert
- [ ] FLAVA
- [ ] Funnel Transformer
- [x] GLPN
- [x] GPT2
- [x] GPTJ
- [x] GPT-Neo
- [x] GPT-NeoX
- [x] Hubert
- [x] I-Bert
- [x] ImageGPT 🛠️ @adit299
- [x] LayoutLM
- [ ] LayoutLMv2 (but 🛠️ in Transformers)
- [x] LayoutLMv3
- [ ] LayoutXLM
- [ ] LED
- [x] LeViT
- [ ] 🛠️ Longformer (Critical issue: https://github.com/huggingface/optimum/issues/776#issuecomment-1429680121)
- [x] LongT5
- [ ] Luke (but 🛠️ in Transformers)
- [ ] Lxmert
- [x] M2M100
- [ ] MaskFormer
- [x] mBart
- [ ] MCTCT
- [x] MPNet
- [x] MT5
- [x] MarianMT
- [ ] MegatronBert
- [x] MobileBert
- [x] MobileViT
- [ ] Nyströmformer
- [x] OpenAIGPT-2
- [x] OPT (but 🛠️ in Transformers)
- [x] OWLViT
- [x] Pix2Struct
- [x] PLBart
- [x] Pegasus
- [x] Perceiver
- [x] PoolFormer
- [ ] ProphetNet
- [ ] QDQBERT
- [ ] RAG
- [ ] REALM
- [ ] Reformer (but 🛠️ in Transformers)
- [x] RemBert
- [x] ResNet
- [x] RegNet 🛠️ @asrimanth
- [ ] RetriBert
- [x] RoFormer
- [x] RoBERTa
- [x] SEW
- [x] SEW-D
- [x] SegFormer
- [x] Speech2Text
- [ ] Speech2Text2
- [x] Splinter
- [x] SqueezeBERT
- [x] Swin Transformer
- [x] T5
- [ ] TAPAS 🛠️ @someshfengde
- [ ] TAPEX
- [ ] Transformer XL
- [x] TrOCR
- [ ] UniSpeech
- [ ] UniSpeech-SAT
- [ ] VAN
- [x] ViT
- [ ] Vilt
- [ ] VisualBERT
- [x] Wav2Vec2
- [x] WavLM
- [x] Whisper
- [ ] XGLM
- [x] XLM
- [ ] XLMProphetNet
- [x] XLM-RoBERTa
- [x] XLM-RoBERTa-XL
- [ ] XLNet (but 🛠️ in Transformers)
- [x] YOLOS
- [ ] Yoso
🛠️ next to a model means that a PR adding support for it is in progress. An unchecked model with no marker is not yet supported by the ONNX export, so we need to add support for it.
If you need help implementing an unsupported model, here is a guide from the Hugging Face Optimum documentation.
Hi! I'm trying to add support for VisualBERT, which works for VQA, VCR, NLVR and region-to-phrase alignment. Since the guide says that "When inheriting from a middle-end class, look for the one handling the same modality / category of models as the one you are trying to support.", I'm using `TextAndVisionOnnxConfig` because this is a multimodal model. I then initialized `NORMALIZED_CONFIG_CLASS = NormalizedTextAndVisionConfig`. Is this OK so far?
The problem comes when implementing the `inputs` property... What exactly does this property specify? In the guide, I see that the inputs are exactly the BERT tokenizer's output keys, and the values are the tensor dimensions for each key of the tokenizer's output. This will vary per task, so I'd have to define different axes for each task. Is this OK?
Thanks for the help!
EDIT: I see VisualBERT is implemented separately by task, but `VisualBertForPreTraining` is also provided for customized downstream tasks. Should I implement a different configuration for each task?
EDIT II: I see this issue was previously in the transformers repo. It seems the docs on how to add the ONNX configuration were written in a way that ignores the current optimum implementation; I have sorted out some of the difficulties that arise from this, assuming one ONNX config for the whole model. Can I help with an update to this guide?
Hi @mszsorondo , indeed the page https://huggingface.co/docs/transformers/serialization#export-to-onnx is a bit outdated. I'll do a PR to fix it. In your EDIT II, were you referring to this page?
I'd recommend to refer to: https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/contribute . If you see any issue / unclear steps in the guide, don't hesitate to open a PR!
As for VisualBERT, I guess you haven't picked the easiest one :) I think you can leave `VisualBertForPreTraining` aside, it's probably better to support the rest for inference.
Indeed, `NORMALIZED_CONFIG_CLASS = NormalizedTextAndVisionConfig` seems good.
> The problem comes when implementing the `inputs` property... What exactly does this property specify? In the guide, I see that the inputs are exactly the BERT tokenizer's output keys, and the values are the tensor dimensions for each key of the tokenizer's output. This will vary per task, so I'd have to define different axes for each task. Is this OK?

> EDIT: I see VisualBERT is implemented separately by task, but `VisualBertForPreTraining` is also provided for customized downstream tasks. Should I implement a different configuration for each task?
I don't think you need to implement configs for each task. Apparently, all tasks take as inputs `input_ids`, `token_type_ids`, `attention_mask`, `visual_embeds`, `visual_token_type_ids`, `visual_attention_mask`. `VisualBertForRegionToPhraseAlignment` seems to have an additional `region_to_phrase_position` input.
To implement the `inputs` property, you need to specify which inputs / outputs the model takes, and what the dynamic axes are: for example, for CLIP, that is https://github.com/huggingface/optimum/blob/9ac17034b6cb27da23499393598086f0b3b9223d/optimum/exporters/onnx/model_configs.py#L523-L528
You can very well do an if/else on the input/output keys (or axes) depending on the task, for example BART: https://github.com/huggingface/optimum/blob/9ac17034b6cb27da23499393598086f0b3b9223d/optimum/exporters/onnx/model_configs.py#L382-L389
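To make this concrete for VisualBERT, here is a minimal sketch of such an `inputs` property, assuming the input names listed above; the class name, the task string, and the exact import paths are illustrative, not merged code:

```python
from typing import Dict

from optimum.exporters.onnx.config import TextAndVisionOnnxConfig
from optimum.utils import NormalizedTextAndVisionConfig


class VisualBertOnnxConfig(TextAndVisionOnnxConfig):
    NORMALIZED_CONFIG_CLASS = NormalizedTextAndVisionConfig

    @property
    def inputs(self) -> Dict[str, Dict[int, str]]:
        # Keys are the model's forward argument names; values map each
        # dynamic axis index to a symbolic name used in the exported graph.
        common_inputs = {
            "input_ids": {0: "batch_size", 1: "sequence_length"},
            "token_type_ids": {0: "batch_size", 1: "sequence_length"},
            "attention_mask": {0: "batch_size", 1: "sequence_length"},
            "visual_embeds": {0: "batch_size", 1: "visual_sequence_length"},
            "visual_token_type_ids": {0: "batch_size", 1: "visual_sequence_length"},
            "visual_attention_mask": {0: "batch_size", 1: "visual_sequence_length"},
        }
        # Hypothetical task name: only the region-to-phrase-alignment head
        # takes the extra input.
        if self.task == "region-to-phrase-alignment":
            common_inputs["region_to_phrase_position"] = {0: "batch_size", 1: "sequence_length"}
        return common_inputs
```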
I think the piece where you will have the most work to do is extending the dummy input generators. They are meant to generate inputs for the model without using a preprocessor, and help to flexibly generate inputs of various shapes (for export validation, for example). You would need to extend an existing one, or create a new input generator to support the `visual_embeds`, `visual_token_type_ids`, `visual_attention_mask` and `region_to_phrase_position` inputs. Unless you see an existing input generator in here whose logic you could reuse, my guess is that you can create a `VisualBertDummyInputGenerator` for those four inputs.
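For reference, a rough sketch of what such a generator could look like, following the `DummyInputGenerator` interface in `optimum/utils/input_generators.py`; the constructor arguments, shape defaults, and the `visual_embedding_dim` lookup are assumptions:

```python
from optimum.utils.input_generators import DummyInputGenerator


class VisualBertDummyInputGenerator(DummyInputGenerator):
    # The four VisualBERT-specific inputs this generator would be responsible for.
    SUPPORTED_INPUT_NAMES = (
        "visual_embeds",
        "visual_token_type_ids",
        "visual_attention_mask",
        "region_to_phrase_position",
    )

    def __init__(self, task, normalized_config, batch_size=2, visual_seq_length=10, **kwargs):
        self.task = task
        self.batch_size = batch_size
        self.visual_seq_length = visual_seq_length
        # Assumes the normalized config forwards visual_embedding_dim from VisualBertConfig.
        self.visual_embedding_dim = normalized_config.visual_embedding_dim

    def generate(self, input_name, framework="pt"):
        shape = [self.batch_size, self.visual_seq_length]
        if input_name == "visual_embeds":
            return self.random_float_tensor(shape + [self.visual_embedding_dim], framework=framework)
        if input_name == "region_to_phrase_position":
            return self.random_int_tensor(shape, max_value=self.visual_seq_length, framework=framework)
        # visual_token_type_ids and visual_attention_mask are 0/1 integer tensors.
        return self.random_int_tensor(shape, max_value=2, framework=framework)
```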
Thanks for your help @fxmarty
> In your EDIT II, were you referring to this page?
I was actually referring to the second guide (https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/contribute); there are some minor issues with two function calls at the export step, plus one missing import. Submitted PR #662.
I made progress with the `inputs` property and did the export step, and indeed got an error regarding `visual_embeds` (surely this is also a problem for `visual_token_type_ids`, `visual_attention_mask` and `region_to_phrase_position`, as you suggest), so I'll go for the new input generator.
Hi @michaelbenayoun!
Is someone working on adding the Pegasus ONNX config?
If not, I would like to look into it (under your guidance, since I haven't written an ONNXConfig yet).
Hi @bhavnicksm , @mht-sharma just merged the Pegasus ONNX config yesterday! https://github.com/huggingface/optimum/pull/620
@fxmarty Still facing an issue
> Hi @bhavnicksm, @mht-sharma just merged the Pegasus ONNX config yesterday! https://github.com/huggingface/optimum/pull/620
I installed optimum directly from source here, using:

```
!pip install --quiet git+https://github.com/huggingface/optimum.git
```
I tried to run inference with Pegasus right now using `ORTModelForSeq2SeqLM`, with the following code:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("tuner007/pegasus_paraphrase")
model = AutoModelForSeq2SeqLM.from_pretrained("tuner007/pegasus_paraphrase")
ort_model = ORTModelForSeq2SeqLM.from_pretrained("tuner007/pegasus_paraphrase", from_transformers=True)
```
and it gives me the following error:
```
/usr/local/lib/python3.8/dist-packages/transformers/models/pegasus/modeling_pegasus.py:234: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/usr/local/lib/python3.8/dist-packages/transformers/models/pegasus/modeling_pegasus.py:241: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, tgt_len, src_len):
/usr/local/lib/python3.8/dist-packages/transformers/models/pegasus/modeling_pegasus.py:273: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-2e0907dfd025> in <module>
----> 1 ort_model = ORTModelForSeq2SeqLM.from_pretrained("tuner007/pegasus_paraphrase", from_transformers=True)

9 frames
/usr/local/lib/python3.8/dist-packages/optimum/onnxruntime/modeling_ort.py in from_pretrained(cls, model_id, from_transformers, force_download, use_auth_token, cache_dir, subfolder, config, local_files_only, provider, session_options, provider_options, **kwargs)
    555             `ORTModel`: The loaded ORTModel model.
    556         """
--> 557         return super().from_pretrained(
    558             model_id,
    559             from_transformers=from_transformers,

/usr/local/lib/python3.8/dist-packages/optimum/modeling_base.py in from_pretrained(cls, model_id, from_transformers, force_download, use_auth_token, cache_dir, subfolder, config, local_files_only, **kwargs)
    323
    324         from_pretrained_method = cls._from_transformers if from_transformers else cls._from_pretrained
--> 325         return from_pretrained_method(
    326             model_id=model_id,
    327             config=config,

/usr/local/lib/python3.8/dist-packages/optimum/onnxruntime/modeling_seq2seq.py in _from_transformers(cls, model_id, config, use_auth_token, revision, force_download, cache_dir, subfolder, local_files_only, use_cache, provider, session_options, provider_options, use_io_binding, task)
   1144             output_names.append(ONNX_DECODER_WITH_PAST_NAME)
   1145         models_and_onnx_configs = get_encoder_decoder_models_for_export(model, onnx_config)
-> 1146         export_models(
   1147             models_and_onnx_configs=models_and_onnx_configs,
   1148             opset=onnx_config.DEFAULT_ONNX_OPSET,

/usr/local/lib/python3.8/dist-packages/optimum/exporters/onnx/convert.py in export_models(models_and_onnx_configs, output_dir, opset, output_names, device, input_shapes)
    534
    535     outputs.append(
--> 536         export(
    537             model=submodel,
    538             config=sub_onnx_config,

/usr/local/lib/python3.8/dist-packages/optimum/exporters/onnx/convert.py in export(model, config, output, opset, device, input_shapes)
    605             f" got: {torch.__version__}"
    606         )
--> 607         return export_pytorch(model, config, opset, output, device=device, input_shapes=input_shapes)
    608
    609     elif is_tf_available() and issubclass(type(model), TFPreTrainedModel):

/usr/local/lib/python3.8/dist-packages/optimum/exporters/onnx/convert.py in export_pytorch(model, config, opset, output, device, input_shapes)
    368         # Export can work with named args but the dict containing named args has to be the last element of the args
    369         # tuple.
--> 370         onnx_export(
    371             model,
    372             (dummy_inputs,),

/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py in export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, opset_version, do_constant_folding, dynamic_axes, keep_initializers_as_inputs, custom_opsets, export_modules_as_functions)
    502     """
    503
--> 504     _export(
    505         model,
    506         args,

/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py in _export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, export_type, opset_version, do_constant_folding, dynamic_axes, keep_initializers_as_inputs, fixed_batch_size, custom_opsets, add_node_names, onnx_shape_inference, export_modules_as_functions)
   1527     _validate_dynamic_axes(dynamic_axes, model, input_names, output_names)
   1528
-> 1529     graph, params_dict, torch_out = _model_to_graph(
   1530         model,
   1531         args,

/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py in _model_to_graph(model, args, verbose, input_names, output_names, operator_export_type, do_constant_folding, _disable_torch_constant_prop, fixed_batch_size, training, dynamic_axes)
   1113
   1114     try:
-> 1115         graph = _optimize_graph(
   1116             graph,
   1117             operator_export_type,

/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py in _optimize_graph(graph, operator_export_type, _disable_torch_constant_prop, fixed_batch_size, params_dict, dynamic_axes, input_names, module)
    662
    663     graph = _C._jit_pass_onnx(graph, operator_export_type)
--> 664     _C._jit_pass_onnx_lint(graph)
    665     _C._jit_pass_lint(graph)
    666

RuntimeError: Unable to cast from non-held to held instance (T& to Holder<T>) (#define PYBIND11_DETAILED_ERROR_MESSAGES or compile in debug mode for type information)
```
@bhavnicksm Can you open an issue in Optimum with your environment details? We can track it there!
@fxmarty Please re-open this. 🤗
Thanks!
I can look into ImageGPT, if it has not yet been claimed.
Feel free! Don't hesitate to ask any question if needed.
Can I take TAPAS if it's not yet been claimed?
Hello, Can I work on RegNet?
Yes to both, feel free! I updated the list saying that you are working on it.
Hi @michaelbenayoun, I went through the codebase recently and I think the list above may not be fully up to date. I found that a few models, such as:
- PoolFormer
- Hubert
- MPNet
- Wav2Vec2

already have their own configurations in this file.
Thank you @hazrulakmal, I updated the list!
@fxmarty Re-open this, please 🤗
@fxmarty working on FLAVA
@rcshubhadeep I moved your issue to https://github.com/huggingface/optimum/issues/968
Hi, does optimum support converting Llama (alpaca-lora) to ONNX? It would be great if I could get some insights on this.
> Hi, does optimum support converting Llama (alpaca-lora) to ONNX? It would be great if I could get some insights on this.
Yes, this is supported and was introduced in https://github.com/huggingface/optimum/pull/975. You'll need to have Optimum v1.8 to do it.
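For reference, a minimal sketch of such an export; the checkpoint path is a placeholder, and for alpaca-lora you would first merge the LoRA adapter weights into the base model:

```python
from optimum.onnxruntime import ORTModelForCausalLM

# Converts the PyTorch checkpoint to ONNX on the fly (Optimum >= 1.8).
ort_model = ORTModelForCausalLM.from_pretrained("path-to-your-merged-llama-checkpoint", from_transformers=True)
ort_model.save_pretrained("llama_onnx/")
```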
The `TasksManager` allows mapping model classes to export configurations, here ONNX ones.
Registering your ONNX config will make it possible for you to use it with the CLI and everything else.
Are you doing a PR that will be merged in `optimum`?
If so, go to the `optimum/exporters/tasks.py` file and add an entry in the `_SUPPORTED_MODEL_TYPE` class attribute:

```python
_SUPPORTED_MODEL_TYPE = {
    ....,
    "custom": supported_tasks_mapping("text-classification", ...., onnx="CustomOnnxConfig"),
}
```
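Once the entry is there, the standard export entry points will pick your config up, for example (a sketch; the checkpoint name is a placeholder):

```python
from optimum.exporters.onnx import main_export

# Exports the model using the ONNX config registered for its model type.
main_export("my-org/my-custom-checkpoint", output="custom_onnx/", task="text-classification")
```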
But if you are not doing a PR that will be merged in `optimum`, and want to dynamically register your class in your own library, you can create a registering method:

```python
register_for_onnx = TasksManager.create_register("onnx")

@register_for_onnx("model_type_here", "text-classification", ...)
class CustomOnnxConfig(TextEncoderOnnxConfig):
    ...
```
If you do it programmatically, I do not think you need to register anything.
What's your model? You put `bert` here, but `bert` is already registered for ONNX, so nothing happens.
Alright, could you open a PR for your issue please? We will try to help you there.
Thank you for spending time on me! I think a PR will be a difficult thing to do, since I am not that proficient, and I don't think many people will want to use my architecture anyway.
Maybe you can advise me on how to do it in code just for my library?
```python
base_model = CustomBertForTokenClassification.from_pretrained("my-checkpoint")
```

`base_model.config` returns `BertConfig`, which I think I need to overwrite with the custom config I created in the previous step...
Sorry, I meant a separate issue...
Thank you a lot, I'll delete my comments here since they are unrelated to the discussion. I asked on the discussion forum.
I can work on CvT, if it's open.
Hi @rishabbala, sounds good, let us know if you need any help! A good reference is https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/contribute
According to the above list, export of BLOOM models to ONNX is already supported, right?
Is export to ONNX already supposed to work for base models that have been finetuned with PEFT / LoRA?
Using the `bigscience/bloom-560m` base model and finetuning with PEFT / LoRA, I was able to perform inference after exporting to ONNX, but the model predictions are degraded 🤔 Details: https://github.com/huggingface/peft/issues/670