Community contribution - `optimum.exporters.onnx` support for new models!

Open · michaelbenayoun opened this issue 2 years ago · 41 comments

Following what @ChainYo did in Transformers in the "ONNXConfig: Add a configuration for all available models" issue, the idea is to add support for exporting new models in optimum.exporters.onnx.

This issue is about the working group specially created for this task. If you are interested in helping out, reply here, take a look at this organization, or add ChainYo#3610 on discord.

We want to contribute to Hugging Face's ONNX export implementation for all available models on the Hugging Face Hub. Many architectures can already be converted from PyTorch to ONNX, but we need more! We need them all!

Feel free to join us in this adventure! Join the org by clicking here

Here is a non-exhaustive list of all available models:

  • [x] Albert
  • [x] BART
  • [x] BeiT
  • [x] BERT
  • [ ] BigBird (Critical issue: https://github.com/huggingface/optimum/issues/754#issuecomment-1429830467)
  • [ ] BigBirdPegasus (Critical issue: https://github.com/huggingface/optimum/issues/754#issuecomment-1429830467)
  • [x] Blenderbot
  • [x] BlenderbotSmall
  • [ ] BLIP-2
  • [x] BLOOM
  • [x] CamemBERT
  • [ ] CANINE
  • [x] CLIP
  • [x] CodeGen
  • [x] ConvNext
  • [x] ConvBert
  • [ ] CTRL
  • [x] CvT
  • [x] Data2VecText
  • [x] Data2VecVision
  • [x] Deberta
  • [x] DebertaV2
  • [x] DeiT
  • [ ] DecisionTransformer
  • [x] DETR
  • [x] Distilbert
  • [ ] DPR
  • [x] DPT
  • [x] ELECTRA
  • [ ] FNet
  • [ ] FSMT
  • [x] Flaubert
  • [ ] FLAVA
  • [ ] Funnel Transformer
  • [x] GLPN
  • [x] GPT2
  • [x] GPTJ
  • [x] GPT-Neo
  • [x] GPT-NeoX
  • [x] Hubert
  • [x] I-Bert
  • [x] ImageGPT 🛠️ @adit299
  • [x] LayoutLM
  • [ ] LayoutLMv2 (but 🛠️ in Transformers)
  • [x] LayoutLMv3
  • [ ] LayoutXLM
  • [ ] LED
  • [x] LeViT
  • [ ] ๐Ÿ› ๏ธ Longformer (Critical issue: https://github.com/huggingface/optimum/issues/776#issuecomment-1429680121)
  • [x] LongT5
  • [ ] Luke (but 🛠️ in Transformers)
  • [ ] Lxmert
  • [x] M2M100
  • [ ] MaskFormer
  • [x] mBart
  • [ ] MCTCT
  • [x] MPNet
  • [x] MT5
  • [x] MarianMT
  • [ ] MegatronBert
  • [x] MobileBert
  • [x] MobileViT
  • [ ] Nyströmformer
  • [x] OpenAIGPT-2
  • [x] OPT (but 🛠️ in Transformers)
  • [x] OWLViT
  • [x] Pix2Struct
  • [x] PLBart
  • [x] Pegasus
  • [x] Perceiver
  • [x] PoolFormer
  • [ ] ProphetNet
  • [ ] QDQBERT
  • [ ] RAG
  • [ ] REALM
  • [ ] Reformer (but 🛠️ in Transformers)
  • [x] RemBert
  • [x] ResNet
  • [x] RegNet 🛠️ @asrimanth
  • [ ] RetriBert
  • [x] RoFormer
  • [x] RoBERTa
  • [x] SEW
  • [x] SEW-D
  • [x] SegFormer
  • [x] Speech2Text
  • [ ] Speech2Text2
  • [x] Splinter
  • [x] SqueezeBERT
  • [x] Swin Transformer
  • [x] T5
  • [ ] TAPAS 🛠️ @someshfengde
  • [ ] TAPEX
  • [ ] Transformer XL
  • [x] TrOCR
  • [ ] UniSpeech
  • [ ] UniSpeech-SAT
  • [ ] VAN
  • [x] ViT
  • [ ] Vilt
  • [ ] VisualBERT
  • [x] Wav2Vec2
  • [x] WavLM
  • [x] Whisper
  • [ ] XGLM
  • [x] XLM
  • [ ] XLMProphetNet
  • [x] XLM-RoBERTa
  • [x] XLM-RoBERTa-XL
  • [ ] XLNet (but 🛠️ in Transformers)
  • [x] YOLOS
  • [ ] Yoso

๐Ÿ› ๏ธ next to a model suggests that the PR is in progress. If there is nothing next to a model, it means that ONNX does not yet support the model, and thus we need to add support for it.

If you need help implementing an unsupported model, here is a guide from the Hugging Face Optimum documentation.

michaelbenayoun, Dec 07 '22

Hi! I'm trying to add support for VisualBERT, which works for VQA, VCR, NLVR and RPG. Since the guide says "When inheriting from a middle-end class, look for the one handling the same modality / category of models as the one you are trying to support.", I'm using TextAndVisionOnnxConfig because this is a multimodal model. Then I set NORMALIZED_CONFIG_CLASS = NormalizedTextAndVisionConfig. Is this OK so far?

The problem comes when implementing the inputs property... What exactly does this property specify? In the guide, its keys are exactly the keys of BERT's tokenizer output, and its values are the tensor dimensions for each of those keys. These vary by task, so I'd have to define different axes for each task. Is this OK?

Thanks for the help!

EDIT: I see VisualBERT is implemented separately by task, but VisualBertForPreTraining is also provided for customized downstream tasks. Should I implement a different configuration for each task?

EDIT II: I see this issue was previously in the Transformers repo. The docs on how to add an ONNX configuration seem to be written in a way that ignores the current Optimum implementation. I have worked around some of the resulting difficulties, assuming one ONNX config for the whole model. Can I help update this guide?

mszsorondo, Jan 01 '23

Hi @mszsorondo, indeed the page https://huggingface.co/docs/transformers/serialization#export-to-onnx is a bit outdated. I'll open a PR to fix it. In your EDIT II, were you referring to this page?

I'd recommend referring to: https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/contribute . If you see any issues or unclear steps in the guide, don't hesitate to open a PR!

As for VisualBERT, I guess you haven't picked the easiest one :) I think you can leave VisualBertForPreTraining aside; it's probably better to support the other heads for inference.

Indeed NORMALIZED_CONFIG_CLASS = NormalizedTextAndVisionConfig seems good.

I don't think you need to implement a config for each task. Apparently all tasks take input_ids, token_type_ids, attention_mask, visual_embeds, visual_token_type_ids and visual_attention_mask as inputs. VisualBertForRegionToPhraseAlignment seems to take an additional region_to_phrase_position input.

To implement the inputs property, you need to specify which inputs the model takes (and, in outputs, which outputs it returns), and what their dynamic axes are. For example, for CLIP: https://github.com/huggingface/optimum/blob/9ac17034b6cb27da23499393598086f0b3b9223d/optimum/exporters/onnx/model_configs.py#L523-L528
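For VisualBERT, a minimal sketch of such an inputs property might look like this. In Optimum the property returns a mapping from input name to {axis_index: axis_name}; this sketch mirrors that structure for the six shared inputs listed above. The class name and the axis names (batch_size, sequence_length, visual_seq_length) are hypothetical, and the class is a standalone stand-in rather than a subclass of Optimum's OnnxConfig.

```python
from collections import OrderedDict
from typing import Dict


class VisualBertOnnxConfigSketch:
    """Hypothetical stand-in for an Optimum ONNX config for VisualBERT."""

    @property
    def inputs(self) -> Dict[str, Dict[int, str]]:
        # Each entry maps an input name to its dynamic axes:
        # axis index -> symbolic name used as a dynamic dimension at export time.
        text_axes = {0: "batch_size", 1: "sequence_length"}
        visual_axes = {0: "batch_size", 1: "visual_seq_length"}
        return OrderedDict(
            [
                ("input_ids", dict(text_axes)),
                ("token_type_ids", dict(text_axes)),
                ("attention_mask", dict(text_axes)),
                # visual_embeds is (batch, visual_seq_length, visual_embedding_dim);
                # the embedding dim is fixed by the model config, so it is not dynamic.
                ("visual_embeds", dict(visual_axes)),
                ("visual_token_type_ids", dict(visual_axes)),
                ("visual_attention_mask", dict(visual_axes)),
            ]
        )


if __name__ == "__main__":
    for name, axes in VisualBertOnnxConfigSketch().inputs.items():
        print(name, axes)
```

Each entry gets its own dict copy so that later task-specific tweaks to one input's axes cannot silently change another's.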

You can also use an if/else on the input/output keys (or axes) depending on the task, as BART does: https://github.com/huggingface/optimum/blob/9ac17034b6cb27da23499393598086f0b3b9223d/optimum/exporters/onnx/model_configs.py#L382-L389
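Following that task-branching idea, a VisualBERT config could add region_to_phrase_position only when exporting for region-to-phrase alignment. This is a self-contained sketch: the task string "region-to-phrase-alignment" is a hypothetical name (not a task Optimum is known to define), and the (batch_size, total_sequence_length) axes for the extra input are an assumption.

```python
from collections import OrderedDict
from typing import Dict

# Hypothetical task name, used only for this illustration.
REGION_TO_PHRASE_TASK = "region-to-phrase-alignment"


def add_task_specific_inputs(
    task: str, base_inputs: Dict[str, Dict[int, str]]
) -> Dict[str, Dict[int, str]]:
    """Return a copy of base_inputs extended with inputs only some tasks need."""
    inputs = OrderedDict(base_inputs)  # copy so the shared base is never mutated
    if task == REGION_TO_PHRASE_TASK:
        # Assumed to span text tokens plus visual tokens.
        inputs["region_to_phrase_position"] = {0: "batch_size", 1: "total_sequence_length"}
    return inputs


if __name__ == "__main__":
    base = OrderedDict([("input_ids", {0: "batch_size", 1: "sequence_length"})])
    print(list(add_task_specific_inputs(REGION_TO_PHRASE_TASK, base)))
    print(list(add_task_specific_inputs("question-answering", base)))
```

Keeping the shared inputs in one place and layering the task-specific ones on top avoids duplicating a full config per task.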

I think the part where you will have the most work is extending the dummy input generators. They generate inputs for the model without using a preprocessor, and make it easy to produce inputs of various shapes (for example, for export validation). You would need to extend an existing generator, or create a new one, to support the visual_embeds, visual_token_type_ids, visual_attention_mask and region_to_phrase_position inputs. Unless one of the existing input generators has logic you could reuse, my guess is that you can create a VisualBertDummyInputGenerator for those four inputs.
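A rough sketch of such a generator, assuming NumPy output: it loosely mirrors the SUPPORTED_INPUT_NAMES / supports_input / generate layout of Optimum's dummy input generators but does not subclass them, and returns NumPy arrays rather than framework tensors. The default sizes (including visual_embedding_dim=512), the all-ones visual token type ids, and the value range for region_to_phrase_position are assumptions chosen only to produce well-shaped dummy data.

```python
from typing import Tuple

import numpy as np


class VisualBertDummyInputGenerator:
    """Hypothetical generator for VisualBERT's visual inputs (NumPy arrays)."""

    SUPPORTED_INPUT_NAMES: Tuple[str, ...] = (
        "visual_embeds",
        "visual_token_type_ids",
        "visual_attention_mask",
        "region_to_phrase_position",
    )

    def __init__(self, batch_size=2, sequence_length=16, visual_seq_length=8,
                 visual_embedding_dim=512, seed=0):
        self.batch_size = batch_size
        self.sequence_length = sequence_length
        self.visual_seq_length = visual_seq_length
        self.visual_embedding_dim = visual_embedding_dim
        self.rng = np.random.default_rng(seed)

    def supports_input(self, input_name: str) -> bool:
        return input_name in self.SUPPORTED_INPUT_NAMES

    def generate(self, input_name: str) -> np.ndarray:
        b, v = self.batch_size, self.visual_seq_length
        if input_name == "visual_embeds":
            # Random region features: (batch, visual_seq_length, visual_embedding_dim).
            shape = (b, v, self.visual_embedding_dim)
            return self.rng.standard_normal(shape).astype(np.float32)
        if input_name == "visual_token_type_ids":
            # Segment id 1 for every visual token (assumed, for dummy data only).
            return np.ones((b, v), dtype=np.int64)
        if input_name == "visual_attention_mask":
            # Attend to every visual token.
            return np.ones((b, v), dtype=np.int64)
        if input_name == "region_to_phrase_position":
            # Assumed shape (batch, text length + visual length), holding positions
            # in the combined sequence; values in [0, total) are illustrative.
            total = self.sequence_length + v
            return self.rng.integers(0, total, size=(b, total), dtype=np.int64)
        raise ValueError(f"Unsupported input name: {input_name}")
```

Changing batch_size, sequence_length or visual_seq_length at construction time is what lets the same generator produce inputs of different shapes for export validation.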

fxmarty, Jan 02 '23

Thanks for your help @fxmarty

I was actually referring to the second guide (https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/contribute): there were minor issues with two function calls in the export step, plus a missing import. I submitted PR #662.

I implemented the inputs property and ran the export step, and indeed got an error about visual_embeds (presumably the same problem applies to visual_token_type_ids, visual_attention_mask and region_to_phrase_position, as you suggest), so I'll write a new input generator.

mszsorondo, Jan 02 '23

Hi @michaelbenayoun!

Is someone working on adding the Pegasus ONNX config?

If not, I would like to look into it 😄 (under your guidance, since I haven't written an ONNX config yet)

bhavnicksm, Jan 03 '23

Hi @bhavnicksm , @mht-sharma just merged the Pegasus ONNX config yesterday! https://github.com/huggingface/optimum/pull/620

fxmarty, Jan 03 '23

@fxmarty I'm still facing an issue.

I installed Optimum directly from source using:

!pip install --quiet git+https://github.com/huggingface/optimum.git 

I then tried running inference with Pegasus through ORTModelForSeq2SeqLM, using the following code:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("tuner007/pegasus_paraphrase")
model = AutoModelForSeq2SeqLM.from_pretrained("tuner007/pegasus_paraphrase")

ort_model = ORTModelForSeq2SeqLM.from_pretrained("tuner007/pegasus_paraphrase", from_transformers=True)

and it gives me the following error:

/usr/local/lib/python3.8/dist-packages/transformers/models/pegasus/modeling_pegasus.py:234: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/usr/local/lib/python3.8/dist-packages/transformers/models/pegasus/modeling_pegasus.py:241: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, tgt_len, src_len):
/usr/local/lib/python3.8/dist-packages/transformers/models/pegasus/modeling_pegasus.py:273: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-2e0907dfd025> in <module>
----> 1 ort_model = ORTModelForSeq2SeqLM.from_pretrained("tuner007/pegasus_paraphrase", from_transformers=True)

/usr/local/lib/python3.8/dist-packages/optimum/onnxruntime/modeling_ort.py in from_pretrained(cls, model_id, from_transformers, force_download, use_auth_token, cache_dir, subfolder, config, local_files_only, provider, session_options, provider_options, **kwargs)
    555             `ORTModel`: The loaded ORTModel model.
    556         """
--> 557         return super().from_pretrained(
    558             model_id,
    559             from_transformers=from_transformers,

/usr/local/lib/python3.8/dist-packages/optimum/modeling_base.py in from_pretrained(cls, model_id, from_transformers, force_download, use_auth_token, cache_dir, subfolder, config, local_files_only, **kwargs)
    323 
    324         from_pretrained_method = cls._from_transformers if from_transformers else cls._from_pretrained
--> 325         return from_pretrained_method(
    326             model_id=model_id,
    327             config=config,

/usr/local/lib/python3.8/dist-packages/optimum/onnxruntime/modeling_seq2seq.py in _from_transformers(cls, model_id, config, use_auth_token, revision, force_download, cache_dir, subfolder, local_files_only, use_cache, provider, session_options, provider_options, use_io_binding, task)
   1144             output_names.append(ONNX_DECODER_WITH_PAST_NAME)
   1145         models_and_onnx_configs = get_encoder_decoder_models_for_export(model, onnx_config)
-> 1146         export_models(
   1147             models_and_onnx_configs=models_and_onnx_configs,
   1148             opset=onnx_config.DEFAULT_ONNX_OPSET,

/usr/local/lib/python3.8/dist-packages/optimum/exporters/onnx/convert.py in export_models(models_and_onnx_configs, output_dir, opset, output_names, device, input_shapes)
    534 
    535         outputs.append(
--> 536             export(
    537                 model=submodel,
    538                 config=sub_onnx_config,

/usr/local/lib/python3.8/dist-packages/optimum/exporters/onnx/convert.py in export(model, config, output, opset, device, input_shapes)
    605                 f" got: {torch.__version__}"
    606             )
--> 607         return export_pytorch(model, config, opset, output, device=device, input_shapes=input_shapes)
    608 
    609     elif is_tf_available() and issubclass(type(model), TFPreTrainedModel):

/usr/local/lib/python3.8/dist-packages/optimum/exporters/onnx/convert.py in export_pytorch(model, config, opset, output, device, input_shapes)
    368             # Export can work with named args but the dict containing named args has to be the last element of the args
    369             # tuple.
--> 370             onnx_export(
    371                 model,
    372                 (dummy_inputs,),

/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py in export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, opset_version, do_constant_folding, dynamic_axes, keep_initializers_as_inputs, custom_opsets, export_modules_as_functions)
    502     """
    503 
--> 504     _export(
    505         model,
    506         args,

/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py in _export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, export_type, opset_version, do_constant_folding, dynamic_axes, keep_initializers_as_inputs, fixed_batch_size, custom_opsets, add_node_names, onnx_shape_inference, export_modules_as_functions)
   1527             _validate_dynamic_axes(dynamic_axes, model, input_names, output_names)
   1528 
-> 1529             graph, params_dict, torch_out = _model_to_graph(
   1530                 model,
   1531                 args,

/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py in _model_to_graph(model, args, verbose, input_names, output_names, operator_export_type, do_constant_folding, _disable_torch_constant_prop, fixed_batch_size, training, dynamic_axes)
   1113 
   1114     try:
-> 1115         graph = _optimize_graph(
   1116             graph,
   1117             operator_export_type,

/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py in _optimize_graph(graph, operator_export_type, _disable_torch_constant_prop, fixed_batch_size, params_dict, dynamic_axes, input_names, module)
    662 
    663     graph = _C._jit_pass_onnx(graph, operator_export_type)
--> 664     _C._jit_pass_onnx_lint(graph)
    665     _C._jit_pass_lint(graph)
    666 

RuntimeError: Unable to cast from non-held to held instance (T& to Holder<T>) (#define PYBIND11_DETAILED_ERROR_MESSAGES or compile in debug mode for type information)

bhavnicksm, Jan 03 '23

@bhavnicksm Can you open an issue in Optimum with your environment details? We can track it there!

fxmarty, Jan 03 '23

@fxmarty Please re-open this. 🤗

chainyo, Feb 07 '23

Thanks!

fxmarty, Feb 07 '23

I can look into ImageGPT, if it has not yet been claimed.

adit299, Feb 14 '23

Feel free! Don't hesitate to ask if you have any questions.

fxmarty, Feb 14 '23

Can I take TAPAS if it's not yet been claimed?

someshfengde, Feb 18 '23

Hello, can I work on RegNet?

asrimanth, Feb 18 '23

Yes to both, feel free! I updated the list saying that you are working on it.

michaelbenayoun, Feb 20 '23

Hi @michaelbenayoun, I looked through the codebase recently and I think the list above may be out of date. I found that a few models, such as

  1. PoolFormer
  2. Hubert
  3. MPnet
  4. wav2vec

already have their own configurations in this file.

hazrulakmal, Feb 21 '23

Thank you @hazrulakmal, I updated the list!

fxmarty, Feb 22 '23

@fxmarty Re-open this, please 🤗

chainyo, Mar 04 '23

@fxmarty working on FLAVA

soma2000-lang, Mar 04 '23

@rcshubhadeep I moved your issue to https://github.com/huggingface/optimum/issues/968

fxmarty, Apr 11 '23

Hi, does Optimum support converting Llama (alpaca-lora) to ONNX? It would be great to get some insights on this.

gjain7, Apr 28 '23

Yes, this is supported and was introduced in https://github.com/huggingface/optimum/pull/975. You'll need to have Optimum v1.8 to do it.

regisss, Apr 28 '23

The TasksManager maps model types to export configurations (here, ONNX ones). Registering your ONNX config makes it usable with the CLI and everything else.

Are you opening a PR that will be merged into Optimum? If so, go to the optimum/exporters/tasks.py file and add an entry to the _SUPPORTED_MODEL_TYPE class attribute:

_SUPPORTED_MODEL_TYPE = {
    ....,
    "custom": supported_tasks_mapping("text-classification", ...., onnx="CustomOnnxConfig")
}
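To make the shape of that registry concrete, here is a toy helper that builds an {exporter: {task: config_class_name}} table from the same kind of arguments (positional task names, one keyword per exporter). It is an illustration only, not the helper Optimum actually uses; toy_tasks_mapping and the "custom" entry are hypothetical.

```python
from typing import Dict


def toy_tasks_mapping(*tasks: str, **exporters: str) -> Dict[str, Dict[str, str]]:
    """Map each exporter (e.g. "onnx") to {task: config class name}."""
    mapping: Dict[str, Dict[str, str]] = {}
    for exporter, config_name in exporters.items():
        # Every listed task is handled by the same config class for that exporter.
        mapping[exporter] = {task: config_name for task in tasks}
    return mapping


# Hypothetical registry entry for a model type called "custom".
SUPPORTED_MODEL_TYPE = {
    "custom": toy_tasks_mapping(
        "feature-extraction", "text-classification", onnx="CustomOnnxConfig"
    ),
}

if __name__ == "__main__":
    # Which ONNX config handles text classification for "custom"?
    print(SUPPORTED_MODEL_TYPE["custom"]["onnx"]["text-classification"])  # CustomOnnxConfig
```

A lookup then goes model type, then exporter, then task, which is why a single entry is enough to expose a new model to every listed task.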

If instead you are not opening a PR to Optimum and want to register your class dynamically from your own library, you can create a registration decorator:

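from optimum.exporters.onnx.config import TextEncoderOnnxConfig
from optimum.exporters.tasks import TasksManager

register_for_onnx = TasksManager.create_register("onnx")

@register_for_onnx("model_type_here", "text-classification", ...)
class CustomOnnxConfig(TextEncoderOnnxConfig):
    ...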
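The create_register call above follows a common decorator-based registration pattern. The toy below reproduces only the general idea in plain Python so it runs without Optimum: make_register, REGISTRY and the overwrite_existing behaviour shown here are hypothetical, and the real method writes into whatever internal tables TasksManager uses rather than a module-level dict.

```python
from typing import Callable, Dict, Tuple, Type

# (model_type, exporter, task) -> registered config class
REGISTRY: Dict[Tuple[str, str, str], Type] = {}


def make_register(exporter: str, overwrite_existing: bool = False) -> Callable:
    """Return a decorator factory that registers config classes for `exporter`."""

    def register(model_type: str, *tasks: str) -> Callable[[Type], Type]:
        def wrapper(config_cls: Type) -> Type:
            for task in tasks:
                key = (model_type, exporter, task)
                # Keep an existing registration unless overwriting is requested.
                if key in REGISTRY and not overwrite_existing:
                    continue
                REGISTRY[key] = config_cls
            return config_cls  # the decorated class is returned unchanged

        return wrapper

    return register


register_for_onnx = make_register("onnx")


@register_for_onnx("my-model", "feature-extraction", "text-classification")
class CustomOnnxConfig:
    """Placeholder standing in for a TextEncoderOnnxConfig subclass."""
```

In this toy, registering a key that already exists is a no-op unless overwriting is enabled, which is one way the "already registered, so nothing happens" situation discussed below can arise.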

michaelbenayoun, May 02 '23

If you do it programmatically, I don't think you need to register anything. What's your model? You used bert here, but bert is already registered for ONNX, so nothing happens.

michaelbenayoun, May 02 '23

Alright, could you open a PR for your issue please? We will try to help you there.

michaelbenayoun, May 03 '23

Thank you for taking the time! I think a PR would be difficult for me, since I'm not that proficient, and I doubt many people will want to use my architecture anyway.

Could you advise how to do it in code, just for my library?

base_model = CustomBertForTokenClassification.from_pretrained("my-checkpoint")

base_model.config returns a BertConfig, which I think I need to replace with the custom config I created in the previous step...

maiiabocharova, May 03 '23

Sorry, I meant a separate issue...

michaelbenayoun, May 04 '23

Thanks a lot! I'll delete my comments here since they are unrelated to this discussion. I asked on the discussion forum.

maiiabocharova, May 04 '23

I can work on CvT, if it's open.

rishabbala, Jun 21 '23

Hi @rishabbala, sounds good, let us know if you need any help! A good reference is https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/contribute

fxmarty, Jun 23 '23

According to the above list, export of BLOOM models to ONNX is already supported, right?

Is export to ONNX already supposed to work for base models that have been finetuned with PEFT / LoRA?

Using the bigscience/bloom-560m base model and finetuning with PEFT / LoRA, I was able to perform inference after exporting to ONNX, but the model predictions are degraded 🤔 Details: https://github.com/huggingface/peft/issues/670

ingo-m, Jul 06 '23