Community contribution - `BetterTransformer` integration for more models!

The BetterTransformer API provides faster inference on CPU & GPU through a simple interface!

Models can see substantial speedups from a one-line conversion, provided the latest version of PyTorch is installed. A complete guide on how to convert a new model is available in the BetterTransformer documentation!
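For reference, the one-line conversion might look like the sketch below. It assumes an `optimum` release that still ships the `optimum.bettertransformer` module, and it uses `bert-base-uncased` purely as an illustrative checkpoint (it is not named in this issue). The helper name `to_bettertransformer` is made up for the example.

```python
def to_bettertransformer(checkpoint: str = "bert-base-uncased"):
    """Load a checkpoint and return its BetterTransformer version.

    Assumptions: `transformers` and `optimum` are installed, and the
    checkpoint's architecture is one that BetterTransformer supports.
    The default checkpoint is only an illustration.
    """
    # Imported lazily so that defining the helper does not require the libraries.
    from transformers import AutoModel
    from optimum.bettertransformer import BetterTransformer

    model = AutoModel.from_pretrained(checkpoint)
    # The actual "one-liner": swap supported layers for their fast-path equivalents.
    return BetterTransformer.transform(model, keep_original_model=False)
```

Calling `to_bettertransformer()` downloads the checkpoint on first use, so it needs network access or a local cache.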

Here is a list of models that could potentially be supported. Pick one of the architectures below and let's discuss the conversion!

Text models 🖊️ :

[x] FSMT - FSMTEncoderLayer / @Sumanth077 https://github.com/huggingface/optimum/pull/494
[ ] MobileBERT - MobileBertLayer / @raghavanone https://github.com/huggingface/optimum/pull/506
[x] MBart - MBartEncoderLayer + M2M100EncoderLayer / @ravenouse https://github.com/huggingface/optimum/pull/516
[x] ProphetNet - ProphetNetEncoderLayer
[x] RemBert - RemBertLayer
[x] RocBert - RocBertLayer
[x] RoFormer - RoFormerLayer
[x] Tapas - TapasLayer / https://github.com/huggingface/optimum/pull/520

Vision models 📷 :

[x] Blip - BlipLayer
[ ] Detr - DetrLayer
[ ] Flava - FlavaLayer
[ ] GLPN - GLPNLayer | Cannot be supported
[x] ViLT - ViLTLayer / https://github.com/huggingface/optimum/pull/508

Audio models 🔉 :

[ ] Speech2Text - Speech2TextLayer
[ ] NEW: Audio Spectrogram Transformer - ASTLayer

Also let us know if you think we missed some architectures that could be supported. Note that for the encoder-decoder models listed above, we expect to convert the encoder only.

Support for decoder-based models coming soon!

cc @michaelbenayoun @fxmarty

https://github.com/huggingface/transformers/issues/20372

Nov 18 '22 10:11 younesbelkada

Hi @younesbelkada, I would love to contribute to this issue and can work on FSMT.

Nov 19 '22 20:11 Sumanth077

Hey @Sumanth077, thanks a bunch for your interest in this issue! 🚀 I would love to assist you with the integration, so let's try to make this happen! I have updated the table above and am attaching the contribution tutorial here ;) Would you mind forking this repo and opening a draft pull request so that I can start guiding you there? Also, please do not hesitate to ping us here about any issue you face during the integration 💪

Nov 19 '22 20:11 younesbelkada

Thank you for the reply @younesbelkada. I just opened a draft pull request, but I haven't made any significant changes yet.

In Step 1 (identifying the source layer to change), I couldn't find an entry in BETTER_TRANFORMER_LAYERS_MAPPING_DICT that maps the FSMT module to a BetterTransformer equivalent.

Should I start creating that? I would love your assistance.
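To illustrate the lookup being described, here is a minimal sketch. The dictionary below is a hypothetical stand-in written for this example, not optimum's actual mapping, and `find_converter` is an invented helper. The entries are plain strings so the snippet runs without any library installed.

```python
# Hypothetical stand-in for a "source layer -> converted layer" mapping.
# The real mapping lives in optimum and may be organised differently.
EXAMPLE_LAYER_MAPPING = {
    "BertLayer": "BertLayerBetterTransformer",
}


def find_converter(layer_class_name, mapping=EXAMPLE_LAYER_MAPPING):
    """Return the converted-layer name for a source layer, or None if unmapped."""
    return mapping.get(layer_class_name)


# A layer with an entry resolves to its counterpart ...
print(find_converter("BertLayer"))
# ... while an architecture that has not been added yet yields None,
# which is the situation described for FSMT above.
print(find_converter("FSMTEncoderLayer"))
```

Adding support for a new architecture would then amount to writing the converted layer and registering it under the source layer's name.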

Nov 20 '22 11:11 Sumanth077

Hi @Sumanth077 , I have just replied on your PR, let's continue the discussion there ;)

Nov 21 '22 18:11 younesbelkada

Hi, I would like to contribute as well. This would be my first contribution to open source, so I might need some hand holding 🤚

I followed the documentation and the progress made on FSMT in huggingface/optimum#494 to better understand the task.

I looked into ViLT via

    from transformers import AutoModel

    model = AutoModel.from_pretrained("dandelin/vilt-b32-mlm")

and as I understand the documentation, this should be the source layer to make changes to, including its attributes:

    (0): ViltLayer(
      (attention): ViltAttention(
        (attention): ViltSelfAttention(
          (query): Linear(in_features=768, out_features=768, bias=True)
          (key): Linear(in_features=768, out_features=768, bias=True)
          (value): Linear(in_features=768, out_features=768, bias=True)
          (dropout): Dropout(p=0.0, inplace=False)
        )
        (output): ViltSelfOutput(
          (dense): Linear(in_features=768, out_features=768, bias=True)
          (dropout): Dropout(p=0.0, inplace=False)
        )
      )
      (intermediate): ViltIntermediate(
        (dense): Linear(in_features=768, out_features=3072, bias=True)
        (intermediate_act_fn): GELUActivation()
      )
      (output): ViltOutput(
        (dense): Linear(in_features=3072, out_features=768, bias=True)
        (dropout): Dropout(p=0.0, inplace=False)
      )
      (layernorm_before): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (layernorm_after): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    )
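One way to spot candidate source layers without reading the whole printout is to count submodule class names. This is only a sketch: `count_layer_classes` is a helper invented for this example, and the `"Layer"` suffix is a naming-convention assumption rather than a rule. It relies only on the `named_modules()` method that every `torch.nn.Module` provides.

```python
from collections import Counter


def count_layer_classes(model, suffix="Layer"):
    """Count submodules whose class name ends with `suffix`.

    `model` is expected to expose `named_modules()`, as any
    `torch.nn.Module` does.
    """
    counts = Counter()
    for _name, module in model.named_modules():
        class_name = type(module).__name__
        if class_name.endswith(suffix):
            counts[class_name] += 1
    return counts
```

With the model loaded above, `count_layer_classes(model)` should list the repeated encoder block class, which is the natural candidate for conversion.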

I could give the ViLTLayer a go, if it's ok with you @younesbelkada 🙂

Nov 22 '22 23:11 ka00ri

Hi @ka00ri! Thanks a lot for your message and interest in contributing! I would love to assist you with integrating ViLT into BetterTransformer 💪 That is correct, this layer is the source layer to change! Would you mind opening a PR and tagging us (myself, @michaelbenayoun & @fxmarty)? Thanks a bunch!

Nov 23 '22 09:11 younesbelkada

Hello, apologies for the delay, but I just opened a draft PR to start the discussion on how to add BetterTransformer support for the ProphetNet encoder layer. I have a couple of questions about how to do this, so I was wondering who would be the best person to reach out to. @michaelbenayoun @fxmarty @younesbelkada

Dec 27 '22 22:12 adit299

Hi @adit299 , thanks for adding the support for this architecture! Feel free to ask any question in the PR you opened.

Dec 29 '22 15:12 fxmarty

Hi @younesbelkada, could I pick up the RoFormer?

Feb 15 '23 15:02 JanFidor

@younesbelkada doing Detr - DetrLayer

Mar 04 '23 15:03 soma2000-lang

Hello @JanFidor Yes sure! @soma2000-lang perfect, let us know when you open a PR 💪 !

Mar 06 '23 08:03 younesbelkada

@younesbelkada Hi, thanks for responding. I'm not 100% certain, but I think RemBert, RoFormer and RocBert are already implemented, as they have already been added to `__init__.py`, overview.mdx and the test file. If that's the case, the list of models left to implement needs to be updated. Let me know if you agree!

Mar 06 '23 12:03 JanFidor

I see, thanks for clarifying. I will double check that and let you know

Mar 06 '23 13:03 younesbelkada

Thanks for letting me know! Indeed, these are already implemented. I can suggest adding BetterTransformer support for Blip instead (I updated the table above).

Mar 06 '23 13:03 younesbelkada

Thanks for the suggestion, I'll get on it!

Mar 07 '23 17:03 JanFidor

Hi @fxmarty and @younesbelkada !

Thank you so much for your previous help and support on my implementation of MBart support for BetterTransformer.

I want to follow up on my PR on ASTLayer support for BetterTransformer.

Specifically, I would like to check with you if it is still possible to work on this and have it reviewed and merged into the package. If it is, I would be happy to continue working on it.

I realize that the BetterTransformer code and its tests have changed a lot in the last several months. Once I get confirmation, I will start updating my code to match those changes.

Thank you so much for your time and help, and I look forward to hearing back from you soon.

Sincerely,

Mar 29 '23 03:03 ravenouse

@younesbelkada I would like to work on FlavaLayer. Can you confirm whether it has already been done?

Aug 04 '23 13:08 rajveer43

Hi! @JanFidor will you finish with BLIP? I can do it if not, with the permission of @younesbelkada @fxmarty

Aug 07 '23 21:08 mszsorondo

@younesbelkada I would like to work on FlavaLayer. Can you confirm whether it has already been done?

Aug 08 '23 13:08 rajveer43

Hi,

@mszsorondo Looking into the PRs, BLIP has been implemented in https://github.com/huggingface/optimum/pull/1125. I just ticked it in the first post. @rajveer43 For Flava, there is this ongoing PR: https://github.com/huggingface/optimum/pull/907

Aug 11 '23 08:08 fxmarty

@fxmarty any other model available for work?

Aug 11 '23 09:08 rajveer43

@fxmarty same here, if there's still any model available

Aug 11 '23 14:08 mszsorondo

@younesbelkada Can I work on ASTLayer?

Nov 20 '23 11:11 hackpk

Any plans to add support for MPT?

Dec 22 '23 04:12 karandua2016

please support florence2!!!

Jul 23 '24 07:07 qingfengcss
