optimum
Community contribution - `BetterTransformer` integration for more models!
The `BetterTransformer` API provides faster inference on CPU and GPU through a simple interface!
Models can get significant speedups with a one-liner, as long as the latest version of PyTorch is installed. A complete guide on how to convert a new model is available in the BetterTransformer documentation!
Here is a list of models that could potentially be supported. Pick one of the architectures below and let's discuss the conversion!
Text models:
- [x] FSMT - FSMTEncoderLayer / @Sumanth077 https://github.com/huggingface/optimum/pull/494
- [ ] MobileBERT - MobileBertLayer / @raghavanone https://github.com/huggingface/optimum/pull/506
- [x] MBart - MBartEncoderLayer + M2M100EncoderLayer / https://github.com/huggingface/optimum/pull/516 @ravenouse
- [x] ProphetNet - ProphetNetEncoderLayer
- [x] RemBert - RemBertLayer
- [x] RocBert - RocBertLayer
- [x] RoFormer - RoFormerLayer
- [x] Tapas - TapasLayer / https://github.com/huggingface/optimum/pull/520
Vision models:
- [x] Blip - BlipLayer
- [ ] Detr - DetrLayer
- [ ] Flava - FlavaLayer
- [ ] GLPN - GLPNLayer | Cannot be supported
- [x] ViLT - ViLTLayer / https://github.com/huggingface/optimum/pull/508
Audio models:
- [ ] Speech2Text - Speech2TextLayer
- [ ] NEW: Audio Spectrogram Transformer - ASTLayer
Let us also know if you think we missed some architectures that could be supported. Note that for the encoder-decoder models above, we expect to convert the encoder only.
Support for decoder-based models coming soon!
cc @michaelbenayoun @fxmarty
https://github.com/huggingface/transformers/issues/20372
Hi @younesbelkada, I would love to contribute to this issue and can work on FSMT.
Hey @Sumanth077, thanks a bunch for your interest in this issue! I would love to help you with the integration, so let's make this happen! I have updated the table above, and here is the contribution tutorial ;) Would you mind forking this repo and opening a draft pull request so that I can start guiding you there? Also, please do not hesitate to ping us here about any issue you run into during the integration.
Thank you for the reply @younesbelkada. I just opened a draft pull request, but I haven't made any significant changes yet.
Following "Step 1: Identifying the source layer to change", I couldn't find an entry in `BETTER_TRANFORMER_LAYERS_MAPPING_DICT` that maps an FSMT module to its BetterTransformer equivalent.
Should I start by creating one? I would love your assistance.
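For context, the mapping in question is, as far as we understand that version of optimum, keyed on the class name of the source layer and points to its BetterTransformer replacement; conversion walks the model and swaps every matching child. Below is a minimal plain-Python sketch of that idea, assuming a simplified tree of modules. `Module`, `EncoderLayer`, `DecoderLayer`, `EncoderLayerBetterTransformer` and `LAYERS_MAPPING` are all hypothetical stand-ins, not optimum's or PyTorch's classes. Since only encoder layers are listed, decoder layers are left untouched, matching the encoder-only note above.

```python
# Hypothetical sketch: every class and name here is an illustrative stand-in,
# not optimum's or PyTorch's actual implementation.


class Module:
    """Minimal stand-in for torch.nn.Module: a node with named children."""

    def __init__(self, **children):
        self._children = dict(children)

    def named_children(self):
        # Return a list copy so callers can swap children while iterating.
        return list(self._children.items())

    def set_child(self, name, module):
        # In PyTorch this would be `setattr(parent, name, module)`.
        self._children[name] = module


class EncoderLayer(Module):
    """Stand-in for an encoder layer we want to convert."""


class DecoderLayer(Module):
    """Stand-in for a decoder layer, which is left untouched."""


class EncoderLayerBetterTransformer(Module):
    """Hypothetical fast-path replacement built from the original layer."""

    def __init__(self, source_layer):
        super().__init__()
        # A real converter would copy weights out of `source_layer` here.
        self.source_layer = source_layer


# Source layer class name -> replacement class. Only encoder layers are listed,
# so decoder layers are never converted.
LAYERS_MAPPING = {
    "EncoderLayer": EncoderLayerBetterTransformer,
}


def replace_layers(module, mapping):
    """Recursively swap every child whose class name appears in `mapping`."""
    for name, child in module.named_children():
        converter = mapping.get(type(child).__name__)
        if converter is not None:
            module.set_child(name, converter(child))
        else:
            replace_layers(child, mapping)
    return module


model = Module(
    encoder=Module(layer0=EncoderLayer(), layer1=EncoderLayer()),
    decoder=Module(layer0=DecoderLayer()),
)
replace_layers(model, LAYERS_MAPPING)
```

With this design, supporting a new architecture mostly means adding one mapping entry plus a converter class; the traversal itself does not change.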
Hi @Sumanth077 , I have just replied on your PR, let's continue the discussion there ;)
Hi, I would like to contribute as well. This would be my first contribution to open source, so I might need some hand-holding.
I followed the documentation and the progress made on FSMT in huggingface/optimum#494 to better understand the task.
I looked into ViLT via:
from transformers import AutoModel
model = AutoModel.from_pretrained("dandelin/vilt-b32-mlm")
As I understand the documentation, this should be the source layer to change, along with its attributes:
(0): ViltLayer(
  (attention): ViltAttention(
    (attention): ViltSelfAttention(
      (query): Linear(in_features=768, out_features=768, bias=True)
      (key): Linear(in_features=768, out_features=768, bias=True)
      (value): Linear(in_features=768, out_features=768, bias=True)
      (dropout): Dropout(p=0.0, inplace=False)
    )
    (output): ViltSelfOutput(
      (dense): Linear(in_features=768, out_features=768, bias=True)
      (dropout): Dropout(p=0.0, inplace=False)
    )
  )
  (intermediate): ViltIntermediate(
    (dense): Linear(in_features=768, out_features=3072, bias=True)
    (intermediate_act_fn): GELUActivation()
  )
  (output): ViltOutput(
    (dense): Linear(in_features=3072, out_features=768, bias=True)
    (dropout): Dropout(p=0.0, inplace=False)
  )
  (layernorm_before): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
  (layernorm_after): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
)
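In the layer above, `query`, `key` and `value` are three separate 768→768 `Linear` layers. As we understand the existing BetterTransformer converters, the fused encoder path expects one packed input projection instead, laid out like `torch.nn.MultiheadAttention.in_proj_weight`: shape `(3 * embed_dim, embed_dim)`, with query rows first, then key, then value. For ViLT that would be a 2304×768 weight. Here is a plain-Python sketch of that packing, with lists standing in for tensors and the helper names `linear` and `fuse_qkv` chosen purely for illustration. It also checks that one fused projection reproduces the three separate ones.

```python
# Sketch in plain Python; nested lists stand in for torch tensors, and the
# helper names are hypothetical.


def linear(weight, bias, x):
    """y = W x + b for a row-major weight of shape (out_features, in_features)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def fuse_qkv(q_weight, k_weight, v_weight, q_bias, k_bias, v_bias):
    """Stack query/key/value projections into a single input projection.

    The fused weight has shape (3 * out_features, in_features): query rows
    first, then key rows, then value rows.
    """
    shape = (len(q_weight), len(q_weight[0]))
    for w in (k_weight, v_weight):
        if (len(w), len(w[0])) != shape:
            raise ValueError("query, key and value weights must share a shape")
    in_proj_weight = [list(row) for row in q_weight + k_weight + v_weight]
    in_proj_bias = list(q_bias) + list(k_bias) + list(v_bias)
    return in_proj_weight, in_proj_bias


# Toy 2x2 projections; for ViLT each would be 768x768.
q_w, k_w, v_w = [[1, 0], [0, 1]], [[2, 0], [0, 2]], [[0, 1], [1, 0]]
q_b, k_b, v_b = [0, 1], [1, 0], [0, 0]
in_proj_weight, in_proj_bias = fuse_qkv(q_w, k_w, v_w, q_b, k_b, v_b)

x = [3, 4]
# One fused matmul yields the three separate projections, concatenated.
fused = linear(in_proj_weight, in_proj_bias, x)
separate = linear(q_w, q_b, x) + linear(k_w, k_b, x) + linear(v_w, v_b, x)
```

In a real converter the same concatenation would be done on tensors along dimension 0; the row order matters, because the fused kernel splits the result back into query, key and value in that order.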
I could give ViLTLayer a go, if that's OK with you @younesbelkada.
Hi @ka00ri!
Thanks a lot for your message and your interest in contributing! I would love to help you integrate ViLT into BetterTransformer.
That is correct: this is the source layer to change!
Would you mind opening a PR and tagging us (myself, @michaelbenayoun and @fxmarty)? Thanks a bunch!
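One more detail from the printout: `ViltLayer` has `layernorm_before` and `layernorm_after`, which, as in ViT, suggests a pre-LayerNorm block, whereas BERT-style layers normalize after the residual add. A converted layer has to reproduce whichever ordering the source uses (PyTorch's `nn.TransformerEncoderLayer` exposes this choice as `norm_first`). The plain-Python sketch below contrasts the two orderings. It is an assumption-laden illustration: `layer_norm` omits the learned scale and shift, the same norm is reused for both sub-layers, and `double`/`identity` are toy stand-ins for attention and the MLP. It is worth confirming the ordering against `ViltLayer.forward` in transformers.

```python
import math


def layer_norm(x, eps=1e-12):
    """LayerNorm over a vector, without the learned scale and shift."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]


def add(a, b):
    """Element-wise residual addition."""
    return [ai + bi for ai, bi in zip(a, b)]


def encoder_block(x, attention, mlp, norm_first):
    """One residual encoder block; `attention` and `mlp` map a vector to a vector."""
    if norm_first:
        # Pre-LN (ViT-style layernorm_before / layernorm_after): normalize the
        # sub-layer input; the residual stream itself stays unnormalized.
        x = add(x, attention(layer_norm(x)))
        x = add(x, mlp(layer_norm(x)))
    else:
        # Post-LN (BERT-style): add the residual first, then normalize.
        x = layer_norm(add(x, attention(x)))
        x = layer_norm(add(x, mlp(x)))
    return x


def double(v):
    # Toy stand-in for the attention sub-layer.
    return [2 * vi for vi in v]


def identity(v):
    # Toy stand-in for the MLP sub-layer.
    return list(v)


pre = encoder_block([1.0, 2.0, 3.0], double, identity, norm_first=True)
post = encoder_block([1.0, 2.0, 3.0], double, identity, norm_first=False)
```

With post-LN the block output is always normalized (mean close to 0), while with pre-LN the residual stream keeps the input's mean (2.0 here), so picking the wrong ordering changes the layer's outputs.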
Hello, apologies for the delay, but I just opened a draft PR to start the discussion on adding BetterTransformer support for the ProphetNet encoder layer. I have a couple of questions about how to do this, so I was wondering who would be the best person to reach out to. @michaelbenayoun @fxmarty @younesbelkada
Hi @adit299 , thanks for adding the support for this architecture! Feel free to ask any question in the PR you opened.
Hi @younesbelkada, could I pick up RoFormer?
@younesbelkada I'm doing Detr - DetrLayer.
Hello @JanFidor, yes sure! @soma2000-lang, perfect, let us know when you open a PR!
@younesbelkada Hi, thanks for responding! I'm not 100% certain, but I think RemBert, RoFormer and RocBert are already implemented, since they have already been added to `__init__.py`, `overview.mdx` and the test file. If that's the case, the list of models left to implement needs to be updated. Let me know if you agree!
I see, thanks for clarifying. I will double check that and let you know
Thanks for letting me know! Indeed these are already implemented
I suggest adding BetterTransformer support for Blip.
(updated the table above)
Thanks for the suggestion, I'll get on it!
Hi @fxmarty and @younesbelkada!
Thank you so much for your help and support with my implementation of MBart support for BetterTransformer.
I want to follow up on my PR adding ASTLayer support for BetterTransformer.
Specifically, I would like to check with you if it is still possible to work on this and have it reviewed and merged into the package. If it is, I would be happy to continue working on it.
I realize that the BetterTransformer code and its tests have changed a lot over the last several months. Once you confirm, I will update my code to match those changes.
Thank you so much for your time and help, and I look forward to hearing back from you soon.
Sincerely,
@younesbelkada I would like to work on FlavaLayer. Can you confirm whether it has already been done?
Hi! @JanFidor, will you finish BLIP? If not, I can do it, with the permission of @younesbelkada and @fxmarty.
Hi,
@mszsorondo Looking into the PRs, BLIP has been implemented in https://github.com/huggingface/optimum/pull/1125; I just ticked it in the first post. @rajveer43 For Flava, there is this ongoing PR: https://github.com/huggingface/optimum/pull/907
@fxmarty, are any other models available to work on?
@fxmarty same here, if there's still any model left.
@younesbelkada, can I work on ASTLayer?
Any plans to add support for MPT?
Please support Florence2!