transformers
Add Beit3 model
Fixes #22178
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
I don't know much about the details of the transformers library but isn't it confusing that it refers to the model as microsoft/beit-base-patch16-224-pt22k etc which is the name for beit v1, not beit v3?
Hi @MetaB0y, all I have done so far is pull the different modules needed for BEiT-3 into a single file. I will start working on cleaning it up.
Hi @raghavanone, just wanted to know updates on this PR. If required, I would like to help.
@atharvakavitkar You can certainly contribute. I am out until mid-April and will pick this up if no one else has by then. Start a new PR if you want to work on this.
Hey @raghavanone , I was thinking about working on this this weekend and see that you have a WIP already and are probably back now. Would it help if I contributed on this PR or better to leave it to you?
Hey @JonathanRayner, thanks for asking. I have some changes locally; making changes to the MoE layers is a bit tricky. We can collaborate on this PR, but we have to find a way to communicate (HF Slack, maybe).
cc @alaradirik so you can help when needed.
Hi @raghavanone, thanks for working on this! The easiest way to collaborate would be to add @JonathanRayner as a collaborator to your forked transformers repo. I'd also be happy to create a Slack channel and invite you both to make it easier to communicate.
I took a quick look at the PR and it'd be great if you could follow naming conventions in line with Beit, such that the class and folder names are Beit3 and beit3 respectively. Other than this, BEiT-3 is a multi-modal model, so you'll need to create image_processing_beit3.py, tokenizer_beit3.py and processing_beit3.py scripts, where the latter wraps the text and image preprocessor classes into a single class. BEiT-3 uses the transformers XLMRobertaTokenizer class to preprocess text, so you can use that instead of creating tokenizer_beit3.py. You can refer to OWL-ViT to see an example of a multi-modal model that uses an existing tokenizer class (CLIPTokenizer).
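To illustrate the wrapping pattern described above, here is a minimal toy sketch of how a processor delegates to a tokenizer and an image processor depending on the inputs it receives. All class names and the internal logic are hypothetical stand-ins, not the real XLMRobertaTokenizer, Beit3ImageProcessor, or the transformers ProcessorMixin API:

```python
# Toy sketch of the "processor wraps tokenizer + image processor" pattern.
# These stand-in classes are hypothetical and only illustrate the dispatch.

class ToyTokenizer:
    def __call__(self, text):
        # Map each whitespace-separated token to a fake integer id.
        return {"input_ids": [abs(hash(tok)) % 1000 for tok in text.split()]}

class ToyImageProcessor:
    def __call__(self, image):
        # Pretend "preprocessing" is just rescaling pixel values to [0, 1].
        return {"pixel_values": [p / 255.0 for p in image]}

class ToyProcessor:
    """Wraps a tokenizer and an image processor into one callable."""

    def __init__(self, image_processor, tokenizer):
        self.image_processor = image_processor
        self.tokenizer = tokenizer

    def __call__(self, text=None, images=None):
        # Dispatch on what the user passes: text, images, or both.
        encoding = {}
        if text is not None:
            encoding.update(self.tokenizer(text))
        if images is not None:
            encoding.update(self.image_processor(images))
        return encoding

processor = ToyProcessor(ToyImageProcessor(), ToyTokenizer())
out = processor(text="a photo of a cat", images=[0, 128, 255])
```

The real Beit3Processor would follow the same shape, with the actual tokenizer and image processor plugged in.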
In order to check if the PR passes the CI tests, you can run the following commands:
make style
make quality
make repo-consistency
pytest tests/models/beit3/test_image_processor_beit3.py
pytest tests/models/beit3/test_processor_beit3.py
RUN_SLOW=True pytest tests/models/beit3/test_modeling_beit3.py
Hope this helps, I can invite you to Slack if you send me your email addresses :)
Thanks @alaradirik, the PR is almost ready. I am already on the HF Slack under email [email protected] / username Raghavan. I have some questions, please ping me on Slack.
@alaradirik I am unable to understand the purpose of a processor; the Beit3 model takes in token IDs and the image as a tensor. Please help me understand this better.
Hi @raghavanone, the models expect the input images to be preprocessed. Hence, the image_processing_beit3.py script should contain the Beit3ImageProcessor class that takes in the raw input image and preprocesses it to the format expected as input to the model (resizing to a fixed input size, normalization, cropping, etc.). The processing_beit3.py script should contain the Beit3Processor class that wraps this image processor class and tokenizer class into a single instance, such that users can use it to preprocess text, images, or both. Please take a look at other multi-modal model processors such as OWL-ViT and CLIP to see how that works.
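As a rough sketch of the preprocessing steps mentioned above (crop, rescale, normalize), here is a toy one-dimensional version. This is not the real Beit3ImageProcessor; the function names and default values are hypothetical, and a real image processor operates on multi-channel 2-D arrays:

```python
# Toy 1-D sketch of typical image-processor steps: center crop to a fixed
# size, rescale pixel values to [0, 1], then normalize with mean and std.
# Names and defaults are hypothetical, for illustration only.

def center_crop(pixels, size):
    """Crop a row of pixel values to `size` elements around the center."""
    start = (len(pixels) - size) // 2
    return pixels[start:start + size]

def preprocess(pixels, size=4, mean=0.5, std=0.5):
    pixels = center_crop(pixels, size)
    pixels = [p / 255.0 for p in pixels]          # rescale to [0, 1]
    pixels = [(p - mean) / std for p in pixels]   # normalize
    return pixels

row = [0, 255, 255, 0, 128, 64]
out = preprocess(row)
```

The output is what the model then consumes as pixel values, which is why raw token IDs alone are not enough on the image side.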
@alaradirik Thanks, I have added both classes and tests for them. Requesting you to review the PR.
@alaradirik All the PR feedback has been resolved. A few comments are still open, and I have put my questions in the comments. On the conversion script I have the following questions:
- There are 22 variations of model checkpoints released; should I test each of them?
- How do I upload the checkpoints to HF?
@NielsRogge Following are the open questions to be resolved :
Q1. How should the config be uploaded?
Q2. How should the checkpoints be uploaded?
Q3. Comment from Alara: "Passing a module to the class is not very optimal. I see that you are initializing and passing various modules in Beit3EncoderLayer and MultiheadAttention to MultiwayNetwork and creating deep copies. I think it'd make more sense to create separate classes (e.g. Beit3Dense, Beit3FeedForwardNetwork), as variable names such as first and second are confusing and make the code more difficult to adapt for works that build upon this. I'm cc'ing @sgugger for his opinion."
For reference look at here https://github.com/huggingface/transformers/blob/1cda50bd12d7d454f56fbdd9f8fe32aee1eae5b3/src/transformers/models/beit3/modeling_beit3.py#L382
cc @amyeroberts
Hi @raghavanone,
I'll try to answer your questions as best as possible. For future Q's it's best to ping me rather than @NielsRogge or @sgugger.
- When you say 'uploaded', are you referring to uploading onto the Hub, e.g. like this for bert? If so, this should be uploaded alongside the model weights. When calling model.push_to_hub(repo_path), both the model's checkpoint and configuration will be uploaded. You can look at some of the conversion scripts to see the weight loading / converting / uploading logic, e.g. here. While the PR is still under development, I suggest having the models under a personal repo. Then, once ready to merge, we can transfer the weights and configs to under the official org's.
- See above.
- What's the question? Are you asking what @alaradirik's comment means, or asking whether this is something that should be done?
Thanks for the answers. For Q3, I am not sure how to incorporate the feedback; I need some support on what needs to be done.
@raghavanone For 3. what I believe Alara was saying (and I agree with) is that the Beit3MultiwayNetwork layer is trying to do too much, resulting in patterns which don't match the rest of the library and are less interpretable. We should instead implement individual blocks, which avoids hacky tricks like copying and then resetting layer parameters.
To be explicit, an example would be self.self_attn_layer_norm = Beit3MultiwayNetwork(LayerNorm(self.embed_dim, eps=config.layernorm_eps)) on L491. Instead of using Beit3MultiwayNetwork, we could define a model-specific layernorm layer:
class Beit3LayerNorm(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.layernorm_1 = nn.LayerNorm(config.embed_dim, eps=config.layernorm_eps)
        self.layernorm_2 = nn.LayerNorm(config.embed_dim, eps=config.layernorm_eps)

    def forward(self, hidden_states, split_position=-1):
        if split_position == -1:
            return self.layernorm_1(hidden_states)
        if split_position == 0:
            return self.layernorm_2(hidden_states)
        text_hidden, image_hidden = torch.split(
            hidden_states,
            [split_position, hidden_states.size(1) - split_position],
            dim=1,
        )
        text_hidden = self.layernorm_1(text_hidden)
        image_hidden = self.layernorm_2(image_hidden)
        hidden_states = torch.cat([text_hidden, image_hidden], dim=1)
        return hidden_states
And then L491 would become:
self.self_attn_layer_norm = Beit3LayerNorm(config)
It's OK for us to have some of the if split_position logic repeated if it means a clearer architecture and having layers take the config to instantiate themselves.
A note regarding the general design of the layer above:
- The set_split_position method is very hacky and requires iterating over all of the layers of the model each time we do a forward pass with multiway_split_position set. Instead, let's pass this to the layers in the forward pass.
- The layers shouldn't accept arbitrary *args or **kwargs in their methods.
- AFAICT dim was never changed or set, so we can remove this attribute.
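The first point above can be illustrated with a small framework-free sketch contrasting the two designs. The classes below are toy stand-ins, not the actual BEiT-3 layers:

```python
# Toy sketch: stateful set_split_position vs. passing split_position as a
# forward argument. Class names here are hypothetical stand-ins.

class StatefulLayer:
    def __init__(self):
        self.split_position = -1  # hidden state mutated from outside

    def set_split_position(self, position):
        self.split_position = position

    def forward(self, hidden_states):
        return (self.split_position, hidden_states)

class StatelessLayer:
    def forward(self, hidden_states, split_position=-1):
        return (split_position, hidden_states)

# Stateful design: the model must walk every layer before each forward pass.
stateful_layers = [StatefulLayer() for _ in range(3)]
for layer in stateful_layers:
    layer.set_split_position(2)
out_stateful = [layer.forward("h") for layer in stateful_layers]

# Stateless design: the value simply flows through the forward call.
stateless_layers = [StatelessLayer() for _ in range(3)]
out_stateless = [layer.forward("h", split_position=2) for layer in stateless_layers]
```

Both produce the same result, but the stateless version avoids the extra traversal and the hidden mutable state.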
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@amyeroberts All the feedback has been incorporated.
@raghavanone I'll be off for a few weeks from next week. If you need another review in that time please ask @rafaelpadilla. Once he's approved and all tests passing, then we can ask for a core maintainer review.
@rafaelpadilla All the PR feedback has been addressed. Requesting you to do a review.
@rafaelpadilla All the PR feedback has been resolved.
@ArthurZucker I have questions about some of the comments; requesting more clarification.
@NielsRogge @ArthurZucker All the feedback has been incorporated.
Thanks, I'm currently checking out your branch, will open a PR on your fork of things I'd like to see updated
Hi @raghavanone I went over your PR, looks great already, however there are still various things which need to be addressed, for which I opened a PR here: https://github.com/raghavanone/transformers/pull/1.
@amyeroberts All the comments have been addressed. The failing tests are unrelated to this PR; let me know if I need to do anything to fix them.
Gently pinging @amyeroberts for approving this PR