
Add Beit3 model

Open raghavanone opened this issue 1 year ago • 37 comments

Fixes #22178

raghavanone avatar Mar 21 '23 10:03 raghavanone

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

I don't know much about the details of the transformers library, but isn't it confusing that it refers to the model as microsoft/beit-base-patch16-224-pt22k etc., which is the name for BEiT v1, not BEiT v3?

MetaB0y avatar Mar 22 '23 19:03 MetaB0y

Hi @MetaB0y, all I have done so far is pull the different modules needed for beit3 into a single file. I will start working on cleaning it up.

raghavanone avatar Mar 25 '23 06:03 raghavanone

Hi @raghavanone, just wanted to know updates on this PR. If required, I would like to help.

atharvakavitkar avatar Apr 05 '23 09:04 atharvakavitkar

@atharvakavitkar You can for sure contribute. I am out till mid-April and will pick this up if no one else has by then. Start a new PR if you want to work on this.

raghavanone avatar Apr 05 '23 15:04 raghavanone

Hey @raghavanone , I was thinking about working on this this weekend and see that you have a WIP already and are probably back now. Would it help if I contributed on this PR or better to leave it to you?

JonathanRayner avatar Apr 21 '23 11:04 JonathanRayner

Hey @JonathanRayner, thanks for asking. I have some changes locally; making changes to the MoE layers is a bit tricky. We can collaborate on this PR, but we have to find a way to communicate (maybe the HF Slack).

raghavanone avatar Apr 21 '23 14:04 raghavanone

cc @alaradirik so you can help when needed.

sgugger avatar Apr 24 '23 14:04 sgugger

Hi @raghavanone, thanks for working on this! The easiest way to collaborate would be to add @JonathanRayner as a collaborator to your forked transformers repo. I'd also be happy to create a Slack channel and invite you both to make it easier to communicate.

I took a quick look at the PR and it'd be great if you could follow naming conventions in line with Beit such that the class and folder names are Beit3 and beit3 respectively. Other than this, BEiT-3 is a multi-modal model so you'll need to create image_processing_beit3.py, tokenizer_beit3.py and processing_beit3.py scripts, where the latter wraps the text and image preprocessor classes into a single class. BEiT-3 uses the transformers XLMRobertaTokenizer class to preprocess text so you can use that instead of creating tokenizer_beit3.py. You can refer to OWL-ViT to see an example of a multi-modal model that uses an existing tokenizer class (CLIPTokenizer).
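To make the wrapping pattern concrete, here is a minimal, framework-free sketch of what such a processor does. The class and attribute names are illustrative stand-ins, not the real Beit3Processor API:

```python
# Illustrative sketch only -- ToyProcessor is a hypothetical stand-in for the
# real Beit3Processor, which builds on transformers' processor machinery.
class ToyProcessor:
    """Wraps an image processor and a tokenizer behind a single __call__."""

    def __init__(self, image_processor, tokenizer):
        self.image_processor = image_processor
        self.tokenizer = tokenizer

    def __call__(self, text=None, images=None):
        if text is None and images is None:
            raise ValueError("You must provide either text or images.")
        encoding = {}
        if text is not None:
            # Delegate text preprocessing to the wrapped tokenizer
            encoding.update(self.tokenizer(text))
        if images is not None:
            # Delegate image preprocessing to the wrapped image processor
            encoding.update(self.image_processor(images))
        return encoding
```

The point is only that the user-facing class merges both preprocessors' outputs into one encoding dict, so callers never touch the tokenizer or image processor directly.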

In order to check if the PR passes the CI tests, you can run the following commands:

make style
make quality
make repo-consistency

pytest tests/models/beit3/test_image_processor_beit3.py
pytest tests/models/beit3/test_processor_beit3.py
RUN_SLOW=True pytest tests/models/beit3/test_modeling_beit3.py

Hope this helps, I can invite you to Slack if you send me your email addresses :)

alaradirik avatar Apr 30 '23 16:04 alaradirik

Thanks @alaradirik, the PR is almost ready. I am already on the HF Slack under email [email protected] / username Raghavan. I have some questions; please ping me on Slack.

raghavanone avatar May 01 '23 13:05 raghavanone

@alaradirik I am unable to understand the purpose of a processor; the Beit3 model takes in token ids and the image as a tensor. Please help me understand this better.

raghavanone avatar May 02 '23 13:05 raghavanone

Hi @raghavanone, the models expect the input images to be preprocessed. Hence, the image_processing_beit3.py script should contain the Beit3ImageProcessor class that takes in the raw input image and preprocesses it to the format expected as input to the model (resizing to a fixed input size, normalization, cropping, etc.).

The processing_beit3.py script should contain the Beit3Processor class that wraps the image processor class and tokenizer class into a single instance such that users can use it to preprocess text, images, or both. Please take a look at other multi-modal model processors such as OWL-ViT and CLIP to see how that works.
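As a rough illustration of the preprocessing an image processor performs, here is a minimal sketch of the per-channel normalization step. The ImageNet mean/std values below are common defaults for illustration only and may not match the actual Beit3ImageProcessor configuration:

```python
# Common ImageNet statistics -- illustrative only; the real
# Beit3ImageProcessor defaults may differ.
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(value, channel):
    """Scale a 0-255 channel value to [0, 1], then standardize it."""
    scaled = value / 255.0
    return (scaled - IMAGENET_MEAN[channel]) / IMAGENET_STD[channel]

def normalize_image(image):
    """Normalize a nested [height][width][channel] list of 0-255 values."""
    return [
        [[normalize_pixel(v, c) for c, v in enumerate(pixel)] for pixel in row]
        for row in image
    ]
```

Resizing and cropping follow the same idea: deterministic transforms that map a raw image to the exact tensor shape and value range the model was trained on.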

alaradirik avatar May 04 '23 09:05 alaradirik

@alaradirik Thanks, I have added both classes and tests for them. Requesting you to review the PR.

raghavanone avatar May 05 '23 09:05 raghavanone

@alaradirik All the PR feedback has been resolved. A few comments are still open, and I have put my questions there. On the conversion script I have the following questions:

  • There are 22 variations of the model checkpoints released; should I test each of them?
  • How do I upload the checkpoints to the HF Hub?

raghavanone avatar May 18 '23 10:05 raghavanone

@NielsRogge Following are the open questions to be resolved :

Q1. How should the config be uploaded?

Q2. How should the checkpoints be uploaded?

Q3. Comment from Alara: "Passing a module to the class is not very optimal. I see that you are initializing and passing various modules in Beit3EncoderLayer and MultiheadAttention to MultiwayNetwork and creating deep copies.

I think it'd make more sense to create separate classes (e.g. Beit3Dense, Beit3FeedForwardNetwork) as variable names such as first and second are confusing and make the code more difficult to be adapted for works that build upon this.

I'm cc'ing @sgugger for his opinion."

For reference look at here https://github.com/huggingface/transformers/blob/1cda50bd12d7d454f56fbdd9f8fe32aee1eae5b3/src/transformers/models/beit3/modeling_beit3.py#L382

raghavanone avatar May 30 '23 14:05 raghavanone

cc @amyeroberts

sgugger avatar May 30 '23 14:05 sgugger

Hi @raghavanone,

I'll try to answer your questions as best as possible. For future Q's it's best to ping me rather than @NielsRogge or @sgugger.

  1. When you say 'uploaded' - are you referring to uploading onto the hub e.g. like this for bert? If so, this should be uploaded alongside the model weights. When calling model.push_to_hub(repo_path), both the model's checkpoint and configuration will be uploaded. You can look at some of the conversion scripts to see the weight loading / converting / uploading logic e.g. here. Whilst the PR is still under development, I suggest having the models under a personal repo. Then, once ready to merge, we can transfer the weights and configs to under the official org.
  2. See above.
  3. What's the question? Are you asking what @alaradirik's comment means, or asking whether this is something that should be done?
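For point 1, the conversion scripts linked above generally follow the same flow: load the original checkpoint, rename its parameter keys to the HF naming scheme, load the result into the HF model, then call push_to_hub. A sketch of the renaming step, with hypothetical key names (the real BEiT-3 checkpoint keys will differ):

```python
# Hypothetical key mapping -- the actual BEiT-3 checkpoint key names differ;
# this only illustrates the renaming pattern conversion scripts use.
def rename_key(old_key):
    """Map an original-repo parameter name to the HF naming scheme."""
    replacements = [
        ("encoder.layers.", "beit3.encoder.layer."),
        ("self_attn.", "attention."),
        ("final_layer_norm.", "layernorm."),
    ]
    for old, new in replacements:
        old_key = old_key.replace(old, new)
    return old_key

def convert_state_dict(state_dict):
    """Return a new state dict with all keys renamed to the HF scheme."""
    return {rename_key(k): v for k, v in state_dict.items()}
```

After the converted state dict loads cleanly into the HF model, pushing it with model.push_to_hub uploads the weights and config together.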

amyeroberts avatar May 30 '23 17:05 amyeroberts

Thanks for the answers. For Q3, I am not sure how to incorporate the feedback; I need some support on what needs to be done.

raghavanone avatar Jun 01 '23 14:06 raghavanone

@raghavanone For 3, what I believe Alara was saying (and I agree with) is that the Beit3MultiwayNetwork layer is trying to do too much, resulting in patterns that don't match the rest of the library and are less interpretable. We should instead implement individual blocks, which avoids hacky tricks like copying and then resetting layer parameters.

To be explicit, an example would be self.self_attn_layer_norm = Beit3MultiwayNetwork(LayerNorm(self.embed_dim, eps=config.layernorm_eps)) on L491. Instead of using Beit3MultiwayNetwork, we could define a model-specific layer norm layer:

import torch
from torch import nn


class Beit3LayerNorm(nn.Module):
    def __init__(self, config):
        super().__init__()
        # One layer norm per modality
        self.layernorm_1 = nn.LayerNorm(config.embed_dim, eps=config.layernorm_eps)
        self.layernorm_2 = nn.LayerNorm(config.embed_dim, eps=config.layernorm_eps)

    def forward(self, hidden_states, split_position=-1):
        if split_position == -1:
            # Single-modality input: route everything through the first norm
            return self.layernorm_1(hidden_states)

        if split_position == 0:
            # Single-modality input: route everything through the second norm
            return self.layernorm_2(hidden_states)

        # Mixed input: normalize each segment with its own layer norm
        text_hidden, image_hidden = torch.split(
            hidden_states, [split_position, hidden_states.size(1) - split_position], dim=1
        )
        text_hidden = self.layernorm_1(text_hidden)
        image_hidden = self.layernorm_2(image_hidden)
        hidden_states = torch.cat([text_hidden, image_hidden], dim=1)
        return hidden_states

And then L491 would become:

self.self_attn_layer_norm = Beit3LayerNorm(config)

It's OK for us to have some of the if split_position logic repeated if it means a clearer architecture and having layers take the config to instantiate themselves.

A note regarding the general design of the layer above:

  • The set_split_position method is very hacky and requires iterating over all of the layers of the model each time we do a forward pass with multiway_split_position set. Instead, let's pass this to the layers in the forward pass.
  • The layers shouldn't accept arbitrary *args or **kwargs in their methods.
  • AFAICT dim was never changed or set, so we can remove this attribute.
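The first bullet can be sketched without any framework code; the class names and the stand-in computation below are purely illustrative of threading the split position through the forward pass rather than mutating module state:

```python
# Framework-free sketch: instead of a set_split_position() walk over every
# submodule before the forward pass, the split position travels through each
# forward call as an ordinary argument.
class ToyEncoderLayer:
    def forward(self, hidden_states, split_position=-1):
        # Every sublayer receives split_position explicitly (unused by this
        # stand-in computation, which just increments each value).
        return [h + 1 for h in hidden_states]

class ToyEncoder:
    def __init__(self, num_layers):
        self.layers = [ToyEncoderLayer() for _ in range(num_layers)]

    def forward(self, hidden_states, split_position=-1):
        for layer in self.layers:
            hidden_states = layer.forward(hidden_states, split_position=split_position)
        return hidden_states
```

Keeping the argument in the call signature means no hidden state is left behind between forward passes, and each layer stays independently testable.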

amyeroberts avatar Jun 01 '23 19:06 amyeroberts

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jun 26 '23 15:06 github-actions[bot]

@amyeroberts All the feedback has been incorporated.

raghavanone avatar Aug 31 '23 07:08 raghavanone

@raghavanone I'll be off for a few weeks from next week. If you need another review in that time please ask @rafaelpadilla. Once he's approved and all tests passing, then we can ask for a core maintainer review.

amyeroberts avatar Sep 12 '23 10:09 amyeroberts

@rafaelpadilla All the PR feedback has been addressed. Requesting you to do a review.

raghavanone avatar Sep 18 '23 13:09 raghavanone

@rafaelpadilla All the PR feedback has been resolved.

raghavanone avatar Sep 27 '23 15:09 raghavanone

@ArthurZucker I have questions about some of the comments; requesting more clarification.

raghavanone avatar Oct 06 '23 06:10 raghavanone

@NielsRogge @ArthurZucker All the feedback has been incorporated.

raghavanone avatar Oct 14 '23 03:10 raghavanone

Thanks, I'm currently checking out your branch; I will open a PR on your fork with the things I'd like to see updated.

NielsRogge avatar Oct 16 '23 15:10 NielsRogge

Hi @raghavanone I went over your PR and it looks great already; however, there are still various things that need to be addressed, for which I opened a PR here: https://github.com/raghavanone/transformers/pull/1.

NielsRogge avatar Oct 17 '23 06:10 NielsRogge

@amyeroberts All the comments have been addressed. The failing tests are unrelated to this PR. Let me know if I need to do anything to fix them.

raghavanone avatar Nov 10 '23 08:11 raghavanone

Gently pinging @amyeroberts for approving this PR

NielsRogge avatar Nov 17 '23 07:11 NielsRogge