add Unified-IO

Open thedarkzeno opened this issue 1 year ago • 7 comments

Model description

I'd like to request the addition of the Unified-IO model. It is a multimodal model capable of visual question answering, image generation, and more. The repo is https://github.com/allenai/unified-io-inference and the paper is "Unified-IO: Sequential Modeling for Generally Applicable Vision Models".

Open source status

  • [X] The model implementation is available
  • [X] The model weights are available

Provide useful links for the implementation

https://github.com/allenai/unified-io-inference

thedarkzeno avatar Sep 17 '22 02:09 thedarkzeno

Hi, have you started working on the issue? Do you plan to integrate it yourself?

marinone94 avatar Sep 30 '22 10:09 marinone94

I'd like to work on this issue. Is there any documentation on adding new models that I should follow?

alceballosa avatar Oct 20 '22 15:10 alceballosa

I would like to work on this one.

ChanBong avatar Jan 22 '23 18:01 ChanBong

@NielsRogge @alaradirik If no one else is currently working on adding this model, I would like to work on it.

kumar-devesh avatar Mar 07 '23 10:03 kumar-devesh

Hi @kumar-devesh, I'm working on it (I've made some progress toward getting a working version of the Discrete VAE in Torch), but @osanseviero told me that it would be better to first verify whether there's interest from the development team. If they're OK with it, then we could work on it together.

alceballosa avatar Mar 07 '23 11:03 alceballosa

cc @sgugger @amyeroberts

osanseviero avatar Mar 07 '23 19:03 osanseviero

Hi @ChanBong @kumar-devesh @alceballosa, Unified-IO would be a great addition to the library.

If you are not familiar with contributing to transformers, you can refer to the guidelines to get started. I'd recommend checking if you can run the original repo without any issues and get the expected results first.

Here are some summarised points that might help with model addition:

  • Each model, including different checkpoints of the same model, has its own repo on the Hub (see the DETR-ResNet-50 repo as an example). This is basically a git repo that stores the checkpoint-specific configuration, the preprocessing configuration and the model weights.
  • The code added to transformers acts as boilerplate to initialise the model and load different checkpoints, e.g. Unified-IO trained on different datasets, at different resolutions, or with a larger or smaller architecture.
  • configuration_unifiedio.py should contain all the hyperparameters, the input image size and architectural details (e.g. number of hidden layers) to initialize the model.
  • Multi-modal models (e.g. CLIP, ALIGN) have a Processor class that encapsulates the Tokenizer and ImageProcessor classes, which preprocess the text and image inputs.
    • image_processing_unifiedio.py should contain the ImageProcessor class that takes in the raw input image and preprocesses it to the format expected as input to the model (resizing to a fixed input size, normalization, cropping, etc.)
    • tokenization_unifiedio.py should contain the Tokenizer class that preprocesses the raw input text.
    • processing_unifiedio.py combines the two to preprocess image-text pair inputs.
  • modeling_unifiedio.py should contain the model definition.
  • The conversion script:
    • Loads the pretrained original model and randomly initializes the HF implementation with the corresponding configuration
    • Copies the pretrained parameters (weights and biases) of the original model to the corresponding parameters of the randomly initialized HF model (the conversion step)
    • Forward propagates an arbitrary input (text + image in this case) through both the original model and converted HF model and checks if the outputs match
    • Uploads the converted HF model to the hub
  • Each model, tokenizer, image processor and processor class is tested with scripts under tests/models/<MODEL_NAME>/. You can refer to other test files to see what tests to add.
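
To make the conversion-script steps above a bit more concrete, here is a minimal, framework-free sketch of the two core pieces: renaming parameters from the original checkpoint to the HF implementation, and checking that outputs agree within a tolerance. Everything in it is an assumption for illustration: the parameter names, the rename rules and the helper functions (rename_key, convert_state_dict, outputs_match) are hypothetical and do not come from the Unified-IO repo or from transformers. A real script would work on actual tensors, handle any weight layout differences between frameworks, and run both models on the same image-text input.

```python
import re

# Hypothetical rename rules: (pattern for an original key, HF-style replacement).
# These names are illustrative only; the real checkpoint keys will differ.
RENAME_RULES = [
    (r"^encoder/layers_(\d+)/attention/query/kernel$",
     r"encoder.layer.\1.attention.q_proj.weight"),
    (r"^encoder/layers_(\d+)/mlp/wi/kernel$",
     r"encoder.layer.\1.mlp.fc1.weight"),
]


def rename_key(original_key):
    """Return the HF-style name for an original key, or None if no rule matches."""
    for pattern, replacement in RENAME_RULES:
        if re.match(pattern, original_key):
            return re.sub(pattern, replacement, original_key)
    return None


def convert_state_dict(original_params, expected_keys):
    """Copy parameters under their new names and report mismatches.

    Returns (converted, unmatched, missing):
      converted - dict of renamed parameters
      unmatched - original keys that no rule covered
      missing   - expected HF keys that received no value
    """
    converted, unmatched = {}, []
    for key, value in original_params.items():
        new_key = rename_key(key)
        if new_key is None:
            unmatched.append(key)
        else:
            converted[new_key] = value
    missing = sorted(set(expected_keys) - set(converted))
    return converted, unmatched, missing


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two flat lists of floats."""
    if len(a) != len(b):
        raise ValueError("outputs have different lengths")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def outputs_match(original_out, converted_out, atol=1e-4):
    """True if the two outputs agree within the absolute tolerance atol."""
    return max_abs_diff(original_out, converted_out) <= atol


# Toy data standing in for a real checkpoint (plain lists instead of tensors).
original_params = {
    "encoder/layers_0/attention/query/kernel": [0.1, 0.2],
    "encoder/layers_0/mlp/wi/kernel": [0.3, 0.4],
    "optimizer/step": [100.0],  # not a model weight, so no rule covers it
}
expected_keys = [
    "encoder.layer.0.attention.q_proj.weight",
    "encoder.layer.0.mlp.fc1.weight",
]
converted, unmatched, missing = convert_state_dict(original_params, expected_keys)
```

Reporting the unmatched and missing keys explicitly is a design choice: a conversion that silently skips a parameter leaves it randomly initialized, which usually only shows up later as an output mismatch that is hard to trace.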

Once you are done, run the following commands to check that the PR passes all CI tests:

make style
make quality
make repo-consistency

RUN_SLOW=TRUE pytest tests/models/unifiedio/test_modeling_unifiedio.py
RUN_SLOW=TRUE pytest tests/models/unifiedio/test_image_processing_unifiedio.py
RUN_SLOW=TRUE pytest tests/models/unifiedio/test_tokenization_unifiedio.py
RUN_SLOW=TRUE pytest tests/models/unifiedio/test_processor_unifiedio.py

We can do an in-depth review or create a Slack channel to address questions and issues once there is a draft PR.

Hope this helps!

alaradirik avatar Mar 08 '23 12:03 alaradirik