transformers
add Unified-IO
Model description
I'd like to request the addition of the Unified-IO model. It is a multimodal model capable of visual question answering, image generation, and more. The repo is https://github.com/allenai/unified-io-inference and the paper is "Unified-IO: Sequential Modeling for Generally Applicable Vision Models".
Open source status
- [X] The model implementation is available
- [X] The model weights are available
Provide useful links for the implementation
https://github.com/allenai/unified-io-inference
Hi, have you started working on the issue? Do you plan to integrate it yourself?
I'd like to work on this issue, is there any documentation on adding new models that I should follow?
I would like to work on this one.
@NielsRogge @alaradirik If no one else is currently working on adding this model, I would like to work on it.
Hi @kumar-devesh, I'm working on it (I've made some progress toward a working version of the Discrete VAE in Torch), but @osanseviero told me it would be better to first check whether there's interest from the development team. If they're OK with it, we could work on it together.
cc @sgugger @amyeroberts
Hi @ChanBong @kumar-devesh @alceballosa, Unified-IO would be a great addition to the library.
If you are not familiar with contributing to transformers, you can refer to the guidelines to get started. I'd recommend checking if you can run the original repo without any issues and get the expected results first.
Here are some summarised points that might help with model addition:
- Each model, including different checkpoints of the same model, has its own repo on the Hub (see DETR-ResNet-50 repo as an example). This is basically a git repo that stores the checkpoint-specific configuration, the preprocessing configuration and the model weights.
- The code added to transformers acts as boilerplate to initialise the model and load different checkpoints, for example Unified-IO trained on different datasets, at different resolutions, or with a larger or smaller architecture.
- configuration_unifiedio.py should contain all the hyperparameters, the input image size and architectural details (e.g. number of hidden layers) to initialize the model.
- Multi-modal models (e.g. CLIP, ALIGN) have a Processor class that encapsulates the Tokenizer and ImageProcessor classes, which preprocess the text and image inputs.
- image_processing_unifiedio.py should contain the ImageProcessor class that takes in the raw input image and preprocesses it to the format expected as input to the model (resizing to a fixed input size, normalization, cropping, etc.)
- tokenizer_unifiedio.py should contain the Tokenizer class that preprocesses the raw input text.
- processor_unifiedio.py combines the two to preprocess image-text pair inputs.
- modeling_unifiedio.py should contain the model definition.
- The conversion script:
- Loads the pretrained original model and randomly initializes the HF implementation with the corresponding configuration
- Copies the pretrained parameters (weights and biases) of the original model to the corresponding parameters of the randomly initialized HF model (the conversion step)
- Forward propagates an arbitrary input (text + image in this case) through both the original model and converted HF model and checks if the outputs match
- Uploads the converted HF model to the hub
- Each model, tokenizer, image processor and processor class is tested with scripts under tests/models/<MODEL_NAME>/; you can refer to other test files to see what tests to add.
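As a rough illustration of the conversion and verification steps listed above, here is a minimal sketch in plain Python. It assumes the original checkpoint can be loaded as a nested parameter dictionary; the rename rules, key names and helper functions are all hypothetical and would need to be replaced with whatever the original Unified-IO checkpoint and the new modeling file actually use. Real weights would be arrays or tensors rather than lists, and some of them may also need reshaping or transposing, which this sketch does not cover.

```python
import math

# Hypothetical rename rules (original fragment -> HF fragment). The real
# names depend on the original checkpoint and on modeling_unifiedio.py.
RENAME_RULES = [
    ("encoder/layers_", "encoder.layer."),
    ("/kernel", ".weight"),
    ("/bias", ".bias"),
]


def flatten(params, prefix=""):
    """Flatten a nested parameter dict into {"a/b/c": value}."""
    flat = {}
    for name, value in params.items():
        key = f"{prefix}/{name}" if prefix else name
        if isinstance(value, dict):
            flat.update(flatten(value, key))
        else:
            flat[key] = value
    return flat


def rename_key(key):
    """Apply the rename rules in order, then use dots as separators."""
    for old, new in RENAME_RULES:
        key = key.replace(old, new)
    return key.replace("/", ".")


def convert(original_params, target_keys):
    """Map original parameters onto the target model's parameter names.

    Raises ValueError if any target key is left unfilled or any original
    parameter has no matching target key.
    """
    converted = {rename_key(k): v for k, v in flatten(original_params).items()}
    missing = sorted(set(target_keys) - set(converted))
    unexpected = sorted(set(converted) - set(target_keys))
    if missing or unexpected:
        raise ValueError(f"missing={missing}, unexpected={unexpected}")
    return converted


def outputs_match(expected, actual, atol=1e-4):
    """Element-wise comparison of two flat output sequences."""
    return len(expected) == len(actual) and all(
        math.isclose(e, a, abs_tol=atol) for e, a in zip(expected, actual)
    )


# Toy usage: one dense layer in the original layout.
original = {"encoder": {"layers_0": {"dense": {"kernel": [1.0, 2.0], "bias": [0.5]}}}}
target_keys = ["encoder.layer.0.dense.weight", "encoder.layer.0.dense.bias"]
state_dict = convert(original, target_keys)
```

Failing loudly on missing or unexpected keys tends to catch most conversion mistakes before the output comparison is even run; the tolerance in outputs_match is an arbitrary choice here and should be tightened or loosened to suit the model.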
Once you are done, you will need to run the following commands to check that the PR passes all CI tests:
make style
make quality
make repo-consistency
RUN_SLOW=TRUE pytest tests/models/unifiedio/test_modeling_unifiedio.py
RUN_SLOW=TRUE pytest tests/models/unifiedio/test_image_processor_unifiedio.py
RUN_SLOW=TRUE pytest tests/models/unifiedio/test_tokenizer_unifiedio.py
RUN_SLOW=TRUE pytest tests/models/unifiedio/test_processor_unifiedio.py
We can do an in-depth review or create a Slack channel to address questions and issues once there is a draft PR.
Hope this helps!