lightning-flash
Extending the Question Answering Models to Visual Question Answering
🚀 Feature
Extend the idea of Question Answering to Visual Question Answering (VQA).
Motivation
While going through the example, I became interested in using transformers for Visual Question Answering. I could not find many code resources on the topic, so I would like to contribute my own implementation, written in PyTorch. I believe it is simple enough to be fine-tuned quickly on any dataset.
Pitch
I am not sure how best to pitch this, but I have implemented the model and obtained fair results with it. I would like to make the model applicable to any dataset. Since it is a multi-modal model, it should also be useful to the research community.
Alternatives
I am not sure there are any, since this is a model contribution.
Additional context
The implementation is available here.
What does this implementation contain?
- [x] Implementation of the model in PyTorch
- [x] Pre-training script in PyTorch Lightning
- [x] Fine-tuning script in PyTorch Lightning (along with the checkpoint)
- [x] Results of the experiments