lightning-flash icon indicating copy to clipboard operation
lightning-flash copied to clipboard

Extending the Question Answering Models to Visual Question Answering

Open uakarsh opened this issue 3 years ago • 0 comments

🚀 Feature

Extending the idea of Question Answering to Visual Question Answering

Motivation

I was going through the example and was interested in using transformers for the purpose of Visual Question Answering (could not find many resources related to the same as code), so I thought of contributing my own implementation (implemented in PyTorch), for the same. I believe that the implementation is simple enough to be quickly able to fine-tune on any dataset with ease.

Pitch

I am not sure about how to pitch, but I have managed to implement the model and get fair results on the same model. I want to extend the applicability of the model for any dataset and since this is a multi-modal model, it would be helpful for the research community as well

Alternatives

Not sure about it, since this is a model contribution.

Additional context

Here is the implementation for the same here What does this implementation contain?

  • [x] Implementation of the model in PyTorch
  • [x] Pre-training script in PyTorch Lightning
  • [x] Fine-tuning script in PyTorch Lightning (along with the checkpoint)
  • [x] Results of the experiments

uakarsh avatar Jul 06 '22 09:07 uakarsh