hexia
Mid-level PyTorch Based Framework for Visual Question Answering.
Introduction
Hexia is a PyTorch-based framework for building visual question answering (VQA) models. It provides a mid-level API for integrating your VQA models with predefined data handling, image preprocessing, and natural language preprocessing pipelines.
Features
- Image preprocessing
- Text preprocessing
- Data Handling (MS-COCO Only)
- Real-time Loss and Accuracy Tracker
- VQA Evaluation
- Extendable Built-in Model Warehouse
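To make the VQA Evaluation feature concrete, here is a minimal sketch of the commonly cited simplified VQA accuracy rule: a predicted answer scores min(number of matching human answers / 3, 1). This is not Hexia's API; the function name `vqa_accuracy` and the sample answers are hypothetical, and the official VQA evaluation script additionally normalizes answers (punctuation, articles, number words) and averages over subsets of annotators.

```python
from collections import Counter


def vqa_accuracy(predicted, human_answers):
    """Simplified VQA accuracy for one question (hypothetical helper).

    An answer is fully correct if at least 3 annotators gave it;
    otherwise it earns partial credit of matches / 3.
    """
    # Only light normalization here: trim whitespace and lowercase.
    counts = Counter(a.strip().lower() for a in human_answers)
    matches = counts[predicted.strip().lower()]
    return min(matches / 3.0, 1.0)


# Illustrative data: 10 human answers for a single question.
answers = ["red"] * 2 + ["dark red"] * 8

print(vqa_accuracy("dark red", answers))  # 8 matches, capped at full credit
print(vqa_accuracy("red", answers))       # 2 matches, partial credit
print(vqa_accuracy("blue", answers))      # no matches
```

Averaging this score over all questions in a split gives an overall accuracy figure; for reported results, rely on the official evaluation rather than this simplified form.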
Installation
- Clone the repository and enter it:
git clone https://github.com/aligholami/hexia && cd hexia
- Run setup.py to install dependencies:
python3 setup.py install --user
Todo
- [x] Official Evaluation Support (VQA-V2)
- [x] Automatic Train/Val Plotting
- [x] Automatic Checkpointing
- [x] Automatic Resuming
- [x] Prediction Module
- [ ] Prediction Module Test
- [x] TensorboardX Auto-Resume Plots
- [ ] TensorboardX Auto-Resume Step Handler Fix
- [ ] TextVQA Support
- [ ] GQA Support
- [ ] Image Captioning Support
- [ ] Custom Loss and Optimizers
Documentation
Check out the full documentation here.
References
1- Yang, Z., He, X., Gao, J., Deng, L., & Smola, A. (2016). Stacked attention networks for image question answering. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 21-29).
2- Singh, A., Natarajan, V., Jiang, Y., Chen, X., Shah, M., Rohrbach, M., ... & Parikh, D. (2019). Pythia-a platform for vision & language research. In SysML Workshop, NeurIPS (Vol. 2018).
More references to be added soon.
Contribution
Contributions are welcome. Send a pull request, or email me to discuss further ([email protected]).