hexia

Mid-level PyTorch Based Framework for Visual Question Answering.

© Design by Dennis Pasyuk

Introduction

Hexia is a PyTorch-based framework for building visual question answering (VQA) models. It provides a mid-level API that lets you plug your own VQA models into pre-defined data handling, image preprocessing, and natural language preprocessing pipelines.
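
To give a feel for what a natural language preprocessing pipeline for VQA typically does, here is a minimal, framework-independent sketch in plain Python. It is not Hexia's API: the function names (`tokenize`, `build_vocab`, `encode`), the special tokens, and the default `max_len` are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(question):
    """Lowercase a question and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", question.lower())


def build_vocab(questions, min_count=1):
    """Assign an integer id to every token seen at least `min_count` times.

    Ids 0 and 1 are reserved for padding and unknown tokens (an assumption
    of this sketch, not something taken from Hexia).
    """
    counts = Counter(tok for q in questions for tok in tokenize(q))
    vocab = {"<pad>": 0, "<unk>": 1}
    for token, count in sorted(counts.items()):
        if count >= min_count:
            vocab[token] = len(vocab)
    return vocab


def encode(question, vocab, max_len=8):
    """Turn a question into a fixed-length list of token ids."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(question)]
    ids = ids[:max_len]  # truncate long questions
    return ids + [vocab["<pad>"]] * (max_len - len(ids))  # pad short ones


questions = ["What color is the cat?", "Is the cat sleeping?"]
vocab = build_vocab(questions)
print(encode("What is the dog?", vocab, max_len=6))
```

Fixed-length id lists like these can be stacked into a tensor and fed to an embedding layer; unseen words such as "dog" fall back to the unknown id.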

Features

  • Image preprocessing
  • Text preprocessing
  • Data Handling (MS-COCO Only)
  • Real-time Loss and Accuracy Tracker
  • VQA Evaluation
  • Extendable Built-in Model Warehouse
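
As background for the VQA evaluation feature, the sketch below shows the commonly used simplified form of the VQA accuracy metric, in which a predicted answer gets full credit if at least three human annotators gave the same answer. This is an illustration only and not Hexia's evaluation code: `vqa_accuracy` and `mean_vqa_accuracy` are hypothetical names, and the official VQA evaluation script additionally normalizes answers (punctuation, articles, number words) and averages over annotator subsets, which this sketch omits.

```python
def vqa_accuracy(predicted, human_answers):
    """Simplified VQA accuracy: min(number of matching human answers / 3, 1)."""
    target = predicted.strip().lower()
    matches = sum(1 for answer in human_answers if answer.strip().lower() == target)
    return min(matches / 3.0, 1.0)


def mean_vqa_accuracy(predictions, all_human_answers):
    """Average the per-question accuracy over a set of questions."""
    scores = [vqa_accuracy(p, h) for p, h in zip(predictions, all_human_answers)]
    return sum(scores) / len(scores) if scores else 0.0


# Ten annotators per question, as in the VQA datasets.
humans = ["yes"] * 2 + ["no"] * 8
print(vqa_accuracy("yes", humans))  # two matches, so partial credit
print(vqa_accuracy("no", humans))   # eight matches, so full credit
```

The soft score rewards answers that only some annotators agree with, which matters because many VQA questions have more than one reasonable answer.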

Installation

  1. Clone the repository and enter it:
git clone https://github.com/aligholami/hexia && cd hexia
  2. Run setup.py to install the dependencies:
python3 setup.py install --user
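
After the steps above, one way to confirm that the installation succeeded is to check whether Python can locate the package. The sketch below assumes the installed package is importable under the name `hexia`, which this README does not state explicitly; `is_installed` is a hypothetical helper, not part of the project.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package with this name can be found."""
    return importlib.util.find_spec(package_name) is not None


# "hexia" is an assumed import name; adjust it if the package differs.
print("hexia importable:", is_installed("hexia"))
```

If this prints `False`, check that the same `python3` interpreter was used for both the install and the check.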

Todo

  • [x] Official Evaluation Support (VQA-V2)
  • [x] Automatic Train/Val Plotting
  • [x] Automatic Checkpointing
  • [x] Automatic Resuming
  • [x] Prediction Module
  • [ ] Prediction Module Test
  • [x] TensorboardX Auto-Resume Plots
  • [ ] TensorboardX Auto-Resume Step Handler Fix
  • [ ] TextVQA Support
  • [ ] GQA Support
  • [ ] Image Captioning Support
  • [ ] Custom Loss and Optimizers

Documentation

Check out the full documentation here.

References

1- Yang, Z., He, X., Gao, J., Deng, L., & Smola, A. (2016). Stacked attention networks for image question answering. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 21-29).
2- Singh, A., Natarajan, V., Jiang, Y., Chen, X., Shah, M., Rohrbach, M., ... & Parikh, D. (2019). Pythia-a platform for vision & language research. In SysML Workshop, NeurIPS (Vol. 2018).

More references to be added soon.

Contribution

Contributions are welcome. You can send a pull request or email me to discuss your ideas ([email protected]).