Deep-Cross-Modal-Projection-Learning-for-Image-Text-Matching
Deep Cross-Modal Projection Learning for Image-Text Matching
This is a PyTorch implementation of the paper Deep Cross-Modal Projection Learning for Image-Text Matching.
The official implementation in TensorFlow can be found here.
Requirements
- Python 3.5
- PyTorch 1.0.0 & torchvision 0.2.1
- numpy
- scipy 1.2.1
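To confirm that an environment matches the versions listed above, a small check like the following may help. It is a minimal sketch, not part of the repository: the helper names `installed_version` and `report` are hypothetical, and it assumes each package exposes a `__version__` attribute (torch, torchvision, numpy, and scipy all do).

```python
import importlib
import sys

# Versions taken from the list above; numpy has no pinned version there.
PINNED = {"torch": "1.0.0", "torchvision": "0.2.1", "numpy": None, "scipy": "1.2.1"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def report(pinned=PINNED):
    """Build one status line for Python and one per package (hypothetical helper)."""
    lines = ["python: found {}.{} (README lists 3.5)".format(*sys.version_info[:2])]
    for name, wanted in sorted(pinned.items()):
        found = installed_version(name)
        if found is None:
            status = "MISSING"
        elif wanted is None or found.startswith(wanted):
            status = "ok"
        else:
            status = "differs from pinned {}".format(wanted)
        lines.append("{}: found {} ({})".format(name, found, status))
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

A mismatch does not necessarily mean the code will fail; the check only flags where the environment differs from the versions this README lists.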
Data Preparation
- Download the pre-computed/pre-extracted data from GoogleDrive and move them to the data/processed folder. Or you can use the file dataset/preprocess.py to prepare your own data.
- [Optional] Download the pre-trained model weights from GoogleDrive and move them to the pretrained_models folder.
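Before launching a run, it can be useful to verify that the folders named in the steps above are in place. The sketch below is an illustration only: `missing_dirs` is a hypothetical helper, and it assumes the check is run from the repository root. Because the names of the downloaded files are not stated here, it only tests that each required folder exists and is non-empty.

```python
import os

# Folder names come from the steps above; treating the second as optional
# follows the "[Optional]" label on the pre-trained weights.
EXPECTED_DIRS = [
    ("data/processed", True),      # pre-extracted data, or output of dataset/preprocess.py
    ("pretrained_models", False),  # optional pre-trained model weights
]


def missing_dirs(root=".", expected=EXPECTED_DIRS):
    """Return the required folders under root that are absent or empty."""
    missing = []
    for rel_path, required in expected:
        full = os.path.join(root, *rel_path.split("/"))
        populated = os.path.isdir(full) and len(os.listdir(full)) > 0
        if required and not populated:
            missing.append(rel_path)
    return missing


if __name__ == "__main__":
    problems = missing_dirs()
    if problems:
        print("missing or empty: " + ", ".join(problems))
    else:
        print("data layout looks ready")
```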
Training & Testing
First, change the model_path parameter to your current directory.
sh scripts/run.sh
This script runs training and testing in one step, so you do not need to perform them separately.
Or run training only:
sh scripts/train.sh
Or run testing only:
sh scripts/test.sh