RVL-BERT
The official code for "Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations" (IEEE Access, 2021)
This repository accompanies our IEEE Access paper "Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations" and contains the validation code and trained models for the SpatialSense and VRD datasets.
Installation
This project is built with Python 3.6, PyTorch 1.1.0, and CUDA 9.0, and is largely based on VL-BERT.
Please follow the original VL-BERT instructions to set up a conda environment.
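For reference, the setup might look like the following. This is a sketch using the versions stated above; the authoritative steps (and any additional requirements) are in VL-BERT's install guide.

```shell
conda create -n rvl-bert python=3.6 -y
conda activate rvl-bert
# PyTorch 1.1.0 built against CUDA 9.0, matching the versions above
conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=9.0 -c pytorch -y
```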
Dataset
SpatialSense
- Download the SpatialSense dataset here.
- Put the files under `$RVL_BERT_ROOT/data/spasen` and unzip `images.tar.gz` as `images/` there. Ensure there are two folders (`flickr/` and `nyu/`) below `$RVL_BERT_ROOT/data/spasen/images/`.
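After unzipping, the directory should look like this (only the items named in the steps above are shown):

```
$RVL_BERT_ROOT/data/spasen/
├── images.tar.gz        # downloaded archive
└── images/              # from unzipping images.tar.gz
    ├── flickr/
    └── nyu/
```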
VRD
- Download the VRD dataset: images (Backup: download `sg_dataset.zip` from Baidu) and annotations.
- Put the `sg_train_images/` and `sg_test_images/` folders under `$RVL_BERT_ROOT/data/vrd/images`.
- Put all `.json` files under `$RVL_BERT_ROOT/data/vrd/`.
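The resulting layout, based on the steps above:

```
$RVL_BERT_ROOT/data/vrd/
├── *.json               # annotation files from the download
└── images/
    ├── sg_train_images/
    └── sg_test_images/
```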
Checkpoints & Pretrained Weights
Common
Download the pretrained weights here and put the `pretrained_model/` folder under `$RVL_BERT_ROOT/model/`.
SpatialSense
Download the trained checkpoint here and put the `.model` file under `$RVL_BERT_ROOT/checkpoints/spasen/`.
VRD
Download the trained checkpoints and put the `.model` files under `$RVL_BERT_ROOT/checkpoints/vrd/`.
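Putting the pieces together, the expected layout looks like this (checkpoint file names are taken from the validation commands below):

```
$RVL_BERT_ROOT/
├── model/
│   └── pretrained_model/            # pretrained weights
└── checkpoints/
    ├── spasen/
    │   └── full-model-e44.model
    └── vrd/
        ├── basic-e59.model
        ├── basic-vl-e59.model
        ├── basic-vl-s-e59.model
        └── basic-vl-s-m-e59.model
```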
Validation
Run the following commands to reproduce the experimental results. A single GPU (NVIDIA Quadro RTX 6000, 24 GB memory) is used by default.
SpatialSense
- Full model
python spasen/test.py --cfg cfgs/spasen/full-model.yaml --ckpt checkpoints/spasen/full-model-e44.model --bs 8 --gpus 0 --model-dir ./ --result-path results/ --result-name spasen_full_model --split test --log-dir logs
VRD
- Basic model:
python vrd/test.py --cfg cfgs/vrd/basic.yaml --ckpt checkpoints/vrd/basic-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic --split test --log-dir logs/
- Basic model + Visual-Linguistic Commonsense Knowledge
python vrd/test.py --cfg cfgs/vrd/basic_vl.yaml --ckpt checkpoints/vrd/basic-vl-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic_vl --split test --log-dir logs/
- Basic model + Visual-Linguistic Commonsense Knowledge + Spatial Module
python vrd/test.py --cfg cfgs/vrd/basic_vl_s.yaml --ckpt checkpoints/vrd/basic-vl-s-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic_vl_s --split test --log-dir logs/
- Full model
python vrd/test.py --cfg cfgs/vrd/basic_vl_s_m.yaml --ckpt checkpoints/vrd/basic-vl-s-m-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic_vl_s_m --split test --log-dir logs/
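The four VRD commands differ only in the config/checkpoint names, where the checkpoint name is the config name with `_` replaced by `-`. A small loop (not part of the repo; result names here are made distinct per variant) can print them all for scripting:

```shell
#!/bin/sh
# Print the four VRD test commands shown above.
# Checkpoint name = config name with '_' replaced by '-'.
for variant in basic basic_vl basic_vl_s basic_vl_s_m; do
  ckpt=$(printf '%s' "$variant" | tr '_' '-')
  echo "python vrd/test.py --cfg cfgs/vrd/${variant}.yaml \
--ckpt checkpoints/vrd/${ckpt}-e59.model --bs 1 --gpus 0 --model-dir ./ \
--result-path results/ --result-name vrd_${variant} --split test --log-dir logs/"
done
```

Replace `echo` with `eval` (or drop the quotes and `echo`) to actually run the tests in sequence.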
Credit
This repository is mainly based on VL-BERT.
Citation
Please cite our paper if you find the paper or our code helpful for your research!
@ARTICLE{9387302,
author={M.-J. {Chiou} and R. {Zimmermann} and J. {Feng}},
journal={IEEE Access},
title={Visual Relationship Detection With Visual-Linguistic Knowledge From Multimodal Representations},
year={2021},
volume={9},
number={},
pages={50441-50451},
doi={10.1109/ACCESS.2021.3069041}}