Location-aware Graph Convolutional Networks for Video Question Answering

This repo holds the code for the L-GCN framework presented at AAAI 2020.

Location-aware Graph Convolutional Networks for Video Question Answering. Deng Huang, Peihao Chen, Runhao Zeng, Qing Du, Mingkui Tan, Chuang Gan. AAAI 2020, New York.

[Paper]

Usage Guide
- Code Preparation
- Module Preparation
- Data Preparation
- Training
Other Info
- Citation
- Contact

Usage Guide

Code Preparation [back to top]

Clone this repo with git

git clone https://github.com/SunDoge/L-GCN.git
cd L-GCN

Module Preparation [back to top]

This repo requires PyTorch >= 1.2.

Other modules can be installed by running

pip install -r requirements.txt
python -m spacy download en

Data Preparation [back to top]

Data Processing

Save frames

Extract frames by following the instructions in tgif-qa.

./save-frames.sh data/tgif/{gifs,frames}

Some GIFs cannot be read by ffmpeg. For those, you can use ImageMagick to save the frames.

convert img.gif img/%d.jpg
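
When many GIFs need the ImageMagick fallback, the command can be generated per file. The following is a minimal sketch, not part of this repo: the helper name `imagemagick_command`, the one-directory-per-GIF layout under `data/tgif/frames`, and the optional `-coalesce` switch are assumptions made for illustration.

```python
from pathlib import PurePosixPath
from typing import List


def imagemagick_command(gif_path: str, frames_root: str, coalesce: bool = False) -> List[str]:
    """Build a `convert` command that writes one JPEG per GIF frame (hypothetical helper)."""
    gif = PurePosixPath(gif_path)
    # Assumed layout: frames of img.gif go to <frames_root>/img/
    out_dir = PurePosixPath(frames_root) / gif.stem
    cmd = ["convert", str(gif)]
    if coalesce:
        # -coalesce expands frames that the GIF stores as partial updates.
        cmd.append("-coalesce")
    cmd.append(str(out_dir / "%d.jpg"))
    return cmd


print(" ".join(imagemagick_command("data/tgif/gifs/img.gif", "data/tgif/frames")))
# prints: convert data/tgif/gifs/img.gif data/tgif/frames/img/%d.jpg
```

The output directory should exist before the command runs. On ImageMagick 7 installations the entry point may be `magick` instead of `convert`.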

Split frames

Because there are too many frames to process at once, we split them into N parts.

python -m scripts.split_n_parts -o data/tgif/frame_splits/
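
The idea behind the split can be sketched as follows. This is not the actual `scripts.split_n_parts` implementation: the function below, its contiguous near-equal partitioning, and the sample directory names are assumptions made for illustration only.

```python
from typing import List, Sequence


def split_n_parts(items: Sequence[str], n: int) -> List[List[str]]:
    """Split items into n contiguous parts whose sizes differ by at most one."""
    if n <= 0:
        raise ValueError("n must be positive")
    base, extra = divmod(len(items), n)
    parts = []
    start = 0
    for i in range(n):
        # The first `extra` parts receive one additional item.
        size = base + (1 if i < extra else 0)
        parts.append(list(items[start:start + size]))
        start += size
    return parts


# Made-up frame directories, standing in for the real dataset.
frame_dirs = [f"data/tgif/frames/gif_{i}" for i in range(10)]
splits = split_n_parts(frame_dirs, 3)
print([len(part) for part in splits])  # prints: [4, 3, 3]
```

Each part can then be processed independently, which is what the per-split commands in the following steps rely on.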

Get bboxes

Extract bounding boxes (bboxes) using Mask R-CNN. See the script for additional arguments.

python -m scripts.extract_bboxes_with_maskrcnn \
-f data/tgif/frame_splits/split0.pkl \
-o data/tgif/bboxes_splits/split0.pt \
-c /path/to/e2e_mask_rcnn_R_101_FPN_1x_caffe2.yaml
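
The command above handles a single split, so it has to be repeated for each split file. A minimal sketch of generating those commands follows; the helper name `bbox_command`, the split count of 3, and the `split{i}` naming pattern (inferred from the single `split0` example) are assumptions, not something this repo provides.

```python
import shlex
from typing import List


def bbox_command(split_index: int, config_path: str) -> List[str]:
    """Build the Mask R-CNN extraction command for one split (hypothetical helper)."""
    return [
        "python", "-m", "scripts.extract_bboxes_with_maskrcnn",
        "-f", f"data/tgif/frame_splits/split{split_index}.pkl",
        "-o", f"data/tgif/bboxes_splits/split{split_index}.pt",
        "-c", config_path,
    ]


config = "/path/to/e2e_mask_rcnn_R_101_FPN_1x_caffe2.yaml"
# Assumed number of splits; use the N chosen in the split step.
commands = [bbox_command(i, config) for i in range(3)]
for cmd in commands:
    print(" ".join(shlex.quote(part) for part in cmd))
```

To execute rather than print, each list can be passed to `subprocess.run(cmd, check=True)`.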

Merge bboxes

python -m scripts.merge_box_scores_and_labels \
--bboxes data/tgif/bboxes_splits \
-o data/tgif/bboxes

Extract bbox features

python -m scripts.extract_resnet152_features_with_bboxes \
-i data/tgif/frames \
-f data/tgif/frame_splits/split0.pkl \
-p data/tgif/bboxes_splits/split0.pt \
-o data/tgif/bbox_features_splits/split0layer4

Merge bbox features

python -m scripts.merge_bboxes \
--bboxes data/tgif/bbox_features_splits \
-o data/tgif/resnet152_bbox_features

Extract pool5 features

python -m scripts.extract_resnet152_features \
-i data/tgif/frames

Training [back to top]

Use the following command to train L-GCN

python train.py -c config/resnet152-bbox/$TASK_CONFIG -e $PATH_TO_SAVE_RESULT

$TASK_CONFIG is the task config file. There are four choices: action.conf, transition.conf, frameqa.conf, count.conf
$PATH_TO_SAVE_RESULT is the path where the results are saved
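
To train all four tasks in sequence, the training command can be generated once per config. The sketch below is illustrative only: the helper `train_command` and the `results/<task>` output layout are assumptions, while the config names and the `-c` / `-e` flags come from the command above.

```python
from typing import List

# The four task configs listed above.
TASK_CONFIGS = ["action.conf", "transition.conf", "frameqa.conf", "count.conf"]


def train_command(task_config: str, result_root: str = "results") -> List[str]:
    """Build the train.py command for one task (hypothetical helper)."""
    # Assumed layout: one result directory per task, named after the config.
    task_name = task_config.rsplit(".", 1)[0]
    return [
        "python", "train.py",
        "-c", f"config/resnet152-bbox/{task_config}",
        "-e", f"{result_root}/{task_name}",
    ]


for config in TASK_CONFIGS:
    print(" ".join(train_command(config)))
```

Each generated list can be run with `subprocess.run(cmd, check=True)` once the data preparation steps are complete.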

Other Info

Citation [back to top]

Please cite the following paper if you find L-GCN useful for your research:

@inproceedings{L-GCN2020AAAI,
  author    = {Deng Huang and
               Peihao Chen and
               Runhao Zeng and
               Qing Du and
               Mingkui Tan and
               Chuang Gan},
  title     = {Location-aware Graph Convolutional Networks for Video Question Answering},
  booktitle = {AAAI},
  year      = {2020},
}

Contact [back to top]

For any questions, please file an issue or contact:

[email protected]
[email protected]
