
AR-Net: Adaptive Resolution Network for Efficient Video Understanding

[Project Page] [arXiv]


Yue Meng1,3, Chung-Ching Lin1, Rameswar Panda1, Prasanna Sattigeri1, Leonid Karlinsky1, Aude Oliva1,3, Kate Saenko1,2, Rogerio Feris1

1 MIT-IBM Watson AI Lab, IBM Research

2 Boston University

3 Massachusetts Institute of Technology

In European Conference on Computer Vision (ECCV), 2020

Reference

If you find our code or project useful for your research, please cite:

@article{meng2020ar,
  title={AR-Net: Adaptive Frame Resolution for Efficient Action Recognition},
  author={Meng, Yue and Lin, Chung-Ching and Panda, Rameswar and Sattigeri, Prasanna and Karlinsky, Leonid and Oliva, Aude and Saenko, Kate and Feris, Rogerio},
  journal={arXiv preprint arXiv:2007.15796},
  year={2020}
}

Requirements

Our experiments were conducted on 4 Tesla V100 GPUs (32 GB). Set up the environment as follows:

conda create -n arnet python=3.7.6
conda activate arnet
conda install pytorch torchvision tqdm
pip install tensorboardX thop 
git clone https://github.com/lukemelas/EfficientNet-PyTorch
cd EfficientNet-PyTorch
pip install -e .
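
Optionally, you can sanity-check the environment before moving on. The short Python snippet below is not part of the repository; it only imports the packages installed above and reports whether the GPUs are visible:

# Optional sanity check (not part of the repository): confirm the packages above
# import correctly and that the GPUs are visible to PyTorch.
import torch
import torchvision
from efficientnet_pytorch import EfficientNet

print('PyTorch:', torch.__version__)
print('torchvision:', torchvision.__version__)
print('CUDA available:', torch.cuda.is_available())
print('GPU count:', torch.cuda.device_count())

# The editable EfficientNet-PyTorch install exposes backbones by name.
model = EfficientNet.from_name('efficientnet-b0')
print('EfficientNet-B0 parameters:', sum(p.numel() for p in model.parameters()))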

Dataset preparation

  1. Get the ActivityNet-v1.3 train/test splits (and the classes file) from [Google Drive] and put them in /foo/bar/activity-net-v1.3, where /foo/bar is the directory in which you save the datasets.
  2. Download the ActivityNet-v1.3 videos from here (contact the dataset organizers if any videos are missing) and save them to /foo/bar/activity-net-v1.3/videos.
  3. Extract frames using the script from the repository:
cd ./ops
python video_jpg.py /foo/bar/activity-net-v1.3/videos /foo/bar/activity-net-v1.3/frames  --parallel

The frames will be saved to /foo/bar/activity-net-v1.3/frames.
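
After extraction, it can be worth checking that every video actually produced frames. The snippet below is a hypothetical helper (not part of the repository); it assumes video_jpg.py writes one sub-directory of .jpg files per video, so adjust the path and pattern if your layout differs:

# Hypothetical sanity check (not part of the repository): assumes one sub-directory
# of .jpg frames per video under the frames root produced by video_jpg.py.
import os
import glob

frames_root = '/foo/bar/activity-net-v1.3/frames'  # same path as used above

video_dirs = [d for d in os.listdir(frames_root)
              if os.path.isdir(os.path.join(frames_root, d))]
print('Videos with extracted frames:', len(video_dirs))

# Flag videos whose extraction produced no frames, so they can be re-extracted.
empty = [d for d in video_dirs
         if not glob.glob(os.path.join(frames_root, d, '*.jpg'))]
print('Videos with no frames (re-extract these):', len(empty))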

Using the same procedure, you can also prepare FCVID and Mini-Kinetics. For more details, please check ops/dataset_config.py.

Pretrained Models

Download all our models from [Google Drive] and save them to /foo1/bar1/model_path.

Evaluation

To test all the models on ActivityNet-v1.3, run:

sh full_test.sh /foo/bar/activity-net-v1.3 /foo1/bar1/model_path

The first argument is the data path and the second is the model path. Make sure you have 4x V100 (32 GB) GPUs to reproduce exactly the same results for the adaptive approaches. (The last two experiments in the script report improved performance obtained with an updated training logic; see the "Training" section for more details.)
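
The requirements include thop, which can be used to see how much computation each input resolution costs and why choosing a lower resolution for "easy" frames saves compute. The sketch below is purely illustrative (it is not the repository's evaluation code); it profiles a standard torchvision ResNet-50 at a few example resolutions:

# Illustrative sketch (not the repository's evaluation code): measure the per-frame
# cost of a ResNet-50 backbone at several input resolutions.
import torch
import torchvision
from thop import profile

model = torchvision.models.resnet50()
model.eval()

for size in (224, 168, 112, 84):  # example resolutions; the adaptive policy picks one per frame
    dummy = torch.randn(1, 3, size, size)
    macs, params = profile(model, inputs=(dummy,), verbose=False)
    print(f'{size}x{size}: {macs / 1e9:.2f} GMACs')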

Training

To train AR-Net (ResNet) as shown in Table 1 (mAP 73.8), follow the training script here. Training takes around 1-2 days.

We also improved the model's performance by simply changing a few hyperparameters such as the learning rate. The new mAP for AR-Net (ResNet) is 76.8 (while the updated baseline is 75.6); the training script can be found here.

Our code is based on TSM.