
Temporal augmentation with two-stream ConvNet features on human action recognition

Temporal Augmentation using frame-level features with RNN on UCF101

License: MIT

The two-stream ConvNet is recognized as one of the most successful deep ConvNet architectures for video understanding, specifically human action recognition. However, it suffers from insufficient temporal data for training.

This repository implements a temporal-segment RNN for training on videos with temporal augmentation. The implementation is based on example code from fb.resnet.torch and was largely modified to work with frame-level features.

Pre-saved features generated from ResNet-101 are provided.
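The temporal augmentation idea can be sketched as follows (in Python for illustration; the repository itself is Torch/Lua). The function name and the segment-based random sampling shown here are illustrative assumptions, not the repo's exact code: each epoch draws a different subset of the pre-saved frame-level features, so the RNN sees varied temporal samplings of the same video.

```python
import numpy as np

def temporal_augment(features, num_segments=25, rng=None):
    """Sketch of segment-based temporal augmentation (hypothetical helper,
    not the repository's actual code): split the T frame-level feature
    vectors into num_segments roughly equal chunks and draw one random
    frame from each chunk, preserving temporal order."""
    rng = rng or np.random.default_rng()
    T = features.shape[0]
    # Chunk boundaries: num_segments + 1 cut points over [0, T]
    bounds = np.linspace(0, T, num_segments + 1).astype(int)
    # One random frame index per chunk (guard against empty chunks)
    idx = [int(rng.integers(lo, max(lo + 1, hi)))
           for lo, hi in zip(bounds[:-1], bounds[1:])]
    return features[np.asarray(idx)]

# Example: 120 frames of 2048-d ResNet-101 features -> 25 sampled frames
feats = np.random.randn(120, 2048).astype(np.float32)
clip = temporal_augment(feats, num_segments=25)
print(clip.shape)  # (25, 2048)
```

Because a fresh random sample is drawn every call, repeated epochs over the same video effectively multiply the temporal training data, which is the motivation stated above.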

Prerequisites

  • Linux (tested on Ubuntu 14.04)
  • Torch
  • CUDA and cuDNN
  • NVIDIA GPU is strongly recommended

Video Dataset

UCF101

The starter code provided here should be relatively easy to adapt to other datasets.

Features for training

I re-trained the two-stream ConvNet using a pre-trained ResNet-101 on the UCF101 dataset. Please download the frame-level features from the links below.

The features are coming soon.

UCF-101 split 1

You can also generate features for splits 2 and 3 by rearranging the features according to the split lists provided by UCF101.

Usage

Specify the downloaded features and the type of RNN model you would like to use in opt.lua, then run:

th main.lua

Citation

Please cite our paper if you find the code useful.

TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition

@article{ma2017tslstm,
  title={TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition},
  author={Ma, Chih-Yao and Chen, Min-Hung and Kira, Zsolt and AlRegib, Ghassan},
  journal={arXiv preprint arXiv:1703.10667},
  year={2017}
}